VERY IMPORTANT FOR SPHIDER

Posted by bunker 
August 22, 2007 09:30PM
I am sure everyone here needs and would highly appreciate this:

When we set a "must include" or "must not include" URL rule, the spider stops working on a URL completely as soon as that URL violates one of these rules. However, there should be an option that lets the spider NOT INDEX such a URL but STILL FOLLOW the links on it, because pages excluded by the "must include" or "must not include" rules contain links to pages that we do want to index.

Okay here is the bottom line:

We set some "must include" and "must not include" rules and choose to leave the excluded URLs out of the index, but we also want to be able to index the URLs that are linked from those excluded pages.
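
A rough sketch of the requested behaviour, written in Python rather than Sphider's own PHP. Nothing here is Sphider code: the `UrlRules` class, its field names, the `follow_excluded` option and the plain substring matching are all assumptions made for illustration. The point is only that "index this page" and "follow its links" become two separate decisions.

```python
from dataclasses import dataclass, field


@dataclass
class UrlRules:
    """Hypothetical rule set; the names are illustrative, not Sphider's."""
    must_include: list = field(default_factory=list)
    must_not_include: list = field(default_factory=list)
    follow_excluded: bool = False  # the option being asked for (assumed name)

    def matches(self, url: str) -> bool:
        """True if the URL passes both rule lists (substring match assumed)."""
        if any(s in url for s in self.must_not_include):
            return False
        if self.must_include and not any(s in url for s in self.must_include):
            return False
        return True

    def decide(self, url: str) -> tuple:
        """Return (index_page, follow_links) for a URL."""
        if self.matches(url):
            return (True, True)
        # Excluded page: never indexed; links followed only if opted in.
        return (False, self.follow_excluded)


rules = UrlRules(must_not_include=["/archive/"], follow_excluded=True)
print(rules.decide("http://example.com/archive/index.html"))
print(rules.decide("http://example.com/articles/1.html"))
```

With `follow_excluded` left at its default of `False`, an excluded URL is neither indexed nor followed, which is the behaviour described above.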

Was I able to explain myself?

By the way: GREAT SCRIPT and GREAT COMMUNITY. Thanks everybody!
rec
Re: VERY IMPORTANT FOR SPHIDER
September 19, 2007 06:37PM
This is solved in version 1.3.3
Tec
Re: VERY IMPORTANT FOR SPHIDER
September 19, 2007 09:27PM
Stop. I hope this is not part of version 1.3.3 !!!
Think about the consequences. Let us assume Tec has a homepage [www.tecsite.com], but you don't like all the stuff from 'tec' and place 'tec' into your 'URL must not include' list. Now you can prevent Sphider from indexing that guy's main page. But there are some links to sub-sites. If Sphider were allowed to follow those links, you would get all the trash of the indexed sub-sites...
It gets even worse if Sphider is allowed to leave the domain. Assume a fan of Tec has placed a link to tecsite.com. Now you are running into a disaster area. Even as an option, this remains very dangerous, as you lose control and don't know where Sphider will run during indexing.

So nobody is waiting for something like this, let alone would highly appreciate it.

P.S. If your children use the result pages of your Sphider, replace 'tec' with 'porno' and read my answer again. Do you get my point?

Tec
rec
Re: VERY IMPORTANT FOR SPHIDER
September 19, 2007 09:56PM
I respect Tec's ethics.
Re: VERY IMPORTANT FOR SPHIDER
October 29, 2007 04:14PM
Wouldn't an easy solution be to not complicate Sphider and just have it take the follow/index directives from the links and META tags and use robots.txt, just like any other search engine?

I think the must follow and must not follow things are important, so you can list stuff that isn't linked to, or so you can tell Sphider "hey, don't bother wasting your time with this whole directory." There are already solutions in place for what you want done; if Sphider doesn't honor them it should, but I'm pretty sure it does.

You can have any combination of follow, no-follow, index, and no-index in the links and META tags.
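
For illustration, a minimal sketch of how a crawler might read those combinations from a page's robots META tag, using only Python's standard `html.parser`. This is not Sphider's code (Sphider is PHP); the `RobotsMetaParser` class and `robots_directives` helper are made-up names, and the sketch handles only the robots META tag, not `rel="nofollow"` on individual links or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.index = True   # defaults when no directive is present
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").lower().split(","):
            token = token.strip()
            if token in ("noindex", "none"):
                self.index = False
            if token in ("nofollow", "none"):
                self.follow = False


def robots_directives(html: str) -> tuple:
    """Return (index_page, follow_links) as requested by the page itself."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return (parser.index, parser.follow)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_directives(page))
```

The `noindex, follow` combination is the "don't index this page, but keep following its links" case discussed at the top of the thread, set by the page owner rather than by the crawler's URL rules.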
Re: VERY IMPORTANT FOR SPHIDER
May 27, 2008 07:33PM
I completely agree with bunker, this is very important. If you're indexing someone else's site (with permission to do so) and want to include only specific subjects, the website owner won't stop Google, Yahoo, MSN etc. from accessing pages just because you don't want them indexed. I have v1.3.4, and pages are denied straight away and aren't followed. If I download 1.3.3, will I be able to follow unindexed URLs?