

Resuming interrupted spidering

Posted by Hex Angel 
Resuming interrupted spidering
April 04, 2007 09:49PM
* Allows resuming paused spidering.

How does this work?

Every time a spidering attempt has been interrupted, my only option has been to re-index the entire site. I notice a 'pending' table in the database, which I assume is a list of links waiting to be spidered, but it never contains any data.

I've been trying for two days to index a site of over 5,000 pages but have never passed 3,500 or so before the process died, leaving thousands of un-indexed pages.

Re: Resuming interrupted spidering
April 06, 2007 10:10PM
I had the same problem. What I found was that if you delete the site from the index and then add it again, Sphider will crawl the site, and if you stop it and then go to the admin section, you will see the "Continue-index" option.

If this does not happen, let me know what version of Sphider you are using.

Diego Medina
[url=http://www.fmpwizard.com]Web Developer[/url]
Re: Resuming interrupted spidering
April 07, 2007 05:19AM
Thanks for the reply, Diego.

Running 1.3.1f.

Deleting the site also deletes the current index, leaving the search function useless until a significant portion of the pages have been re-spidered. Not a very acceptable work-around, I'm afraid, particularly given how long spidering takes.

Moreover, I am sure I noticed the "Continue-index" option appearing after the *first* spider failed to complete. The subsequent attempt to Continue-index also failed to complete, and "Continue-index" has not been offered again at any time since. So unless Sphider can index the entire site within two attempts, it will never create a full index.

I'm coming to the conclusion that Sphider, while very simple to install, configure and customise, is just a bit too lightweight and resource-hungry for my needs (5,000+ pages I'd want to re-index at least a couple of times a week) and my server environment (which eventually kills the process even with page delays set as high as 10 seconds). If it actually resumed the last interrupted spider iteratively until full completion, I could live with that. I'd automate a nightly index that would, over time, re-index the entire site over two or three (or four) nights.
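For what it's worth, the "resume until full completion" idea above could be approximated with a retry wrapper around Sphider's command-line spider, assuming a rerun picks up where the last run left off (which, per this thread, only sometimes happens). A minimal sketch; the spider.php path and flags in the usage comment are assumptions, not verified against any Sphider version:

```shell
#!/bin/sh
# Generic "keep running it until it finishes" wrapper: rerun the given
# command until it exits cleanly (status 0), up to a fixed number of
# attempts, so a killed spidering run gets another chance the same night.
retry_until_ok() {
    max_tries=4
    i=1
    while [ "$i" -le "$max_tries" ]; do
        if "$@"; then
            return 0            # command completed without being killed
        fi
        echo "attempt $i interrupted; retrying" >&2
        i=$((i + 1))
    done
    return 1                    # still not finishing after max_tries runs
}

# Hypothetical usage from cron (path and flags are assumptions -- check
# your Sphider install for the actual command-line options):
# retry_until_ok php /path/to/sphider/admin/spider.php -all
```

Note this only helps if rerunning the spider actually resumes rather than restarting from page one; otherwise each attempt burns the same time budget.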

Oh, well, off to see if I can successfully install swish-e.



Edited 1 time(s). Last edit at 04/07/2007 05:23AM by Hex Angel.
Re: Resuming interrupted spidering
April 21, 2007 05:59AM
Why would you want to re-index such a large amount of pages several times per week? Most likely only a small fraction of these pages will have any changes worth re-indexing.
I have a website with 40,000+ pages, and it works great by segmenting the site into several parts. That is: pages I know are static and will not change (e.g. old message board entries) are defined as separate "sites" within Sphider. I only re-index the segments where there are frequent changes, which makes it relatively fast.
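The segmenting approach above maps naturally onto cron if your host exposes Sphider's command-line spider. A hypothetical crontab sketch; the spider.php path, flags, and URLs are assumptions for illustration, not verified against any Sphider version:

```shell
# Hypothetical crontab: re-index the frequently changing segment every
# night, and the mostly static archive segments only once a month.
# m h dom mon dow   command
0 2 * * *   php /path/to/sphider/admin/spider.php -u http://example.com/news/
0 3 1 * *   php /path/to/sphider/admin/spider.php -all
```

The point of the split is that the nightly job stays small enough to finish well within whatever process-time limit the server enforces.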
Anonymous User
Re: Resuming interrupted spidering
April 23, 2007 09:18PM
This is the sort of post I like a lot