20 second delay between steps in indexing

Posted by Jack Harich 
20 second delay between steps in indexing
April 08, 2007 01:28PM
Sphider is a great tool. However, all of a sudden when I re-index, there is now a 20 second delay between steps. For example:

1. Retrieving: (first url) at 08:17:34.
2. Retrieving: (second url) at 08:17:54.
3. Retrieving: (third url) at 08:18:14.

I haven't changed a thing (?) on the server. It's a shared server with Real Web Host, so I wonder if that could be the problem. Or is it a cPanel problem? I've been unable to figure this out on my own.

Thanks,

Jack



Edited 1 time(s). Last edit at 04/08/2007 03:40PM by Jack Harich.
Anonymous User
Re: 20 second delay between steps in indexing
April 08, 2007 09:51PM
You have different URLs.
Re: 20 second delay between steps in indexing
April 10, 2007 07:18PM
Thanks. Here is an exact copy to make the problem more clear:

Spidering http: //www.thwink.org/
1. Retrieving: http: //www.thwink.org/ at 14:12:09.
Size of page: 20.81kb. Starting indexing at 14:12:29. MD5 sum checked. Page content not changed
Links found: 0. New links: 0
2. Retrieving: http: //www.thwink.org/sustain/articles/005/DuelingLoops_Book.htm at 14:12:29.
Size of page: 16.72kb. Starting indexing at 14:12:49. MD5 sum checked. Page content not changed
Links found: 0. New links: 0
3. Retrieving: http: //www.thwink.org/sustain/articles/005/DuelingLoops_Paper.htm at 14:12:49.
Size of page: 17.86kb. Starting indexing at 14:13:09. MD5 sum checked. Page content not changed
Links found: 0. New links: 0
4. Retrieving: http: //www.thwink.org/sustain/articles/008/LearningFromPastSocieties.htm at 14:13:09.
Size of page: 14.52kb. Starting indexing at 14:13:29. MD5 sum checked. Page content not changed
Links found: 0. New links: 0

Previously Sphider zoomed along, with this exact same reindexing.
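
For anyone comparing their own output: the timestamps in the log above can be split into two phases per page, the time from "Retrieving" to "Starting indexing" (the fetch) and the time from "Starting indexing" to the next "Retrieving" (the indexing itself). The following is a minimal Python sketch, not part of Sphider, that does that arithmetic on the pasted log. The regular expressions and the function name `phase_gaps` are my own assumptions based only on the line format shown in this thread, and it assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# Log lines copied from the post above (URLs kept exactly as pasted).
LOG = """\
1. Retrieving: http: //www.thwink.org/ at 14:12:09.
Size of page: 20.81kb. Starting indexing at 14:12:29. MD5 sum checked. Page content not changed
2. Retrieving: http: //www.thwink.org/sustain/articles/005/DuelingLoops_Book.htm at 14:12:29.
Size of page: 16.72kb. Starting indexing at 14:12:49. MD5 sum checked. Page content not changed
3. Retrieving: http: //www.thwink.org/sustain/articles/005/DuelingLoops_Paper.htm at 14:12:49.
Size of page: 17.86kb. Starting indexing at 14:13:09. MD5 sum checked. Page content not changed
4. Retrieving: http: //www.thwink.org/sustain/articles/008/LearningFromPastSocieties.htm at 14:13:09.
Size of page: 14.52kb. Starting indexing at 14:13:29. MD5 sum checked. Page content not changed
"""

# Assumed patterns, inferred from the log format shown in this thread.
RETRIEVE = re.compile(r"Retrieving: .* at (\d{2}:\d{2}:\d{2})\.")
INDEX = re.compile(r"Starting indexing at (\d{2}:\d{2}:\d{2})\.")


def parse_time(text):
    """Turn 'HH:MM:SS' into a datetime (date part is irrelevant here)."""
    return datetime.strptime(text, "%H:%M:%S")


def phase_gaps(log):
    """Return (fetch_seconds, index_seconds) as two lists of integers."""
    events = []
    for line in log.splitlines():
        match = RETRIEVE.search(line)
        if match:
            events.append(("retrieve", parse_time(match.group(1))))
        match = INDEX.search(line)
        if match:
            events.append(("index", parse_time(match.group(1))))

    fetch, index = [], []
    # Compare each event with the one that follows it.
    for (kind_a, time_a), (kind_b, time_b) in zip(events, events[1:]):
        gap = int((time_b - time_a).total_seconds())
        if kind_a == "retrieve" and kind_b == "index":
            fetch.append(gap)   # time spent getting the page
        elif kind_a == "index" and kind_b == "retrieve":
            index.append(gap)   # time spent indexing before the next page
    return fetch, index


fetch_gaps, index_gaps = phase_gaps(LOG)
print("fetch gaps (s):", fetch_gaps)
print("index gaps (s):", index_gaps)
```

On this log the fetch gaps come out as 20 seconds for every page and the index gaps as 0, so the whole delay sits between "Retrieving" and "Starting indexing". That only locates the delay; it does not by itself say what causes it.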
Anonymous User
Re: 20 second delay between steps in indexing
April 20, 2007 09:14PM
Reindex with just 3 levels
Re: 20 second delay between steps in indexing
April 30, 2007 05:53PM
Reindexing with 3 levels is not the actual problem. The problem is server load and processor availability on shared hosts.

As an example: I run the spider on a dedicated machine. As long as only one spider is running, I get really good times, under 3 seconds per page. When I start a second spider, the time increases to more than 7 seconds per page.
Processor load almost doubles.

This is really a PHP problem when you run the spider (as you probably do) over the HTTP interface.

Maybe try doing your indexing from the command line (over SSH), if that is available.

All the best,

Andreas
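
If SSH access is available, one way to check whether the server itself is slow at fetching pages, independent of Sphider and PHP, is to time a few plain HTTP requests from that machine. This is only a rough Python sketch under my own assumptions: the function names `fetch_with_urllib`, `time_fetches`, and `report` are made up for illustration, and the 60 second timeout is an arbitrary choice.

```python
import time
import urllib.request


def fetch_with_urllib(url, timeout=60):
    """Download the page body with a plain HTTP GET (no Sphider involved)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_fetches(urls, fetch=fetch_with_urllib):
    """Return a list of (url, seconds, size_in_bytes), one entry per URL."""
    results = []
    for url in urls:
        started = time.perf_counter()
        body = fetch(url)
        elapsed = time.perf_counter() - started
        results.append((url, elapsed, len(body)))
    return results


def report(results):
    """Print one line per URL: elapsed time, page size, and the URL."""
    for url, seconds, size in results:
        print(f"{seconds:6.2f}s  {size / 1024:8.2f}kb  {url}")
```

Calling `report(time_fetches([...]))` with the same URLs the spider retrieves, once on the shared server and once from another machine, gives a rough comparison. If the plain requests are fast on the server, the delay is more likely inside the PHP run than in the network.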