How Sphider Uses Resources

Posted by selym 
How Sphider Uses Resources
September 13, 2011 03:55AM
I have a Sphider search engine with about 200 indexed sites and a database of about 1.2 GB on shared hosting. My site will no longer index new sites; it fails right away with memory errors. My host lifted the Rlimit setting to troubleshoot, and indexing worked without that limit set.

I always planned on upgrading to a vps at some point, and it may be time, but I'd like to understand why this is happening so I'm better informed in selecting my next hosting plan.

What I would like to know is: how does the size of the Sphider database affect indexing a new website? I had no problems indexing the 200 other sites until one day, as the database grew, it would no longer work. Does Sphider use more processor/memory to index an external site the bigger the Sphider database is?

Re: How Sphider Uses Resources
September 13, 2011 10:51AM
<<< Does Sphider use more processor/memory to index an external site the bigger the Sphider database is? >>>

More processor load: yes. More memory: only marginally.
When indexing a new site (and all pages of that site), the script has to check whether each word found in the page content is a new keyword or is already stored in the database. Consequently, checking the complete database takes some time if a huge number of keywords are already stored in it.
This is why indexing a site as the first URL is much faster than indexing the same URL after 200 sites have already been indexed and all their data is stored in the database.
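To illustrate the keyword check described above, here is a minimal sketch in Python with an in-memory SQLite database. It is not Sphider's actual code or schema: the `keywords` table, its columns, and the function `get_or_create_keyword_ids` are assumptions made up for this example. It only shows the pattern of one lookup per word, which is where a larger keyword table costs more processor time.

```python
import sqlite3


def get_or_create_keyword_ids(conn, words):
    """Return {word: keyword_id}, inserting any word not stored yet.

    Hypothetical helper: one SELECT per distinct word, so the work
    grows with both the page size and the cost of each lookup.
    """
    ids = {}
    for word in set(words):
        row = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword = ?", (word,)
        ).fetchone()
        if row is None:
            # New keyword: store it and remember the generated id.
            cur = conn.execute(
                "INSERT INTO keywords (keyword) VALUES (?)", (word,)
            )
            ids[word] = cur.lastrowid
        else:
            # Keyword already known from an earlier page or site.
            ids[word] = row[0]
    return ids


conn = sqlite3.connect(":memory:")
# Assumed table layout, for illustration only.
conn.execute(
    "CREATE TABLE keywords ("
    "keyword_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE)"
)

first = get_or_create_keyword_ids(conn, ["search", "engine", "search"])
second = get_or_create_keyword_ids(conn, ["engine", "index"])
keyword_count = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
```

In this sketch the second call reuses the id already stored for "engine" and only inserts "index". The `UNIQUE` constraint gives the lookup an index in SQLite; without an index each check would scan the whole table.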
Back to the index procedure: afterwards, for each keyword found on the page being indexed (not only the new ones), the script has to build the keyword/link relationship and store it in the database. Besides these steps, the script has to store the complete full text of the new page in its database, find new links and store them in the database as well, calculate the MD5 checksum and store it, etc.
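The remaining steps can be sketched roughly like this, again in Python with SQLite as a stand-in. All table names, column names, and the function `index_page` are assumptions for illustration and do not reflect Sphider's real schema or parsing. Using the occurrence count as a weight is also a simplification chosen for this example.

```python
import hashlib
import re
import sqlite3
from collections import Counter

# Assumed schema, for illustration only.
SCHEMA = """
CREATE TABLE links (link_id INTEGER PRIMARY KEY, url TEXT UNIQUE,
                    fulltxt TEXT, md5sum TEXT);
CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE link_keyword (link_id INTEGER, keyword_id INTEGER, weight INTEGER);
CREATE TABLE pending (url TEXT UNIQUE);
"""


def index_page(conn, url, html):
    """Store one page: full text, MD5 checksum, keyword relations, new links."""
    # Crude tag stripping; a real indexer would use an HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    md5sum = hashlib.md5(html.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT INTO links (url, fulltxt, md5sum) VALUES (?, ?, ?)",
        (url, text, md5sum),
    )
    link_id = cur.lastrowid

    # Build the keyword/link relationship for every keyword on the page,
    # not only for the new ones.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    for word, count in counts.items():
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO link_keyword VALUES (?, ?, ?)",
            (link_id, keyword_id, count),
        )

    # Queue links found on the page for later indexing.
    for href in re.findall(r'href="([^"]+)"', html):
        conn.execute("INSERT OR IGNORE INTO pending (url) VALUES (?)", (href,))
    return link_id, md5sum


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
html = ('<p>Sphider indexes pages. Pages link to '
        '<a href="http://example.com/b">more pages</a>.</p>')
link_id, md5sum = index_page(conn, "http://example.com/a", html)
```

Every page produces one row per distinct keyword in the relation table, which is why the database grows quickly and why each additional site adds many writes on top of the keyword lookups.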

Overall, the index procedure will take more time than the time slice your host grants on a shared hosting server. 200 sites are not a limit for a search engine like Sphider. Using Sphider-plus I indexed
25,206 sites + 324,595 page links + 1,260,698 keywords + 169,251 media links
without hitting any limit.

Re: How Sphider Uses Resources
September 25, 2011 06:36AM
This is a MySQL problem. MySQL is fine for sites that are not big and for queries that are not intensive. When you need to run huge queries and store large amounts of data in the database, or run many small queries at the same time, you need to use PostgreSQL.

Edited 1 time(s). Last edit at 09/25/2011 06:43AM by SkyRanger.