May 04, 2007 10:47PM
Allowed memory size of 33554432 bytes exhausted
Posted by: Andriko (IP Logged)
Date: May 04, 2007 10:45AM

This is actually easy to prevent:

In the function get_temp_urls in spider.php, an SQL query is executed. Its results are stored in an array named $tmp_urls.

Limit the SQL results to, say, 0,100 and your spider will work fine, because the array stays small. If you think the spider will stop after 100 URLs, you are wrong: it continues as long as there is something left to spider (depending on your settings)!

The SQL I use reads:
$result = mysql_query("select link from ".$mysql_table_prefix."temp where id='$sessid' limit 0,10");

This also reduces the overhead before the spider starts and limits queries against the database!
Try it and be surprised!
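
Why the spider does not stop at the limit can be sketched as a loop that fetches a small batch, works through it, and asks again until nothing is left. This is only an illustration under assumptions: the temp table is replaced by a plain PHP array, and fetch_batch, spider_all and the batch size of 100 are hypothetical names and values, not part of Sphider.

```php
<?php
// Hypothetical stand-in for "select link from ...temp limit 0,$limit":
// returns at most $limit pending URLs from an in-memory queue.
function fetch_batch(array $queue, int $limit): array {
    return array_slice($queue, 0, $limit);
}

// Works through every pending URL while never holding more than $limit at once.
function spider_all(array $queue, int $limit): array {
    $done = [];
    $max_held = 0;
    while (count($queue) > 0) {
        $batch = fetch_batch($queue, $limit);
        $max_held = max($max_held, count($batch));
        foreach ($batch as $url) {
            $done[] = $url;              // placeholder for the real spidering work
        }
        // Assumption: processed rows no longer show up in the next query.
        $queue = array_slice($queue, count($batch));
    }
    return ['done' => $done, 'max_held' => $max_held];
}

$pending = [];
for ($i = 1; $i <= 250; $i++) {
    $pending[] = "http://example.com/page$i";
}
$result = spider_all($pending, 100);
echo count($result['done']), " URLs processed, at most ", $result['max_held'], " held at once\n";
```

All 250 URLs are handled, but the array never grows beyond the batch size.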

Posted by: fmpwizard (IP Logged)
Date: May 03, 2007 08:19AM

Would you like to show shorter titles and URLs in the results, as on this site?

Web Developer site

Edit the file /include/searchfuncs.php
near line 569:

if ($title=='')
    $title = $sph_messages["Untitled"];
//Start - show shorter title
$title = strlen($title) > 16 ? substr($title, 0, 10)."...".substr($title, -7) : $title;
//End - show shorter title
$regs = Array();
foreach($words['hilight'] as $change) {
    while (@eregi("[^\>](".$change.")[^\<]", " ".$title." ", $regs)) {
        $title = eregi_replace($regs[1], "<b>".$regs[1]."</b>", $title);
    }

    while (@eregi("[^\>](".$change.")[^\<]", " ".$fulltxt." ", $regs)) {
        $fulltxt = eregi_replace($regs[1], "<b>".$regs[1]."</b>", $fulltxt);
    }
    $url2 = $url;

    //Code Added
    if (strlen($url2) > 35) {
        $url2 = substr($url2, 0, 19)."...".substr($url2, -15);
    }
    //Code Added

    while (@eregi("[^\>](".$change.")[^\<]", $url2, $regs)) {
        $url2 = eregi_replace($regs[1], "<b>".$regs[1]."</b>", $url2);
    }
}
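
The two shortening steps above follow the same pattern: keep the start, keep the end, put "..." in between. A minimal sketch of that pattern as one reusable function follows; shorten_middle is a hypothetical helper name, not something that exists in searchfuncs.php, and it assumes a tail length greater than zero.

```php
<?php
// Hypothetical helper: keep $head leading and $tail trailing characters,
// joined by "...", but only when that actually makes the string shorter.
// Assumes $tail > 0.
function shorten_middle(string $text, int $head, int $tail): string {
    $ellipsis = "...";
    if (strlen($text) <= $head + strlen($ellipsis) + $tail) {
        return $text;
    }
    return substr($text, 0, $head) . $ellipsis . substr($text, -$tail);
}

echo shorten_middle("Web Developer site and forum", 10, 7), "\n";
echo shorten_middle("http://www.example.com/some/long/path/to/page.html", 19, 15), "\n";
```

One design difference: the title line above shortens anything longer than 16 characters to 10 + 3 + 7 = 20 characters, so a title of 17 to 19 characters ends up slightly longer. The sketch compares against the shortened length instead, so that cannot happen.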
Automatically clean keywords?
Posted by: Tec (IP Logged)
Date: April 26, 2007 11:56PM

No. During re-indexing, the keywords and links in the database are not cleaned automatically.

To clean keywords from the command line use:


To clean links from the command line use:



Searching dates?
Posted by: huseurdaddy2001 (IP Logged)
Date: April 30, 2007 02:12AM


Sphider cannot search dates with forward slashes (/).

For example, if I search for 04/29/07, Sphider finds nothing, even though I have sites with that date.

If I edit my sites, simply replace the forward slash (/) with a hyphen (-) and re-index, then Sphider finds all sites when I search for 04-29-07.

Is it possible to modify Sphider so that it returns sites with slashes in dates as keywords?
Re: Searching dates?
Posted by: Tec (IP Logged)
Date: April 30, 2007 09:35AM

Open .../admin/spiderfuncs.php

Search for:

$file = preg_replace("/[\*\^\+\?\\\.\[\]\^\$\|\{\)\(\}~!\"\/@#?$%&=`?;><:,]+/", " ", $file);

Delete this line and replace it with:

$file = preg_replace("/[\*\^\+\?\\\[\]\^\$\|\{\)\(\}~!\"@#?$%&=`?;><:,]+/", " ", $file);

The new pattern no longer strips the dot (.) and the forward slash (/), so this modification lets you search for 30/04/2007 and also for 30.04.2007.
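
The effect of dropping the dot and the slash from the character class can be seen in a small sketch. The two patterns below are simplified stand-ins chosen for illustration, not Sphider's full character lists, and the variable names are assumptions.

```php
<?php
// Simplified stand-ins for the two character classes (not Sphider's full lists).
$strips_dots_and_slashes = '~[.,/;:!?]+~';
$keeps_dots_and_slashes  = '~[,;:!?]+~';

$text = "Updated 30/04/2007, mirrored 30.04.2007!";

// Each run of listed characters is replaced by a single space.
$before = preg_replace($strips_dots_and_slashes, " ", $text);
$after  = preg_replace($keeps_dots_and_slashes, " ", $text);

echo $before, "\n";   // dates are broken into separate numbers
echo $after, "\n";    // dates survive as single tokens
```

With the first pattern the date is split into separate numbers before indexing, which is why a search for the full date finds nothing.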



Integrate Spider results page
Posted by: ianmh (IP Logged)
Date: April 05, 2007 08:28PM

I thought I would share with everyone how I got this to work. If you are trying to embed the engine in your own pages, you can, and it is pretty easy. :)

Open commonfuncs.php and change the includes array like this:

$includes = array('./include', 'include', '../include', './search/include', 'search/include', '../search/include');

Leave the old paths in too, because I think this keeps the admin working.

Here "search" is the name of the directory Sphider is installed in. Then change $include_dir in search.php, followed by all the other directory variables such as $template_dir and $settings_dir.

This worked for me. :)

Ban folders from being indexed
Posted by: Tec (IP Logged)
Date: May 02, 2007 10:31AM

If you have access to the sites to be indexed, there is another easy possibility:
Create a file named robots.txt and place it into the root-folder of your site to be indexed.
Content of robots.txt:
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /folder_xyz/
# End of file

Now all well-behaved crawlers, such as Sphider or Google, will not index these folders.
For more details see [www.robotstxt.org]
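
How a crawler might apply such a file can be sketched in a few lines. This is a deliberately reduced illustration, not Sphider's actual robots.txt handling: it only honours "User-agent: *" groups and plain Disallow prefixes, and disallowed_prefixes and is_blocked are hypothetical function names.

```php
<?php
// Collect the Disallow prefixes that apply to all crawlers ("User-agent: *").
function disallowed_prefixes(string $robots_txt): array {
    $prefixes = [];
    $applies = false;
    foreach (preg_split('/\r\n|\r|\n/', $robots_txt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line));   // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $applies = ($value === '*');
        } elseif ($field === 'disallow' && $applies && $value !== '') {
            $prefixes[] = $value;
        }
    }
    return $prefixes;
}

// A path is blocked when it starts with any disallowed prefix.
function is_blocked(string $path, array $prefixes): bool {
    foreach ($prefixes as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

$robots = "User-agent: *\nDisallow: /admin/\nDisallow: /backup/\nDisallow: /folder_xyz/\n# End of file\n";
$rules = disallowed_prefixes($robots);
var_dump(is_blocked("/admin/index.php", $rules));
var_dump(is_blocked("/articles/admin.html", $rules));
```

Note that the rules match from the start of the path, so /articles/admin.html is not affected by "Disallow: /admin/".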


Sphider slow
Posted by: Andriko (IP Logged)
Date: April 30, 2007 08:44PM

I actually disagree with the rest of the posts! The memory limit has nothing to do with your timeouts! By default, PHP has a script timeout limit of 30 seconds (max_execution_time). Increasing this to 60 may improve stability! You may want to look for php.ini (its location varies depending on the LAMP box you are using).
Increasing PHP's memory_limit, as stated in a prior message, is tricky! If you have well over 1 GB of RAM you may use 256 MB, but think of the overall load and also the database overhead when planning your memory usage!
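
To relate the php.ini values to the byte counts PHP prints in its "Allowed memory size ... exhausted" errors, a small conversion sketch may help. ini_size_to_bytes is a hypothetical helper written for this illustration; it handles only the K, M and G suffixes.

```php
<?php
// Convert a php.ini size such as "32M" into bytes (K, M and G suffixes only).
function ini_size_to_bytes(string $value): int {
    $value = trim($value);
    $number = (int) $value;                     // leading digits, e.g. 32 from "32M"
    $suffix = strtoupper(substr($value, -1));
    switch ($suffix) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;               // plain byte count
    }
}

echo ini_size_to_bytes("32M"), "\n";
echo ini_size_to_bytes("256M"), "\n";

// The two settings discussed above, as seen by the running script.
echo "memory_limit: ", ini_get('memory_limit'), "\n";
echo "max_execution_time: ", ini_get('max_execution_time'), "\n";
```

A memory_limit of 32M is exactly the 33554432 bytes quoted in the error message at the top of this thread.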


Searching using numbers
Posted by: Tec (IP Logged)
Date: April 30, 2007 01:20AM

Okay, here is a more intelligent solution. It lets you search for one-digit numbers while leaving the "Minimum word length in order to be indexed" setting as you prefer for all other queries.

Open /admin/spiderfuncs.php

Search for

if (strlen($element) >= $min_word_length && preg_match($pattern, remove_accents($element)) && (@ $common[$element] <> 1)) {

Delete this line and replace it with the following lines:

$int_word_length = $min_word_length;
if(preg_match("/^[0-9]{1,60}$/", $element)) {
    $int_word_length = '1';
}
if (strlen($element) >= $int_word_length && preg_match($pattern, remove_accents($element)) && (@ $common[$element] <> 1)) {
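
The length rule in this modification can be tried out in isolation. The sketch below covers only the length check and leaves out the pattern and common-word tests; effective_min_length and long_enough are hypothetical names, and the minimum word length of 3 is just an example value.

```php
<?php
// Hypothetical helper: purely numeric tokens may be as short as one character,
// everything else must reach the configured minimum word length.
function effective_min_length(string $element, int $min_word_length): int {
    if (preg_match('/^[0-9]{1,60}$/', $element)) {
        return 1;
    }
    return $min_word_length;
}

// Length check only; the real condition also tests $pattern and $common.
function long_enough(string $element, int $min_word_length): bool {
    return strlen($element) >= effective_min_length($element, $min_word_length);
}

foreach (["7", "ab", "2007", "php"] as $token) {
    echo $token, ": ", long_enough($token, 3) ? "indexed" : "skipped", "\n";
}
```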


Prefered results
Posted by: jhodges (IP Logged)
Date: April 27, 2007 02:07AM


Could you do this?

Create three different Sphider installations, and rename sphider.php (I think that is the file that does the lifting) differently for each one. Make sure your three different categories are in three different folders, and then write a separate robots.txt for each folder. Have each robots.txt disallow the other two spiders, so that each spider only spiders its corresponding folder.
Then, when a person searches, change the submit so that it sends the keyword to each of the three separate spiders. Maybe create a page with 3 frames to show the separate results, or maybe change the code to run the same function three times to get three different sets of results on the results page? (Sorry, I have not thought this all the way through, but I think this is a place to start.)

indexing and searching text of languages with accents
Posted by: Tec (IP Logged)
Date: April 25, 2007 01:20AM

Independent of the interface language selection, I have no problem searching for special characters and accents like ? ? ? ? ? etc.
Try the following two enhancements:

1. Change the collation of the MySQL tables to utf8_bin

2. Modify the suggest framework as described in 'Sphider Mods'

Therefore open the script


Search for:

print "new Array(" . implode(", ", $js_array) . ")";

and replace it with this:

print utf8_encode("new Array(" . implode(", ", $js_array) . ")");

Best success

Suggest Framework and problems with several letter strings
Posted by: Tec (IP Logged)
Date: March 26, 2007 03:48PM

Suggest Framework and problems with several letter strings.
Ensure that some words like Blume, Frivolit?, Horstmar and Occhi are part of your searchkeywords database. There are several others like these . . .
As admin, enable 'Enable Sphider Suggest' and 'Search for suggests in keywords'.
Now search for: blu or fri etc.
After typing the third letter my browser crashes with JavaScript Error.

Is this my personal problem, or can this behavior be confirmed by anyone else?

Please do not offer a solution like 'Switch OFF error reporting in your browser', because the problems will remain.

My work around for this bug (?):
Open /include/js_suggest/suggest.php
Insert at line 25:

if (ereg('blu',($_GET['q']))) { // and all the other letter strings
    $suggest_phrases = false;
    $suggest_keywords = false;
}

Getting no suggestions is not a solution, it's just a workaround.
Any SOLUTION would be appreciated.
Thanks in advance

I solved this by using the utf8_bin collation for the database.
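
For readers on newer PHP versions, where ereg is no longer available (it was removed in PHP 7), the workaround above could be sketched with strpos and a list of problem strings. contains_any and the variable $problem_strings are hypothetical names, and $q stands in for $_GET['q'].

```php
<?php
// Hypothetical helper: true when the query contains any of the problem strings.
function contains_any(string $query, array $needles): bool {
    foreach ($needles as $needle) {
        if (strpos($query, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$problem_strings = ['blu', 'fri'];   // taken from the post; extend as needed
$suggest_phrases = true;
$suggest_keywords = true;
$q = 'blum';                         // stands in for $_GET['q']

// Braces make sure both switches are only turned off for problem queries.
if (contains_any($q, $problem_strings)) {
    $suggest_phrases = false;
    $suggest_keywords = false;
}
var_dump($suggest_phrases, $suggest_keywords);
```

Like the original, this only hides the symptom; the collation change mentioned above addresses the cause.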

Framework and problems with German Umlaute
Posted by: Tec (IP Logged)
Date: March 26, 2007 03:44PM

Suggest Framework and problems with German Umlaute
More than one year ago, when Sphider v. was in BETA status, this bug was already discussed in the old forum. Even in v.1.3.1.f I could not find a solution for it, so with all respect to the previous work, and as courteously as possible, I permit myself to ask whether there is (or will be) a solution for this well-known bug.
Not being familiar with JavaScript, I am unable to solve this problem myself and need a solution from someone more intelligent than me.
My work around:
Open /include/js_suggest/suggest.php

Insert at line 24:
if (ereg("?|?|?|?|?|?",($_GET['q']))) {
    $suggest_phrases = false;
    $suggest_keywords = false;
}

Getting no suggestions is not a solution, it's just a workaround.
Any SOLUTION would be appreciated.
Thanks in advance
Anonymous User
Re: See a few solutions found in this forum
June 30, 2007 06:24PM
See some goodies from Tec

(Re)index after changes in admin settings ("Reindex with erase")

Import URL list with spider depth and categories

Index only new sites

See a goodie from searchMan

Simple Admin Search URL
Re: See a few solutions found in this forum
July 20, 2007 07:30PM
I have successfully tried the previous contributions about the "function showsites" and still have a little problem with searchman's JavaScript, maybe because of the configuration of my browsers. Sphider's contributors and adopters are very active (and this isn't all of it):

Deleting unwanted URLs accessed when browsing an indexed page (level 2 URLs)

Preparing a page break facility to select the number of sites displayed per page

Exporting current domains from database to a file by url, depth and category
Re: See a few solutions found in this forum
August 09, 2007 10:27AM
The next 3 URLs show the working modifications published in this forum. I presume this is useful to new and early adopters of Sphider (with due respect to everyone else):

from suchmaschine-bochum:

from searchman:

from Tec:
Re: See a few solutions found in this forum
August 12, 2007 01:11PM
These are some great additions to the search engine script. Thanks for pointing them out.
