

Reindexing Sphider

Posted by hphziw 
February 16, 2010 04:23PM

My thanks and praises to Mr. Ando Saabas for creating Sphider.

It's free, I installed it quickly, and I can index my site from the Admin page.
Reindexing is a different story. Getting reindexing to work is not 'free' because of the time spent. Before I go on, do I have to pay for this facility? If so, whom?

I've followed the instructions:

"4. Using the indexer from commandline

It is possible to spider webpages from the command line, using the syntax:

php spider.php <options>

where <options> are

-all Reindex everything in the database
-u <url> Set the url to index
-f Set indexing depth to full (unlimited depth)
-d <num> Set indexing depth to <num>
-l Allow spider to leave the initial domain
-r Set spider to reindex a site
-m <string> Set the string(s) that an url must include (use \n as a delimiter between multiple strings)
-n <string> Set the string(s) that an url must not include (use \n as a delimiter between multiple strings)

For example, for spidering and indexing http://www.domain.com/test.html to depth 2, use
php spider.php -u http://www.domain.com/test.html -d 2

If you want to reindex the same url, use
php spider.php -u http://www.domain.com/test.html -r "

But they do not work. When I use "/usr/bin/php /home/********/public_html/sphider/admin/spider.php"

I get back two emails:

1. Sphider has finished indexing at 10-02-16 08:14:01. Log saved into log/1002160814.html

2. X-Powered-By: PHP/5.2.9

Set-Cookie: PHPSESSID=1e8rcoep564nlga1jhhvidfh94; path=/

Expires: Thu, 19 Nov 1981 08:52:00 GMT

Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

Pragma: no-cache

Content-type: text/html

Failed to parse address ""<html><head><LINK REL=STYLESHEET HREF="admin.css" TYPE="text/css"></head>
<body style="font-family:Verdana, Arial; font-size:12px">[Back to <a href="admin.php">admin</a>]<p><font size="+1">Spidering <b>/</b></font></p>
<br>Completed at 08:14:01.

I've also used other commands with a variety of options in a CRON job, but it never reindexes my site.
Reindexing from admin always works beautifully, so I'm very pleased with that part.

My questions are: does Mr. Saabas have any intention of expanding on the instructions given for using the indexer from the command line? And is a CRON job actually running on the command line in the first place?

Sphider is great but without the reindexing facility it is still great.
But less so.


Re: Reindexing Sphider
February 16, 2010 09:39PM
You don't have to pay, but you are free to make a donation. More info here: http://www.sphider.eu/donate.php
You can pay by clicking the PayPal button on that page.

To reindex using a cronjob you might find that using
cd /home/********/public_html/sphider/admin/; /usr/bin/php spider.php -u your_domain/ -r
works for you.

It took me a long time of trying and failing before I finally got it working with the command above.
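
For a cron job, the same command can be wrapped in a small script. This is only a sketch of the command above: the function name `run_reindex`, the three variables and the `http://www.domain.com/` URL are placeholders for illustration, and the `********` part of the path stands for your own account directory.

```shell
#!/bin/sh
# Sketch of a cron wrapper for the reindex command quoted above.
# All names and default values below are placeholders, not Sphider settings.

SPHIDER_ADMIN_DIR="${SPHIDER_ADMIN_DIR:-/home/********/public_html/sphider/admin}"
PHP_BIN="${PHP_BIN:-/usr/bin/php}"
SITE_URL="${SITE_URL:-http://www.domain.com/}"

run_reindex() {
    # Change into the admin directory first, as the working command does;
    # give up with an error if the directory cannot be entered.
    cd "$SPHIDER_ADMIN_DIR" || {
        echo "cannot cd to $SPHIDER_ADMIN_DIR" >&2
        return 1
    }
    # -u sets the url, -r asks the spider to reindex it
    "$PHP_BIN" spider.php -u "$SITE_URL" -r
}

# Only start the reindex when the script is called with "run",
# so the file can also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    run_reindex
fi
```

Saved as, say, `reindex.sh` (a hypothetical name), a crontab line such as `0 3 * * * /bin/sh /path/to/reindex.sh run` would start it every night at 03:00. The `cd` comes first on the assumption that spider.php needs to be started from its own admin directory, as it is in the command above.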
Re: Reindexing Sphider
February 17, 2010 03:16AM
Hi Williy,

Thanks for the suggestion, but it is not working for me.
I get back the same two emails as before.
Do you think that the setup at my host might be restrictive or something?

Thanks anyway


Re: Reindexing Sphider
March 18, 2010 10:37AM
cd /home/********/public_html/sphider/admin/; /usr/bin/php spider.php -u your_domain/ -r

Works GREAT! Thank you!

Now for something fun - is there any way to LIMIT the amount of time the cron will run?

I would like to have the cron run for ONLY 120 seconds.

Any suggestions?

Re: Reindexing Sphider
March 18, 2010 12:19PM

I use a sitemap.xml to index my site and phpBB board.
I have a mod that dynamically generates a sitemap for my phpBB board, so Sphider doesn't have to follow every URL it finds on the board, only the viewtopic pages...
It takes about 2 to 3 minutes to reindex the whole thing.
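
If the run has to be capped at a fixed time, such as the 120 seconds asked about above, one option is the `timeout` command. This is a sketch that assumes GNU coreutils `timeout` is installed on the host (not every shared host has it); `run_with_limit` is a hypothetical helper name, and whether Sphider copes with being stopped partway through a reindex is not established in this thread.

```shell
#!/bin/sh
# Sketch: run a command, but stop it after a fixed number of seconds.
# Assumes GNU coreutils `timeout`; run_with_limit is a made-up helper name.

run_with_limit() {
    limit_seconds=$1
    shift
    status=0
    # timeout sends SIGTERM to the command once the limit is reached
    timeout "$limit_seconds" "$@" || status=$?
    # GNU timeout exits with status 124 when it had to stop the command
    if [ "$status" -eq 124 ]; then
        echo "stopped after ${limit_seconds} seconds" >&2
    fi
    return "$status"
}
```

Called from the Sphider admin directory, `run_with_limit 120 /usr/bin/php spider.php -u your_domain/ -r` would give the reindex at most 120 seconds. In a crontab line the same thing can be written directly as `timeout 120 /usr/bin/php spider.php -u your_domain/ -r` after the `cd`.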
Re: Reindexing Sphider
October 05, 2010 01:19AM
Is the URL the full path, i.e. http://www.blah.com?

Re: Reindexing Sphider
October 05, 2010 02:46AM
If you mean the <url> in the sitemap.xml, the answer is yes: full URLs, no relative URLs.
Re: Reindexing Sphider
October 05, 2010 05:25AM
In the cron job syntax:
/usr/bin/php spider.php -u your_domain/ -r

is "your_domain/" written with the http:// or without?
Re: Reindexing Sphider
October 05, 2010 05:43AM
And a second question: why isn't -f required as a parameter in the example:

cd /home/********/public_html/sphider/admin/; /usr/bin/php spider.php -u your_domain/ -r

It would seem that -f would be needed as well as -u and -r.

Re: Reindexing Sphider
October 05, 2010 06:17AM
Here's my cron code (hosting is by DreamHost)

/usr/local/php5/bin/php /home/jillolkoski/aldebaranwebdesign.com/sphider/admin/spider.php -u http://aldebaranwebdesign.com/ -r -f

I know that spider.php is being run because I put an echo command in it and DreamHost is set to email me that echo, so it's definitely running. But the indexing isn't happening - or at least, when I look inside the admin area, there is no log showing it has run, and Sphider isn't sending me an email like it does when I manually reindex.

Any help is appreciated. This is the very last step I need to complete before I'm able to recommend this application to my clients and help get you folks more donations.

Thanks in advance,