Sphider - A PHP search engine
1. Unpack the files, and copy them to the server, for example to
/home/youruser/public_html/sphider (later referred to as [path_of_sphider])
2. On the server, create a MySQL database to hold Sphider data.
a) At the command prompt, log in to MySQL by typing:
mysql -u <your username> -p
Enter your password when prompted.
b) In MySQL, type:
CREATE DATABASE sphider;
You can use a different database name instead of sphider.
c) Use exit to exit MySQL.
3. In the settings directory, edit the database.php file and set $database, $mysql_user, $mysql_password and $mysql_host to the correct values (if you don't know what $mysql_host should be, leave it as it is: 'localhost').
4. Open the install.php script (in the admin directory) in your browser; it creates the tables Sphider needs to operate.
Alternatively, the tables can be created by hand using the tables.sql script in the sql directory of the Sphider distribution. At the command prompt, type
mysql -u <your username> -p sphider < [path_of_sphider]/sql/tables.sql
(replace sphider with the database name you chose in step 2).
5. In the admin directory, edit auth.php to change the administrator user name and password (the default values are 'admin' and 'admin').
6. Open admin/admin.php in your browser and start indexing.
7. search.php is the default search page.
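Steps 2 and 4 can also be scripted. The following is a minimal sketch, assuming a POSIX shell, a mysql client on the PATH and a database named sphider. The variable names (DB_NAME, DB_USER, SPHIDER_PATH), the run helper and the DRY_RUN switch are illustrative and not part of Sphider. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your installation.
DB_NAME="sphider"
DB_USER="youruser"
SPHIDER_PATH="/home/youruser/public_html/sphider"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

# Print a command line; execute it only when DRY_RUN=0.
run() {
  echo "$1"
  if [ "$DRY_RUN" = "0" ]; then
    sh -c "$1"
  fi
}

# Step 2: create the database (mysql prompts for the password).
run "mysql -u $DB_USER -p -e 'CREATE DATABASE $DB_NAME;'"

# Step 4, manual alternative: create the tables from tables.sql.
run "mysql -u $DB_USER -p $DB_NAME < $SPHIDER_PATH/sql/tables.sql"
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 to execute them.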
To depth: Indexes to a given depth, where depth means how many "clicks" away a page can be from the starting page. Depth 0 means that only the starting page is indexed; depth 1 indexes the starting page and all the pages linked from it, and so on.
Reindex: Checking this checkbox forces indexing even if the page has already been indexed.
Spider can leave domain: By default, Sphider never leaves a given domain, so links from domain.com pointing to domain2.com are not followed. Checking this option allows Sphider to leave the domain; in that case it is highly advisable to define proper must include / must not include string lists to prevent the spider from going too far.
Must include / must not include: Lists of strings that a URL must, or must not, include in order to be indexed.
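The meaning of depth can be illustrated with a small simulation. This is only a sketch, assuming a POSIX shell; it does not fetch any pages and is not Sphider's own crawling code. The link graph in links_from and the page names are made up for the example.

```shell
# Toy link graph (hypothetical pages): prints the pages linked from "$1".
links_from() {
  case "$1" in
    /index.html) echo "/a.html /b.html" ;;
    /a.html)     echo "/c.html" ;;
    /b.html)     echo "/index.html" ;;
    *)           echo "" ;;
  esac
}

# Print every page at most $2 clicks away from the start page $1, one per line.
crawl_to_depth() {
  start=$1
  max_depth=$2
  seen=" $start "      # space-delimited list of pages already visited
  frontier=$start      # pages found at the current depth
  depth=0
  echo "$start"
  while [ "$depth" -lt "$max_depth" ] && [ -n "$frontier" ]; do
    next=""
    for page in $frontier; do
      for link in $(links_from "$page"); do
        case "$seen" in
          *" $link "*) ;;   # already visited, skip
          *) seen="$seen$link "; next="$next $link"; echo "$link" ;;
        esac
      done
    done
    frontier=$next
    depth=$((depth + 1))
  done
}

crawl_to_depth /index.html 1
```

With depth 0 only /index.html is listed; with depth 1 the two pages it links to are added; /c.html only appears from depth 2 onwards.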
Customizing
If you want to change the default behaviour of Sphider, you can do this either through the admin interface, or by directly editing conf.php in the settings directory.
To change the look of the search page to fit your site, modify or add a template in the templates directory. It should be enough to modify the search.css file and the header and footer templates (header.html and footer.html). Heavier modifications can be made by editing the rest of the template files.
The list of file types that are not checked for indexing is given in admin/ext.txt. The list of common words that are not indexed is given in include/common.txt.
To run the indexer from the command line, use
php spider.php <options>
where <options> are
-all          Reindex everything in the database
-u <url>      Set the URL to index
-f            Set indexing depth to full (unlimited depth)
-d <num>      Set indexing depth to <num>
-l            Allow the spider to leave the initial domain
-r            Set the spider to reindex a site
-m <string>   Set the string(s) that a URL must include (use \n as a delimiter between multiple strings)
-n <string>   Set the string(s) that a URL must not include (use \n as a delimiter between multiple strings)
For example, for spidering and indexing http://www.domain.com/test.html to depth 2, use
php spider.php -u http://www.domain.com/test.html -d 2
If you want to reindex the same url, use
php spider.php -u http://www.domain.com/test.html -r
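When several start URLs share the same settings, the command lines can be generated in a loop. The following is a minimal sketch, assuming a POSIX shell. The helper names join_strings and spider_cmd, the second URL and the filter strings are hypothetical, and the sketch assumes that the \n delimiter for -m and -n is the literal two-character sequence backslash-n. It only prints the commands and does not run them.

```shell
# Join all arguments with a literal backslash-n, the delimiter used by -m and -n.
join_strings() {
  joined=""
  for s in "$@"; do
    if [ -z "$joined" ]; then
      joined=$s
    else
      joined="$joined\\n$s"
    fi
  done
  printf '%s\n' "$joined"
}

# Print an indexing command for URL $1 to depth $2. Any further arguments
# are strings the URLs must not include (passed through -n).
spider_cmd() {
  url=$1
  depth=$2
  shift 2
  if [ "$#" -gt 0 ]; then
    printf "php spider.php -u %s -d %s -n '%s'\n" "$url" "$depth" "$(join_strings "$@")"
  else
    printf 'php spider.php -u %s -d %s\n' "$url" "$depth"
  fi
}

for url in http://www.domain.com/test.html http://www.domain.com/docs/; do
  spider_cmd "$url" 2 print logout
done
```

After reviewing the printed lines, they can be piped to sh from the directory that contains spider.php.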
To index PDF and DOC files, install pdftotext and catdoc and set their locations (paths) in conf.php (note that under Windows, the executable's path should not contain spaces). Additionally, in the admin section, check the Index pdf and Index doc boxes (alternatively, set the $index_pdf and $index_doc parameters to 1 in conf.php).
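Before editing conf.php, it can help to confirm that both converters are installed and to look up their paths. This is a minimal sketch for Unix-like systems, assuming a POSIX shell; the helper name find_converter is illustrative.

```shell
# Print the path of an installed command, or fail if it is not found.
find_converter() {
  command -v "$1" 2>/dev/null
}

for tool in pdftotext catdoc; do
  if path=$(find_converter "$tool"); then
    echo "$tool found at $path"
  else
    echo "$tool not found; install it before enabling indexing for its file type"
  fi
done
```

The printed paths are the values to enter for the converter locations in conf.php.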