
*RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)

Posted by cladiron 
March 12, 2012 07:27AM
I have posted my edited version at http://sourceforge.net/projects/sphidercomunity/ for download.
Any issues with Sphider-CV should be posted in this thread.
In this version you have to add your links to the scripts manually.
I am working on a way to automate this. The automated version has to show the links before you press the button to index them, in case you need to remove sites that you do not want in your search.
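Until that is automated, something like the helper below can build the script lines from a plain list of URLs, so you can read through them (and delete any sites you don't want) before anything is indexed. This is only a sketch and not part of Sphider-CV; the list file sites.txt, the script name gen_commands.sh and the function gen_commands are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphider-CV): turn a list of URLs into
# spider.php command lines so they can be reviewed before indexing.
# List format: one URL per line; blank lines and lines starting with # are skipped.

gen_commands() {
  depth=$1   # indexing depth passed to spider.php -d
  list=$2    # path to the URL list
  echo '#!/bin/sh'
  # "|| [ -n "$url" ]" also handles a last line without a trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      ''|'#'*) continue ;;   # skip blank lines and commented-out sites
    esac
    echo "php spider.php -u $url -d $depth;"
  done < "$list"
  echo 'exit;'
}

# Usage: ./gen_commands.sh 5 sites.txt > run_indexer_depth.sh
# Then open run_indexer_depth.sh and check the list before running it.
if [ "$#" -eq 2 ]; then
  gen_commands "$1" "$2"
fi
```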


This is an excellent way to populate your search engine with little effort.
Because the indexer runs from a script, you shouldn't have to worry about your site timing out while it runs.
You cannot view each line as it is indexed yet; that is a work in progress, along with a few other ideas.
You can, however, watch the status area in the admin section and see the number of links and keywords increase.


I'm not sure how many people know about this or have given it much thought, but I found it to be a life saver.
Because sites can take so long to index, I lost a lot of time when the indexer finished while I was asleep.
I would miss hours that it could have been running.

This little script can keep the indexer running for days, depending on how many sites you add to it.

In this little tutorial I will explain how to create a script that holds all your website links for indexing.

Create an .sh file with whatever name you like. I will call mine "run_indexer_depth.sh".
Place the code below inside it, replacing the URLs and indexing depth with your own.
You can put as many URLs in the file as you want.
(The # sign starts a comment, which means that line is skipped.)
Commented lines are not executed, and they are not printed to the console either (unless you run the script with sh -v), so you can keep them as notes or remove them.

#!/bin/sh
# Keep the #! line above: it tells the system to run this file with sh.
#Usage: php spider.php <options>
#
#Options:
# -all            Reindex everything in the database
# -u <url>        Set url to index
# -f              Set indexing depth to full (unlimited depth)
# -d <num>        Set indexing depth to <num>
# -l              Allow spider to leave the initial domain
# -r              Set spider to reindex a site
# -m <string>     Set the string(s) that an url must include (use \n as a delimiter between multiple strings)
# -n <string>     Set the string(s) that an url must not include (use \n as a delimiter between multiple strings)
# ----------------------------------------------------------------------------------------------
php spider.php -u http://blahhhh.org/forums -d 5;
php spider.php -u http://blahhhhhh.net/forums -d 5;
php spider.php -u http://forums.blahhhhhhee.com -d 5;
php spider.php -u http://blahhhrrhhh.com/forums -d 5;
#php spider.php -u http://blahhhhhqqh.com/forums -d 5;
#php spider.php -u http://blahhhhhddh.com/forums -d 5;
php spider.php -u http://blahvvhhhhh.com/forums -d 5;
php spider.php -u http://blahhnnhhhh.com/forums -d 5;
exit;
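The script above simply runs each line in turn. If you want to see in the morning what ran overnight, a variation that reads the sites from a list and prints a timestamp when each one starts and ends might help. A minimal sketch, not the original script: it assumes a list file with one URL per line (hypothetical sites.txt; blank and # lines are skipped), and the name index_list is made up. Setting SPIDER=echo gives a dry run.

```shell
#!/bin/sh
# Sketch: index every site in a URL list, one after another, printing a
# timestamped line when each site starts and ends.
# SPIDER can be overridden, e.g. SPIDER=echo for a dry run.
SPIDER=${SPIDER:-"php spider.php"}

index_list() {
  depth=$1
  list=$2
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      ''|'#'*) continue ;;   # skip blank lines and commented-out sites
    esac
    echo "$(date '+%Y-%m-%d %H:%M:%S') start $url"
    # </dev/null keeps the spider from reading the rest of the URL list
    $SPIDER -u "$url" -d "$depth" < /dev/null
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') end   $url (exit status $status)"
  done < "$list"
}

# Usage (from the spider.php folder): ./index_list.sh 5 sites.txt >> indexer.log
if [ "$#" -eq 2 ]; then
  index_list "$1" "$2"
fi
```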

Place your newly created run_indexer_depth.sh file in the same folder as spider.php (the script calls spider.php by a relative path, so it must also be run from that folder).

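chmod the .sh file to 755 so it is executable (I use FTP to change the permissions; over SSH, chmod 755 run_indexer_depth.sh does the same).
Then, in SSH, navigate to the folder containing the .sh file and execute it.
Run it inside screen so you can close your SSH session whenever you want.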


Example with screen:
cd /home/sites/public_html/admin/
screen ./run_indexer_depth.sh
To detach from the screen without stopping the indexer,
press Ctrl+A, then D, while viewing the indexer output.
(Run screen -r later to reattach.)


Example without screen:
(If you start it this way, you must keep open the SSH window that is running the indexer. If you close it, the indexer WILL stop.)
cd /home/sites/public_html/admin/
./run_indexer_depth.sh
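If screen isn't installed, nohup is another standard way to keep the script going after the SSH window closes; output then goes to a log file instead of the console. A minimal sketch; the wrapper name run_detached.sh, the function run_detached and the log file name are just examples.

```shell
#!/bin/sh
# Sketch: start a script in the background with nohup so it keeps running
# after the SSH session closes. Output goes to a log file, not the console.

run_detached() {
  script=$1   # e.g. ./run_indexer_depth.sh
  log=$2      # e.g. indexer.log
  nohup "$script" > "$log" 2>&1 &
  echo "started $script as PID $! (output in $log)"
}

# Usage (from the folder that holds spider.php):
#   ./run_detached.sh ./run_indexer_depth.sh indexer.log
#   tail -f indexer.log    # Ctrl+C stops tail, not the indexer
if [ "$#" -eq 2 ]; then
  run_detached "$1" "$2"
fi
```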

This can also be used to reindex your sites.
Example below:
#!/bin/sh
#Usage: php spider.php <options>
#
#Options:
# -all            Reindex everything in the database
# -u <url>        Set url to index
# -f              Set indexing depth to full (unlimited depth)
# -d <num>        Set indexing depth to <num>
# -l              Allow spider to leave the initial domain
# -r              Set spider to reindex a site
# -m <string>     Set the string(s) that an url must include (use \n as a delimiter between multiple strings)
# -n <string>     Set the string(s) that an url must not include (use \n as a delimiter between multiple strings)
# ----------------------------------------------------------------------------------------------
php spider.php -u http://blahhhh.org/forums -r;
php spider.php -u http://blahhhhhh.net/forums -r;
php spider.php -u http://forums.blahhhhhhee.com -r;
php spider.php -u http://blahhhrrhhh.com/forums -r;
#php spider.php -u http://blahhhhhqqh.com/forums -r;
#php spider.php -u http://blahhhhhddh.com/forums -r;
php spider.php -u http://blahvvhhhhh.com/forums -r;
php spider.php -u http://blahhnnhhhh.com/forums -r;
exit;
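Since the reindex script only differs in the flag at the end of each line, you could generate it from your indexing script with sed instead of keeping two lists by hand. A sketch that assumes every active line ends in "-d <num>;" exactly like the examples (trailing spaces or Windows line endings will stop a line from matching); to_reindex.sh and run_reindex.sh are hypothetical names.

```shell
#!/bin/sh
# Sketch: derive a reindex script from an indexing script by replacing the
# "-d <num>;" at the end of each spider.php line with "-r;".
# Commented-out spider.php lines are converted too and stay commented.
# The options help text does not match the pattern and passes through unchanged.

to_reindex() {
  sed 's/ -d [0-9][0-9]*;$/ -r;/' "$1"
}

# Usage: ./to_reindex.sh run_indexer_depth.sh > run_reindex.sh && chmod 755 run_reindex.sh
if [ "$#" -eq 1 ]; then
  to_reindex "$1"
fi
```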

To install screen (you must be root or a sudo user):
CentOS:
yum install screen

Ubuntu:
apt-get install screen

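CRON
This script can also be run as a cron job.
Things to consider when setting up the cron job:

How large are the sites?
What depth are you going to index?
How many links have you added to the .sh script?

Together these decide how long one run takes, so leave enough time between cron runs for each run to finish.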
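Since a large site at a high depth can still be indexing when cron fires again, it may help to make sure only one run happens at a time. A minimal sketch using a lock directory (mkdir either creates it or fails if it already exists); the lock path, the wrapper name run_once.sh and the schedule in the comment are assumptions to adapt.

```shell
#!/bin/sh
# Sketch: run a command only if no other run holds the lock, so cron cannot
# start a second indexer while the first is still working.
# mkdir succeeds for one caller and fails if the directory already exists.
# If a run is killed, delete the stale lock directory by hand.
#
# Example crontab line (Sundays at 02:00; paths are assumptions). The cd is
# needed because the script calls spider.php by a relative path:
#   0 2 * * 0 cd /home/sites/public_html/admin && ./run_once.sh ./run_indexer_depth.sh >> indexer.log 2>&1

LOCKDIR=${LOCKDIR:-/tmp/sphider_indexer.lock}

run_once() {
  if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "indexer already running (lock $LOCKDIR exists), skipping" >&2
    return 1
  fi
  "$@"            # run the indexing script
  status=$?
  rmdir "$LOCKDIR"
  return "$status"
}

if [ "$#" -gt 0 ]; then
  run_once "$@"
fi
```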



Edited 8 time(s). Last edit at 08/10/2012 09:57PM by cladiron.
Re: Howto: Index using a script
March 16, 2012 09:53AM
Just to update this topic: I have made some adjustments to my admin.php and added a few other files.
Basically, I have removed the tabs and converted them to dropdown menus.

What I have changed:
Converted the tabs in admin.php to dropdown menus.
Added a couple of custom links to execute other scripts if needed.
Added 8 links in one of the dropdowns, each of which executes a script file. In each script file you can add as many command-line index runs as you like. (I have tested it with 10 links in 4 groups, and then with 5 links in 8 groups.)
I'll explain groups below.

Another example:
If you have 16 links to index, it is faster to index 2 links in each group (there are 8 groups, 2x8=16). So put 2 links in each group, then click each group link one at a time and close the verification window for each.
This will index one site in each group at the same time. Once the first site in a group has finished to the depth you set, that group moves on to the next.

*In my opinion, running several indexers at once is faster than placing them all in one file, as I showed in the tutorial in the first post.

About the groups:

In one of the dropdown menus there are 8 different buttons/links, called Group1, Group2, Group3, etc.
Each group executes a script in which you can place the links that need indexing.
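The group idea (several lists indexing at the same time, each list working through its sites one by one) can be approximated in plain shell. This is only a sketch of the concept, not the code behind the Group1 to Group8 buttons; run_groups, SPIDER and DEPTH are hypothetical names. Sites are dealt out round-robin, so the 16-link example above with 8 groups gives 2 sites per group.

```shell
#!/bin/sh
# Sketch of the "groups" idea: deal URLs round-robin into N groups, run the
# groups in parallel, and index the sites inside each group one at a time.
# SPIDER and DEPTH are hypothetical settings; SPIDER=echo gives a dry run.
SPIDER=${SPIDER:-"php spider.php"}
DEPTH=${DEPTH:-5}

run_groups() {
  ngroups=$1; shift
  [ "$ngroups" -gt 0 ] || return 1
  workdir=$(mktemp -d) || return 1
  i=0
  for url in "$@"; do
    # URL number i goes to group (i mod N) + 1
    echo "$url" >> "$workdir/group$(( i % ngroups + 1 ))"
    i=$(( i + 1 ))
  done
  for list in "$workdir"/group*; do
    [ -f "$list" ] || continue   # no URLs given: the glob did not match
    (
      # one background subshell per group, sites run in order inside it
      while IFS= read -r url; do
        $SPIDER -u "$url" -d "$DEPTH" < /dev/null
      done < "$list"
    ) &
  done
  wait   # return only after every group has finished
  rm -rf "$workdir"
}

# Usage: ./run_groups.sh 8 http://site1/forums http://site2/forums ...
if [ "$#" -ge 2 ]; then
  run_groups "$@"
fi
```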

[IMG]http://img831.imageshack.us/img831/88/newindexadmintheme.jpg[/IMG]

I prefer the blue bar over the brownish/black one the dropdown shows.
There will be images to make the bar match the dropdown, if you like.
Note that the image below the blue bar is animated. The images are not required.

Edited to show server stats.
This is the typical load or less, but I saw 2 CPU spikes above 5%:
one reached 25%, the other 10%.


Uptime: 1 days, 20 hours, 10 minutes
Tasks: 211 total,   1 running, 210 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.1%us,  0.3%sy,  0.0%ni, 95.5%id,  0.9%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3960652k total,  2700808k used,  1259844k free,   185548k buffers
Swap:  5996536k total,        0k used,  5996536k free,  1447388k cached
-sh-3.2$ screen -r 
There are several suitable screens on:
        31448.script    (Detached)
        31452.script    (Detached)
        31469.script    (Detached)
        31427.script    (Detached)
        31436.script    (Detached)
        31461.script    (Detached)
        31441.script    (Detached)
        31422.script    (Detached)



Edited 3 time(s). Last edit at 03/17/2012 01:30AM by cladiron.
Re: Howto: Index using a script (Theme MOD and multi-indexing coming soon)
March 17, 2012 08:47PM
I need a few testers, please.


I think this is at a stage where I can share it, but before I do I would rather have a few testers make sure there are no bugs.

If you would like to test what I have done so far, please PM me.

Once a few testers say "OK", I'll post it on the forums.

Built into the project:
Some files will be overwritten, such as admin.php, search.php, and the dark theme.

#What got me working on this?
[www.sphider.eu]   (thanks, inmotion)

#For the Dropdown bar and bar images. (to keep the bar blue, do not use the images)
[www.freecssmenus.co.uk]

#[MOD] Search, Tag, Word Cloud
[www.sphider.eu]

#Animated Collapsible DIV v2.4  (Thanks WILLY for the idea.  [www.sphider.eu] )
[dynamicdrive.com]

#Add url submit form
[www.sphider.eu]

#Creating better search suggestions with Sphider  (Thanks to Matt's post here.   [www.sphider.eu] )
[www.mdj.us]

#Easy PHP Contact Form
[www.easyphpcontactform.com]



Edited 1 time(s). Last edit at 03/17/2012 09:29PM by cladiron.
Re: Howto: Index using a script (Theme MOD and multi-indexing coming soon)
March 29, 2012 02:59PM
Here is a sneak peek at how the engine is coming along.
I should have something posted within the next few days.

Updated images and progress are located here:
http://sourceforge.net/projects/sphidercomunity
Re: Howto: Index using a script (Theme MOD and multi-indexing coming soon)
April 20, 2012 07:55AM
I have one tester so far; anyone else?
PM me.
I am going to test this out. I am very tired of the timeouts that happen when indexing large sites. Soon I will be setting up on my own servers. This is an insightful option. I will let you know if I am able to get it integrated.

Can you think of special settings other than the permission settings?
Re: Howto: Index using a script (Theme MOD and multi-indexing coming soon)
May 14, 2012 09:13AM
Sorry for not replying sooner.
I am going back over my edits and changing them somewhat.
It will still have the same basic functionality I mentioned before.
I am changing the overall look, which is the reason I have not released it yet.

You can still test the script feature I mentioned in the first post.
I will try to have this ready in a few days.
Re: Howto: Index using a script (Theme MOD and multi-indexing coming soon)
May 17, 2012 04:39PM
A few more image teasers.
[IMG]http://i45.tinypic.com/6eeko3.jpg[/IMG]

[IMG]http://i46.tinypic.com/96hi6u.jpg[/IMG]

[IMG]http://i46.tinypic.com/2akabtj.jpg[/IMG]
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
May 24, 2012 03:31AM
Any feedback on the install?
Any issues?
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
July 25, 2012 06:51AM
Are there any issues? I want to know before I install it.
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
August 10, 2012 09:37PM
Hello tabscorbet,
I have not heard of any issues, and it has 61 downloads on SourceForge.
http://sourceforge.net/projects/sphidercomunity/?source=directory

To do:
I still haven't converted the files to work on a Windows platform.
I have to install a Windows test server first.

If you want to test it first, make a copy of your database under a new name, then create a subfolder in your public_html.
Install Sphider into that folder using the copied database, then install the community version over it.
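For the database copy, one common approach with the MySQL command-line tools is to create an empty database and pipe a dump of the old one into it. A sketch, assuming your MySQL user and password are stored in ~/.my.cnf so the tools do not prompt; the database names and the copy_db name are placeholders.

```shell
#!/bin/sh
# Sketch: copy an existing Sphider database into a new one for testing.
# Assumes credentials in ~/.my.cnf ([client] section), so no password prompts.

copy_db() {
  src=$1    # existing database, e.g. sphider
  dest=$2   # new test database, e.g. sphider_test
  mysqladmin create "$dest" || return 1   # fails if the name is already taken
  mysqldump "$src" | mysql "$dest"
}

# Usage: ./copy_db.sh sphider sphider_test
if [ "$#" -eq 2 ]; then
  copy_db "$1" "$2"
fi
```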

Good luck.
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
September 30, 2012 01:30AM
Is this code correct if I want to reindex all 600 websites in my database without writing out all the URLs?

#!/bin/sh
#Usage: php spider.php <options>
#
#Options:
# -all            Reindex everything in the database
# -u <url>        Set url to index
# -f              Set indexing depth to full (unlimited depth)
# -d <num>        Set indexing depth to <num>
# -l              Allow spider to leave the initial domain
# -r              Set spider to reindex a site
# -m <string>     Set the string(s) that an url must include (use \n as a delimiter between multiple strings)
# -n <string>     Set the string(s) that an url must not include (use \n as a delimiter between multiple strings)
# ----------------------------------------------------------------------------------------------
php spider.php -all;
exit;

thanks, T1000
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
October 17, 2012 03:15AM
Sorry for the late reply; I don't seem to be getting mail from the site when a post is made.

Yes, your example is correct.
Re: *RELEASED* Howto: Index using a script (Theme MOD and Mass-Indexing)
March 24, 2013 04:47AM
Just to update this thread: I am working on another version.

The new version will let you add/edit multiple links in the admin section without needing to SSH into your server.
Added a CPU page to monitor top output, among a few other things.
Reindex one group or multiple groups at once, or reindex everything using the factory reindex feature.

I'm still working on a way to monitor each link as it is being indexed.

http://i45.tinypic.com/24p9vkh.jpg
http://i50.tinypic.com/if3or4.jpg
http://i49.tinypic.com/e5fbph.jpg
http://i46.tinypic.com/2gt0uti.jpg
http://i46.tinypic.com/j5dz46.jpg



Edited 1 time(s). Last edit at 03/25/2013 05:04AM by cladiron.