Index only new sites

Posted by Tec 
Tec
Index only new sites
June 25, 2007 08:07PM
When adding new sites, it can be helpful to index only those new sites instead of reindexing the whole database. Here is an example of how to implement this goody:


In admin/admin.php search for:
<li><a href='spider.php?all=1'>Reindex all</a><br><br></li>

Above that row include as new row:
<li><a href='spider.php?all=2'>Index only the new</a><br><br></li>



In admin/spider.php search for:

if ($all == 1) {
index_all();
}

behind that row include:

if ($all == 2) {
index_new();
}



At the end of the same file include:

function index_new() {
global $mysql_table_prefix;
print "<b><br><center>Now indexing all new URL's</center></b><br>\n";

$result=mysql_query("select url, indexdate, spider_depth, required, disallowed, can_leave_domain from ".$mysql_table_prefix."sites");
echo mysql_error();
while ($row=mysql_fetch_row($result)) {
$url = $row[0];
$indexdate = $row[1];
$depth = $row[2];
$include = $row[3];
$not_include = $row[4];
$can_leave_domain = $row[5];
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}

if ($indexdate == '') {
print "<br>&nbsp;&nbsp;Indexing $url<br>";
index_site($url, 1, $depth, $soption, $include, $not_include, $can_leave_domain);
}
}
print "<br><br><center><b>Indexing finished !</b></center><br>";

}


Done. The "Sites" section of the admin interface now has a new item called "Index only the new". Use it to run an index after you have added new sites.

Tec



Edited 1 time(s). Last edit at 06/25/2007 08:15PM by Tec.
rec
Re: Index only new sites
June 25, 2007 11:12PM
Tec

Welcome back; I hope that Ando keeps his word
[www.sphider.eu]

I have successfully used the above solution to index without reindexing,

and suggest that some readers read

"above" and "behind" as "after" = "below" = "down", say:


Example (admin.php)

if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}


Example (spider.php)

if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}
Tec
Re: Index only new sites
June 26, 2007 01:25AM
Sorry, but I really meant above and behind

The Sage's English Dictionary and Thesaurus:

ABOVE
> Synonym: ahead, earlier, in front.
> Antonym: behind, after.
- I had known her before.
- As I said before.
- He called me the day before but your call had come even earlier.
- With the cross of Jesus marching on before.
- That actor was famous before I was born.
- Be up and ready to go before sunrise if you are serious about bird watching.
- I place honesty before all else.
- It is ten minutes before three. It is time to go.
- I was finally before the man but could not articulate a word.

Adverb
> Synonym: ahead, earlier, in front.
- I had known her before.
- As I said before.
- He called me the day before but your call had come even earlier.
- Her parents had died four years earlier.
- I mentioned that problem earlier.
- I see the lights of a town ahead.
- The road ahead is foggy.
- Staring straight ahead.
- We couldn't see over the heads of the people in front.
- With the cross of Jesus marching on before.

Earlier in time; previously.
> Synonym: earlier.
- I had known her before.
- As I said before.
- He called me the day before but your call had come even earlier.
- Her parents had died four years earlier.
- I mentioned that problem earlier.

At or in the front.
> Synonym: ahead, in front.
- I see the lights of a town ahead.
- The road ahead is foggy.
- Staring straight ahead.
- We couldn't see over the heads of the people in front.
- With the cross of Jesus marching on before.

Conjunction
> Antonym: after.
- That actor was famous before I was born.

Subordinates one sentence to another and emphasizes the antecedent event in time.
> Antonym: after.
- That actor was famous before I was born.

Preposition
> Antonym: after.
- Be up and ready to go before sunrise if you are serious about bird watching.
- I place honesty before all else.
- It is ten minutes before three. It is time to go.
- I was finally before the man but could not articulate a word.

Preceding in time or space.
> Antonym: after.
- Be up and ready to go before sunrise if you are serious about bird watching.

Previous in position, order, rank, or importance.
> Antonym: after.
- I place honesty before all else.

When telling time, ahead of the hour.
> Antonym: after.
- It is ten minutes before three. It is time to go.

In the presence of.
- I was finally before the man but could not articulate a word.


___________________________________________________________


BEHIND
> Antonym: above.
> Synonym: at a lower place, beneath, down the stairs, downstairs, infra, on a lower floor, to a lower place, under.
- See below.
- My bunk is below hers.
- Wear two coats, it is below freezing.
- It is below me to clean toilets.

Adverb
> Antonym: above.
> Synonym: at a lower place, beneath, down the stairs, downstairs, infra, on a lower floor, to a lower place, under.
- See below.
- The tenants live downstairs.
- Vide infra.
- See under for further discussion.

At a later place.
> Antonym: above.
- See below.

In or to a place that is lower.
> Antonym: above.
> Synonym: at a lower place, beneath, to a lower place.

On a floor below.
> Synonym: down the stairs, downstairs, on a lower floor.
- The tenants live downstairs.

(in writing) See below.
> Synonym: infra.
- Vide infra.

Further down.
> Synonym: under.
- See under for further discussion.

Preposition
> Antonym: above.
> Synonym: beneath.
- My bunk is below hers.
- Wear two coats, it is below freezing.
- The cat often hides beneath the stairs.
- She feels like a princess and thinks we are all beneath her.
- It is below me to clean toilets.

Lower than, beneath, under.
> Antonym: above.
- My bunk is below hers.

Smaller quantity, inferior quality, or lesser degree.
> Synonym: beneath.
> Antonym: above.
- Wear two coats, it is below freezing.
- The cat often hides beneath the stairs.

Unworthy of consideration.
> Synonym: beneath.
- She feels like a princess and thinks we are all beneath her.
- It is below me to clean toilets.


Tec



Edited 1 time(s). Last edit at 06/26/2007 01:27AM by Tec.
rec
Re: Index only new sites
June 26, 2007 01:50AM
Tec

I only wrote for those who might need my explanation.

I believe we are all thankful for your latest post. Please keep giving your technical advice (linguistics is ALSO a part of it).

As for readers who are in a hurry to get the script working, even when they know only a little English, I expect that I have been useful to some of them (as I wrote).
Tec
Re: Index only new sites
June 26, 2007 09:43AM
Have a look at your last posting:

----------------------->>>>
Example (admin.php)

if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}


Example (spider.php)

if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}

<<<<<<-------------

Do you really believe this is helpful to newbies?
Perhaps you were in a hurry, but

if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}

is not (and should not become) part of admin.php

Tec
rec
Re: Index only new sites
June 26, 2007 03:53PM
"Lapsus calami", as you can confirm in Tec's original contribution. I should have written in my latest post:

Example (admin.php)

<li><a href='spider.php?all=1'>Reindex all</a><br><br></li>
<li><a href='spider.php?all=2'>Index only the new</a><br><br></li>


I also recall that I tried Tec's goodie successfully.
Re: Index only new sites
July 22, 2007 07:09AM
Would it be possible to do this from the command line? I tried php spider.php -all 2, but it doesn't work.
Tec
Re: Index only new sites
July 25, 2007 11:43PM
mike171562:

For 3 days I hoped someone else would answer, because I have no problem using .../admin/spider.php?all=2 to index only the new sites on a Windows XP/SP2 system with an XAMPP server.

Sorry for your trouble. Still hoping for anyone else who could follow your issue.

Tec



Edited 1 time(s). Last edit at 07/25/2007 11:57PM by Tec.
Anonymous User
Re: Index only new sites
July 26, 2007 02:45AM
In Control Panel/System, you can add php/Sphider to the path, so that you can also work from the command line, following the rules listed in install.txt. To index only the new sites that way, Tec's excellent script would need a few additional details. That would also mean a more complete Sphider, if Ando accepts it.

"...4. Using the indexer from commandline

It is possible to spider webpages from the command line, using the syntax:

php spider.php <options>

where <options> are

-all          Reindex everything in the database
-u <url>      Set the url to index
-f            Set indexing depth to full (unlimited depth)
-d <num>      Set indexing depth to <num>
-l            Allow spider to leave the initial domain
-r            Set spider to reindex a site
-m <string>   Set the string(s) that an url must include (use \n as a delimiter between multiple strings)
-n <string>   Set the string(s) that an url must not include (use \n as a delimiter between multiple strings)


For example, for spidering and indexing [www.domain.com] to depth 2, use
php spider.php -u [www.domain.com] -d 2

If you want to reindex the same url, use
php spider.php -u [www.domain.com] -r..."
Re: Index only new sites
July 26, 2007 04:37PM
Thank you, recortes, but I was referring to using Tec's modification to do this. When I try what you suggested, Tec, from the admin folder, I get the output below. I am using a Linux FreeBSD system with MySQL and PHP5. I can continue to use lynx from the shell, but I was hoping for a better way. Thanks for your responses.


# php spider.php?all=2
Could not open input file: spider.php?all=2

#this works
# php spider.php -all
php spider.php -all
Spidering [jdk.dev.java.net]
1. Retrieving: [jdk.dev.java.net] at 08:29:08.

The problem is that if the spider is interrupted for any reason, I have to start over and reindex everything, and with 30,000 sites to spider, this takes some time.
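
A note on the error above: the PHP command-line binary treats spider.php?all=2 as a literal file name, because query strings exist only in HTTP requests, hence "Could not open input file". One possible way around it is to teach spider.php an extra flag. The sketch below is only an illustration under that assumption: the -new flag and the function all_mode_from_args() are hypothetical and not part of stock Sphider, and how the result is wired into spider.php's existing argument handling is left open.

```php
<?php
// Sketch only: "-new" is a hypothetical flag, not part of stock Sphider.
// Maps command-line arguments to the $all value used by the web interface.
function all_mode_from_args(array $argv) {
    // $argv[0] is the script name, so start at index 1.
    for ($i = 1; $i < count($argv); $i++) {
        if ($argv[$i] === '-all') {
            return 1; // reindex everything, as documented in install.txt
        }
        if ($argv[$i] === '-new') {
            return 2; // hypothetical: index only sites without an index date
        }
    }
    return 0; // neither flag given
}

// Example: the value that "php spider.php -new" would produce.
echo all_mode_from_args(array('spider.php', '-new')), "\n";
```

With something like this in place, php spider.php -new could set $all to 2 and reach index_new() the same way the admin link does.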
rec
Re: Index only new sites
September 19, 2007 09:44PM
This is a masterpiece that, like others from Tec, should be distributed along with Sphider. It seems to me that Tec's contributions are not acknowledged enough in the presentation of each version of Sphider. I am not saying anything about Ando's ethics. I am just inviting him to be more generous with his main contributors (I am just trying to bring in some dynamics).
Re: Index only new sites
November 01, 2007 12:12PM
Does this reindex all sites but only get information from the new pages that are not in the database?

Thanks
Tec
Re: Index only new sites
November 01, 2007 02:56PM
This mod will not reindex anything. It will just index (for the first time) all the sites marked as "Not indexed" in Sphider's admin Sites interface.

Tec
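
The selection rule described above can be sketched on its own, without a database. This is a minimal illustration, assuming each row carries the url and indexdate columns that index_new() reads; the helper name select_unindexed_sites() and the sample URLs are made up for the example.

```php
<?php
// Hypothetical helper: returns the URLs of sites that have never been indexed.
// Each row mimics the first two columns the mod reads: array(url, indexdate).
function select_unindexed_sites(array $rows) {
    $urls = array();
    foreach ($rows as $row) {
        list($url, $indexdate) = $row;
        // An empty indexdate (NULL or '') is what the mod treats as "Not indexed".
        if ($indexdate === null || $indexdate === '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Sample data (invented): one indexed site and two that were never indexed.
$sites = array(
    array('http://example.com/', '2007-10-30'),
    array('http://example.org/', null),
    array('http://example.net/', ''),
);
print_r(select_unindexed_sites($sites));
```

Sites that already have an index date are skipped entirely, which is why nothing gets reindexed.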
Re: Index only new sites
September 25, 2009 02:12PM
I'm trying to add onto this, by creating a "Reindex if older than x days".

To do this, I followed the steps above with some modifications as follows:

In admin/admin.php, I included:
<li><a href='spider.php?all=3'>Reindex old</a><br /></li>

In admin/spider.php, I changed:

if ($all == 1) {
index_all();
} else {

To:
if ($all == 1) {
index_all();
}
if ($all == 2) {
index_new();
}
if ($all == 3) {
index_old();
} else {

And in the same file, at the bottom, I included:
function index_old() {
global $mysql_table_prefix;
print "<b><br><center>Now indexing all URL's over a week old</center></b><br>\n";

$result=mysql_query("select url, indexdate, spider_depth, required, disallowed, can_leave_domain from ".$mysql_table_prefix."sites");
echo mysql_error();
while ($row=mysql_fetch_row($result)) {
$url = $row[0];
$indexdate = $row[1];
$depth = $row[2];
$include = $row[3];
$not_include = $row[4];
$can_leave_domain = $row[5];
$today = date("Y-m-d");
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}

if ($today - $indexdate > '7') {
print "<br>&nbsp;&nbsp;Indexing $url<br>";
index_site($url, 1, $depth, $soption, $include, $not_include, $can_leave_domain);
}
}
print "<br><br><center><b>Indexing finished !</b></center><br>";

}

My intention was to have it index any url's greater than 7 days old (i.e. if ($today - $indexdate > '7')), however it doesn't index anything.

I think it's probably just a small error in the code, but if someone can give any insight that would be great!

Thanks!
Re: Index only new sites
September 27, 2009 11:06PM
Thank you, scd1982.

grinning smiley
Re: Index only new sites
September 28, 2009 02:23PM
celsojr100 Wrote:
-------------------------------------------------------
> Thank you, scd1982.
>
> grinning smiley


Did the code I posted work for you?

It doesn't seem to be working for me, but I think there's an error in the way it determines whether the site should be re-indexed, as it starts to run (i.e. displays the text "Now indexing all URL's over a week old") and then immediately displays "Indexing finished !". I have plenty of sites over a week old for it to run on, but it doesn't seem to work correctly.

The only thing I can think of is that it's trying to find new sites that haven't been indexed yet, and that are over a week old, as opposed to re-indexing sites that have already been indexed. If so, I don't know how to change this behavior. Anyone who might be able to provide some help would be appreciated.
Re: Index only new sites
September 29, 2009 01:38AM
$today variable is not declared.

On top of script, declare $today as echo date("d/m/Y");

...

$today = echo date("m/d/y");
Re: Index only new sites
September 29, 2009 09:36PM
So you're saying the code should look like this:

function index_old() {
$today = echo date("Y-m-d");
global $mysql_table_prefix;
print "<b><br><center>Now indexing all URL's over a week old</center></b><br>\n";

$result=mysql_query("select url, indexdate, spider_depth, required, disallowed, can_leave_domain from ".$mysql_table_prefix."sites");
echo mysql_error();
while ($row=mysql_fetch_row($result)) {
$url = $row[0];
$indexdate = $row[1];
$depth = $row[2];
$include = $row[3];
$not_include = $row[4];
$can_leave_domain = $row[5];
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}

if ($today - $indexdate > 7) {
print "<br>&nbsp;&nbsp;Indexing $url<br>";
index_site($url, 1, $depth, $soption, $include, $not_include, $can_leave_domain);
}
}
print "<br><br><center><b>Indexing finished !</b></center><br>";

}

If so, it's still not working for me, and instead I'm getting the following error:
Parse error: syntax error, unexpected T_ECHO in /admin/spider.php on line 667

I'm still not sure what I'm missing.
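
A possible explanation, offered as a sketch rather than a confirmed fix: the parse error comes from echo, which is a language construct that prints and returns nothing, so it cannot appear on the right-hand side of an assignment. The earlier form $today = date("Y-m-d"); was valid PHP. The more likely reason nothing gets indexed is the comparison itself: subtracting two "Y-m-d" strings makes PHP convert each one to a number from its leading digits (the year), so the result is 0 for dates in the same year and never exceeds 7. The example below compares whole days instead, assuming indexdate is stored as a "Y-m-d" string; days_since_indexed() is a hypothetical helper, not part of Sphider.

```php
<?php
// Hypothetical helper: whole days between an index date and "today".
// Both arguments are 'Y-m-d' strings (assumed to match the sites table).
function days_since_indexed($indexdate, $today) {
    // A site that was never indexed has no date to compare against.
    if ($indexdate === null || $indexdate === '') {
        return null;
    }
    // Parse both dates at midnight UTC so daylight saving shifts cannot skew the count.
    $then = strtotime($indexdate . ' 00:00:00 UTC');
    $now  = strtotime($today . ' 00:00:00 UTC');
    if ($then === false || $now === false) {
        return null; // unparseable date: treat the age as unknown
    }
    return (int) floor(($now - $then) / 86400);
}

$age = days_since_indexed('2009-09-01', '2009-09-25');
if ($age !== null && $age > 7) {
    echo "Site is due for reindexing\n";
}
```

Inside index_old(), the condition could then be written along the lines of days_since_indexed($indexdate, date("Y-m-d")) > 7, which also skips rows where the helper returns null.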