

URLs not retrieved if lastmod doesn't exist

Posted by flatemp 
May 10, 2010 02:32PM
Hi everyone,

I'm quite new to Sphider and have been testing Sphider-plus for a week.
I noticed a problem in version 2.2; it may already have been fixed in 2.3.
When Sphider-plus reads a site's sitemap.xml, any URL without a lastmod field is not added to the list of URLs to index. Yet according to the sitemap.xml format, this field is optional.

As a result, URLs that could be indexed are skipped.
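To see the problem in isolation, here is a minimal sketch (not Sphider-plus code) that parses a made-up sitemap with PHP's SimpleXML and collects the <url> entries that have no <lastmod>. A missing element casts to an empty string, and strtotime('') returns false, which is the value the spider ends up working with for such URLs. The sample URLs and the $withoutLastmod variable are purely illustrative.

```php
<?php
// Sketch: find sitemap entries without <lastmod>.
// The sitemap content below is invented for illustration.
$sitemap = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-05-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
  </url>
</urlset>
XML;

$xml = simplexml_load_string($sitemap);

$withoutLastmod = array();
foreach ($xml->url as $url) {
    // A missing <lastmod> casts to '', and strtotime('') returns false
    $lastmod = strtotime((string) $url->lastmod);
    if ($lastmod === false) {
        $withoutLastmod[] = (string) $url->loc;
    }
}

print_r($withoutLastmod); // only about.html is listed
```

With the original 2.2 code, exactly those URLs are the ones that never make it into the index.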

So here is my mod for Sphider-plus 2.2 to take URLs without a lastmod date into account.
In admin/spiderfuncs.php, search for lastmod.
Near it you should find a foreach statement:

foreach($s_map as $url) {
    $the_url = str_replace("&amp;", "&", $url->loc); // undo any &amp; left in the URL
    $lastmod = strtotime((string) $url->lastmod); // get lastmod date only for this page from sitemap; false if missing
    //FLA - 10/05/2010 - force url indexation because lastmod doesn't exist (this field is not mandatory)
    if ($lastmod === false) //FLA - 10/05/2010 - force url indexation
        $links[] = ($url->loc); //FLA - 10/05/2010 - force url indexation
    else { //FLA - 10/05/2010 - force url indexation
        $res = mysql_query("select indexdate from ".$mysql_table_prefix."links where url like '%".mysql_real_escape_string($the_url)."%'");
        $num_rows = mysql_num_rows($res); // do we already know this link?
        $indexdate = 0;
        if ($num_rows > 0) $indexdate = strtotime(mysql_result($res, 0, "indexdate")); // row 0, column indexdate
        $new = $lastmod - $indexdate;
        if ($new > 0) $links[] = ($url->loc); // add new link only if date from sitemap.xml is newer than date of last index
    } //FLA - 10/05/2010 - force url indexation
} // end foreach

Just add the lines marked //FLA - 10/05/2010 to your code, and it should work.
What I have done:
if lastmod is empty, the URL is added to the list of URLs to index; otherwise, the code does the same thing as before the fix.
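The decision rule above can be pulled out of the database code and checked on its own. This is a minimal sketch using a hypothetical helper, shouldIndex(), which does not exist in Sphider-plus; it assumes the last index date has already been fetched as a Unix timestamp (0 if the URL was never indexed). Like the mod, it also treats an unparseable lastmod as missing, since strtotime() returns false in both cases.

```php
<?php
// Hypothetical helper (not part of Sphider-plus): should this sitemap URL be queued?
//   $lastmodText - raw <lastmod> text from sitemap.xml ('' if absent)
//   $indexdate   - Unix timestamp of the last index run, 0 if never indexed
function shouldIndex($lastmodText, $indexdate)
{
    $lastmod = strtotime($lastmodText);
    if ($lastmod === false) {
        // No (or unparseable) lastmod: index the URL anyway
        return true;
    }
    // Same rule as before the fix: index only if the sitemap date is newer
    return $lastmod > $indexdate;
}

var_dump(shouldIndex('', strtotime('2010-05-01')));           // bool(true)
var_dump(shouldIndex('2010-05-10', strtotime('2010-05-01'))); // bool(true)
var_dump(shouldIndex('2010-04-01', strtotime('2010-05-01'))); // bool(false)
```

Keeping the rule in one small function like this makes it easy to test without a MySQL connection.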
