Hi everyone,
I'm quite new to Sphider, and I've been testing Sphider-plus for a week.
I noticed a problem in version 2.2, but perhaps it has been fixed in 2.3.
When it retrieves a site's sitemap.xml file, any url without a lastmod field is not added to the list of urls to index. However, according to the sitemap.xml format documentation on the internet, this field is not required.
As a result, some urls that could be indexed are skipped.
So here is my mod for Sphider-plus 2.2 to take into account urls without a lastmod date.
In admin/spiderfuncs.php, search for the occurrence of lastmod.
Near it you should find a foreach statement:
foreach($s_map as $url) {
    $the_url = str_replace("&amp;","&",$url->loc);
    $lastmod = strtotime($url->lastmod); // get lastmod date only for this page from sitemap
    //FLA - 10/05/2010 - force url indexation because lastmod doesn't exist (this field is not mandatory)
    if ($lastmod == '') //FLA - 10/05/2010 - force url indexation
        $links[] =($url->loc); //FLA - 10/05/2010 - force url indexation
    else{ //FLA - 10/05/2010 - force url indexation
        $res=mysql_query("select indexdate from ".$mysql_table_prefix."links where url like '%$the_url%'");
        $num_rows = mysql_num_rows($res); // do we already know this link?
        $indexdate = 0;
        if ($num_rows > 0) $indexdate = strtotime(mysql_result($res,"indexdate"));
        $new = $lastmod - $indexdate;
        if ($new > '0') $links[] =($url->loc); // add new link only if date from sitemap.xml is newer than date of last index
    } //FLA - 10/05/2010 - force url indexation
}
Just add to your code the lines that I have marked with a comment like //FLA - 10/05/2010, and it should be fine.
What I have done is:
if lastmod is empty, add the url to the url list; otherwise do the same thing as before the fix.
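To try the idea outside of Sphider-plus, here is a minimal standalone sketch of the same decision. It is not the Sphider-plus code: the function select_urls_to_index() and the $indexDates array are hypothetical names made up for this illustration, and the array simply stands in for the lookup in the links table. It assumes the SimpleXML extension is available.

```php
<?php
// Hypothetical helper (not part of Sphider-plus): returns the sitemap urls to (re)index.
// $indexDates maps already indexed urls to their last index time (Unix timestamp)
// and replaces the database query of the real code.
function select_urls_to_index($sitemapXml, array $indexDates) {
    $links = array();
    $s_map = simplexml_load_string($sitemapXml);
    foreach ($s_map->url as $url) {
        $loc = trim((string) $url->loc);
        $lastmodText = trim((string) $url->lastmod); // '' when <lastmod> is absent
        $lastmod = ($lastmodText === '') ? false : strtotime($lastmodText);
        if ($lastmod === false) {
            // No usable lastmod date: index the url anyway.
            $links[] = $loc;
            continue;
        }
        // Unknown urls get an index date of 0, so they are always added.
        $indexdate = isset($indexDates[$loc]) ? $indexDates[$loc] : 0;
        if ($lastmod > $indexdate) {
            $links[] = $loc; // sitemap date is newer than the last index date
        }
    }
    return $links;
}

// Example data (made up for this sketch).
$sitemap = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a.html</loc><lastmod>2010-05-01</lastmod></url>
  <url><loc>http://example.com/b.html</loc></url>
  <url><loc>http://example.com/c.html</loc><lastmod>2010-05-09</lastmod></url>
</urlset>';
$indexDates = array(
    'http://example.com/a.html' => strtotime('2010-05-05'), // indexed after lastmod
    'http://example.com/c.html' => strtotime('2010-05-05'), // indexed before lastmod
);
$links = select_urls_to_index($sitemap, $indexDates);
print_r($links);
```

With this data, a.html is skipped because it was indexed after its lastmod date, b.html is added because it has no lastmod, and c.html is added because its lastmod is newer than its index date.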
Bye,
Fabien.