For my application I wanted to associate the original link with the keywords even if the page redirects to another one. I also wanted to index the originally requested URL (but with the content found after following any redirects from that page) while still keeping maxlevel set to 0, so that no other pages on the website get indexed. My first thought was to try this earlier posted mod [www.sphider.eu], but that was not a very accurate way to do it, and judging by the age of the post it probably predates some of the modifications that have since been made to the base code.
Anyway, this was my solution. Instead of adding the redirect target to the temp table, it immediately checks the url_status of the new URL and keeps following redirects until it gets a non-redirecting response (giving up after $max_redirects hops). It then continues with the rest of the function as before.
/********* Original code from admin/spider.php, inside the index_url function *********/
$thislevel = $level - 1;
if (strstr($url_status['state'], "Relocation")) {
    $url = eregi_replace(" ", "", url_purify($url_status['path'], $url, $can_leave_domain));
    echo "New Url: $url";
    if ($url <> '') {
        // queue the redirect target in the temp table unless it is already there
        $result = mysql_query("select link from ".$mysql_table_prefix."temp where link='$url' && id = '$sessid'");
        echo mysql_error();
        $rows = mysql_numrows($result);
        if ($rows == 0) {
            mysql_query("insert into ".$mysql_table_prefix."temp (link, level, id) values ('$url', '$level', '$sessid')");
            echo mysql_error();
        }
    }
    // note: "==" compares instead of assigning, so this statement has no effect
    $url_status['state'] == "redirected";
}
/********* My new code to replace the above *********/
$thislevel = $level - 1;
$max_redirects = 10; // give up after this many hops (guards against redirect loops)
$redirects = 0;
while (strstr($url_status['state'], "Relocation") && $redirects < $max_redirects) {
    $new_url = eregi_replace(" ", "", url_purify($url_status['path'], $url, $can_leave_domain));
    // url_purify() can return '' (the original code checked for this), so stop following instead of requesting an empty URL
    if ($new_url == '') {
        break;
    }
    $url = $new_url;
    $url_status = url_status($url);
    // don't add it to the temp table anymore, just check the status and continue
    /*if ($url <> '') {
        $result = mysql_query("select link from ".$mysql_table_prefix."temp where link='$url' && id = '$sessid'");
        echo mysql_error();
        $rows = mysql_numrows($result);
        if ($rows == 0) {
            mysql_query("insert into ".$mysql_table_prefix."temp (link, level, id) values ('$url', '$level', '$sessid')");
            echo mysql_error();
        }
    }*/
    $redirects++;
}
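If you want to try the redirect-following loop outside Sphider, here is a minimal standalone sketch of the same idea (PHP 7.1+). The names follow_redirects() and fake_url_status() are hypothetical, made up for this example: the fake status table stands in for Sphider's url_status(), its state strings are invented, and only the "Relocation" substring is assumed to matter, as in the strstr() check above. It does not call url_purify(); the target is used as-is after stripping spaces.

```php
<?php
// Hypothetical stand-in for Sphider's url_status(): returns an array with a
// 'state' string and, for redirects, a 'path' holding the redirect target.
// The state strings below are made up; only the "Relocation" substring matters here.
function fake_url_status(string $url): array
{
    $responses = [
        'http://example.com/old'  => ['state' => 'Relocation: http 301', 'path' => 'http://example.com/mid'],
        'http://example.com/mid'  => ['state' => 'Relocation: http 302', 'path' => 'http://example.com/new'],
        'http://example.com/new'  => ['state' => 'ok'],
        'http://example.com/loop' => ['state' => 'Relocation: http 302', 'path' => 'http://example.com/loop'],
    ];
    return $responses[$url] ?? ['state' => 'Not found'];
}

// Follow redirects in place (nothing is queued), stopping at the first
// non-redirect response, at an empty target, or after $max_redirects hops.
// Returns [final URL, final status array, number of hops followed].
function follow_redirects(string $url, callable $status_of, int $max_redirects = 10): array
{
    $status = $status_of($url);
    $hops = 0;
    while (strpos($status['state'], 'Relocation') !== false && $hops < $max_redirects) {
        $next = str_replace(' ', '', $status['path'] ?? '');
        if ($next === '') {
            break; // no usable target: keep the redirect status and stop
        }
        $url = $next;
        $status = $status_of($url);
        $hops++;
    }
    return [$url, $status, $hops];
}

[$final, $status, $hops] = follow_redirects('http://example.com/old', 'fake_url_status');
echo "$final {$status['state']} $hops\n"; // prints "http://example.com/new ok 2"
```

Starting from http://example.com/loop instead, the function stops after 10 hops and still reports a Relocation state, which is exactly the case the $max_redirects cap is there to catch.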