Sphider PHP5.3.0 and ereg function

Posted by einboubou 
October 06, 2009 01:18PM
Hi,

I tried Sphider on a PHP 5.3.0 server and I get these errors:
Deprecated: Function eregi_replace() is deprecated in C:\xampp\htdocs\s\include\searchfuncs.php on line 54
Deprecated: Function eregi_replace() is deprecated in C:\xampp\htdocs\s\include\searchfuncs.php on line 58
Deprecated: Function eregi() is deprecated in C:\xampp\htdocs\s\include\searchfuncs.php on line 62
Deprecated: Function eregi_replace() is deprecated in C:\xampp\htdocs\s\include\searchfuncs.php on line 71

Indeed, all ereg functions are deprecated as of PHP 5.3.0: http://fr2.php.net/manual/fr/function.ereg.php

The solution I found is to use the mb_ereg functions instead: http://fr2.php.net/manual/fr/function.mb-ereg.php

I haven't tested it much, but it seems to work.
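
To illustrate the swap, here is a minimal sketch. It assumes the mbstring extension is enabled, and the sample strings are made up for the example, not taken from Sphider. The mb_ereg functions take the same undelimited POSIX-style patterns, so in simple cases like these only the function name should need to change:

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// $query and the header line are hypothetical sample inputs.
$query = "sphider    search   engine";

// Before (deprecated since PHP 5.3.0):
//   $query = eregi_replace("[ ]+", " ", $query);
// After: same pattern, still without delimiters.
$collapsed = mb_eregi_replace("[ ]+", " ", $query);

// eregi() with a capture array maps to mb_eregi().
$regs = array();
$matched = mb_eregi("^user-agent: *([^#]+)", "User-Agent: *", $regs);

echo $collapsed, "\n";      // runs of spaces collapsed to one
echo trim($regs[1]), "\n";  // the captured agent name
```

This keeps the patterns untouched, which is convenient, but it makes the script depend on mbstring being installed.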

Is anyone still maintaining this project?
Re: Sphider PHP5.3.0 and ereg function
November 04, 2009 12:57AM
In ADMIN/spiderfuncs.php, put this code (compatible with PHP 5.3):

<?php
function getFileContents($url) {
global $user_agent;
$urlparts = parse_url($url);
$path = $urlparts['path'];
$host = $urlparts['host'];
if ($urlparts['query'] != "")
$path .= "?".$urlparts['query'];
if (isset ($urlparts['port'])) {
$port = (int) $urlparts['port'];
} else
if ($urlparts['scheme'] == "http") {
$port = 80;
} else
if ($urlparts['scheme'] == "https") {
$port = 443;
}

if ($port == 80) {
$portq = "";
} else {
$portq = ":$port";
}

$all = "*/*";

$request = "GET $path HTTP/1.0\r\nHost: $host$portq\r\nAccept: $all\r\nUser-Agent: $user_agent\r\n\r\n";

$fsocket_timeout = 30;
if (substr($url, 0, 5) == "https") {
$target = "ssl://".$host;
} else {
$target = $host;
}


$errno = 0;
$errstr = "";
print "siin";
$fp = @ fsockopen($target, $port, $errno, $errstr, $fsocket_timeout);

print $errstr;
if (!$fp) {
$contents['state'] = "NOHOST";
printConnectErrorReport($errstr);
return $contents;
} else {
if (!fputs($fp, $request)) {
$contents['state'] = "Cannot send request";
return $contents;
}
$data = null;
socket_set_timeout($fp, $fsocket_timeout);
do{
$status = socket_get_status($fp);
$data .= fgets($fp, 8192);
} while (!feof($fp) && !$status['timed_out']) ;

fclose($fp);
if ($status['timed_out'] == 1) {
$contents['state'] = "timeout";
} else
$contents['state'] = "ok";
$contents['file'] = substr($data, strpos($data, "\r\n\r\n") + 4);
}
return $contents;
}

/*
check if file is available and in readable form
*/
function url_status($url) {
global $user_agent, $index_pdf, $index_doc, $index_xls, $index_ppt;
$urlparts = parse_url($url);
$path = $urlparts['path'];
$host = $urlparts['host'];
if (isset($urlparts['query']))
$path .= "?".$urlparts['query'];

if (isset ($urlparts['port'])) {
$port = (int) $urlparts['port'];
} else
if ($urlparts['scheme'] == "http") {
$port = 80;
} else
if ($urlparts['scheme'] == "https") {
$port = 443;
}

if ($port == 80) {
$portq = "";
} else {
$portq = ":$port";
}

$all = "*/*"; //just to prevent "comment effect" in get accept
$request = "HEAD $path HTTP/1.1\r\nHost: $host$portq\r\nAccept: $all\r\nUser-Agent: $user_agent\r\n\r\n";

if (substr($url, 0, 5) == "https") {
$target = "ssl://".$host;
} else {
$target = $host;
}

$fsocket_timeout = 30;
$errno = 0;
$errstr = "";
$fp = fsockopen($target, $port, $errno, $errstr, $fsocket_timeout);
print $errstr;
$linkstate = "ok";
if (!$fp) {
$status['state'] = "NOHOST";
} else {
socket_set_timeout($fp, 30);
fputs($fp, $request);
$answer = fgets($fp, 4096);
$regs = Array ();
if (preg_match("#HTTP/[0-9.]+ (([0-9])[0-9]{2})#", $answer, $regs)) { // "#" delimiter, because the pattern itself contains "/"
$httpcode = $regs[2];
$full_httpcode = $regs[1];

if ($httpcode <> 2 && $httpcode <> 3) {
$status['state'] = "Unreachable: http $full_httpcode";
$linkstate = "Unreachable";
}
}

if ($linkstate <> "Unreachable") {
while ($answer) {
$answer = fgets($fp, 4096);

if (preg_match("/Location: *([^\n\r ]+)/i", $answer, $regs) && $httpcode == 3 && $full_httpcode != 302) {
$status['path'] = $regs[1];
$status['state'] = "Relocation: http $full_httpcode";
fclose($fp);
return $status;
}

if (preg_match("/Last-Modified: *([a-z0-9,: ]+)/i", $answer, $regs)) { // "i" so that "Tue", "GMT" etc. match [a-z]
$status['date'] = $regs[1];
}

if (preg_match("/Content-Type:/i", $answer)) {
$content = $answer;
$answer = '';
break;
}
}
$socket_status = socket_get_status($fp);
if (preg_match("#Content-Type: *([a-z/.-]*)#i", $content, $regs)) { // "#" delimiter, because the character class contains "/"
if ($regs[1] == 'text/html' || $regs[1] == 'text/' || $regs[1] == 'text/plain') {
$status['content'] = 'text';
$status['state'] = 'ok';
} else if ($regs[1] == 'application/pdf' && $index_pdf == 1) {
$status['content'] = 'pdf';
$status['state'] = 'ok';
} else if (($regs[1] == 'application/msword' || $regs[1] == 'application/vnd.ms-word') && $index_doc == 1) {
$status['content'] = 'doc';
$status['state'] = 'ok';
} else if (($regs[1] == 'application/excel' || $regs[1] == 'application/vnd.ms-excel') && $index_xls == 1) {
$status['content'] = 'xls';
$status['state'] = 'ok';
} else if (($regs[1] == 'application/mspowerpoint' || $regs[1] == 'application/vnd.ms-powerpoint') && $index_ppt == 1) {
$status['content'] = 'ppt';
$status['state'] = 'ok';
} else {
$status['state'] = "Not text or html";
}

} else
if ($socket_status['timed_out'] == 1) {
$status['state'] = "Timed out (no reply from server)";

} else
$status['state'] = "Not text or html";

}
}
fclose($fp);
return $status;
}

/*
Read robots.txt file in the server, to find any disallowed files/folders
*/
function check_robot_txt($url) {
global $user_agent;
$urlparts = parse_url($url);
$url = 'http://'.$urlparts['host']."/robots.txt";

$url_status = url_status($url);
$omit = array ();

if ($url_status['state'] == "ok") {
$robot = file($url);
if (!$robot) {
$contents = getFileContents($url);
$file = $contents['file'];
$robot = explode("\n", $file);
}

$regs = Array ();
$this_agent= "";
while (list ($id, $line) = each($robot)) {
if (preg_match("/^user-agent: *([^#]+) */i", $line, $regs)) {
$this_agent = trim($regs[1]);
if ($this_agent == '*' || $this_agent == $user_agent)
$check = 1;
else
$check = 0;
}

if (preg_match("/disallow: *([^#]+)/i", $line, $regs) && $check == 1) {
$disallow_str = preg_replace("/[\n ]+/", "", $regs[1]);
if (trim($disallow_str) != "") {
$omit[] = $disallow_str;
} else {
if ($this_agent == '*' || $this_agent == $user_agent) {
return null;
}
}
}
}
}

return $omit;
}

/*
Remove the file part from an url (to build an url from an url and given relative path)
*/
function remove_file_from_url($url) {
$url_parts = parse_url($url);
$path = $url_parts['path'];

$regs = Array ();
if (preg_match('/([^\/]+)$/i', $path, $regs)) {
$file = $regs[1];
$check = preg_quote($file, '/').'$'; // escape regex metacharacters in the file name
$path = preg_replace("/$check"."/i", "", $path);
}

if ($url_parts['port'] == 80 || $url_parts['port'] == "") {
$portq = "";
} else {
$portq = ":".$url_parts['port'];
}

$url = $url_parts['scheme']."://".$url_parts['host'].$portq.$path;
return $url;
}

/*
Extract links from html
*/
function sphider_get_links($file, $url, $can_leave_domain, $base) {

$chunklist = array ();
// The base URL comes from either the meta tag or the current URL.
if (!empty($base)) {
$url = $base;
}

$links = array ();
$regs = Array ();
$checked_urls = Array();

preg_match_all("/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9-]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i", $file, $regs, PREG_SET_ORDER);
foreach ($regs as $val) {
if ($checked_urls[$val[1]]!=1 && !isset ($val[4])) { //if nofollow is not set
if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
$links[] = $a;
}
$checked_urls[$val[1]] = 1;
}
}
preg_match_all("/(frame[^>]*src[[:blank:]]*)=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
foreach ($regs as $val) {
if ($checked_urls[$val[1]]!=1 && !isset ($val[4])) { //if nofollow is not set
if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
$links[] = $a;
}
$checked_urls[$val[1]] = 1;
}
}
preg_match_all("/(window[.]location)[[:blank:]]*=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
foreach ($regs as $val) {
if ($checked_urls[$val[1]]!=1 && !isset ($val[4])) { //if nofollow is not set
if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
$links[] = $a;
}
$checked_urls[$val[1]] = 1;
}
}
preg_match_all("/(http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;url)[[:blank:]]*=[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
foreach ($regs as $val) {
if ($checked_urls[$val[1]]!=1 && !isset ($val[4])) { //if nofollow is not set
if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
$links[] = $a;
}
$checked_urls[$val[1]] = 1;
}
}

preg_match_all("/(window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?(([[a-z]{3,5}:\/\/(([.a-zA-Z0-9-])+(:[0-9]+)*))*([+:%\/?=&;\\\(\),._ a-zA-Z0-9-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?/i", $file, $regs, PREG_SET_ORDER);
foreach ($regs as $val) {
if ($checked_urls[$val[1]]!=1 && !isset ($val[4])) { //if nofollow is not set
if (($a = url_purify($val[1], $url, $can_leave_domain)) != '') {
$links[] = $a;
}
$checked_urls[$val[1]] = 1;
}
}

return $links;
}

/*
Function to build a unique word array from the text of a webpage, together with the count of each word
*/
function unique_array($arr) {
global $min_word_length;
global $common;
global $word_upper_bound;
global $index_numbers, $stem_words;

if ($stem_words == 1) {
$newarr = Array();
foreach ($arr as $val) {
$newarr[] = stem($val);
}
$arr = $newarr;
}
sort($arr);
reset($arr);
$newarr = array ();

$i = 0;
$counter = 1;
$element = current($arr);

if ($index_numbers == 1) {
$pattern = "/[a-z0-9]+/";
} else {
$pattern = "/[a-z]+/";
}

$regs = Array ();
for ($n = 0; $n < sizeof($arr); $n ++) {
//check if word is long enough, contains alphabetic characters and is not a common word
//to eliminate/count multiple instance of words
$next_in_arr = next($arr);
if ($next_in_arr != $element) {
if (strlen($element) >= $min_word_length && preg_match($pattern, remove_accents($element)) && (@ $common[$element] <> 1)) {
if (preg_match("/^(-|\\\')(.*)/", $element, $regs))
$element = $regs[2];

if (preg_match("/(.*)(\\\'|-)$/", $element, $regs))
$element = $regs[1];

$newarr[$i][1] = $element;
$newarr[$i][2] = $counter;
$element = current($arr);
$i ++;
$counter = 1;
} else {
$element = $next_in_arr;
}
} else {
if ($counter < $word_upper_bound)
$counter ++;
}

}
return $newarr;
}

/*
Checks if url is legal, relative to the main url.
*/
function url_purify($url, $parent_url, $can_leave_domain) {
global $ext, $mainurl, $apache_indexes, $strip_sessids;



$urlparts = parse_url($url);

$main_url_parts = parse_url($mainurl);
if ($urlparts['host'] != "" && $urlparts['host'] != $main_url_parts['host'] && $can_leave_domain != 1) {
return '';
}

reset($ext);
while (list ($id, $excl) = each($ext))
if (preg_match("/\.$excl$/i", $url))
return '';

if (substr($url, -1) == '\\') {
return '';
}



if (isset($urlparts['query'])) {
if ($apache_indexes[$urlparts['query']]) {
return '';
}
}

if (preg_match("/[\/]?mailto:|[\/]?javascript:|[\/]?news:/i", $url)) {
return '';
}
if (isset($urlparts['scheme'])) {
$scheme = $urlparts['scheme'];
} else {
$scheme ="";
}



//only http and https links are followed
if (!($scheme == 'http' || $scheme == '' || $scheme == 'https')) {
return '';
}

//parent url might be used to build an url from relative path
$parent_url = remove_file_from_url($parent_url);
$parent_url_parts = parse_url($parent_url);


if (substr($url, 0, 1) == '/') {
$url = $parent_url_parts['scheme']."://".$parent_url_parts['host'].$url;
} else
if (!isset($urlparts['scheme'])) {
$url = $parent_url.$url;
}

$url_parts = parse_url($url);

$urlpath = $url_parts['path'];

$regs = Array ();

while (preg_match("/[^\/]*\/[.]{2}\//", $urlpath, $regs)) {
$urlpath = str_replace($regs[0], "", $urlpath);
}

//remove relative path instructions like ../ etc
$urlpath = preg_replace("/\/+/", "/", $urlpath);
$urlpath = preg_replace("/[^\/]*\/[.]{2}/", "", $urlpath);
$urlpath = str_replace("./", "", $urlpath);
$query = "";
if (isset($url_parts['query'])) {
$query = "?".$url_parts['query'];
}
if ($main_url_parts['port'] == 80 || $url_parts['port'] == "") {
$portq = "";
} else {
$portq = ":".$main_url_parts['port'];
}
$url = $url_parts['scheme']."://".$url_parts['host'].$portq.$urlpath.$query;

//if we index sub-domains
if ($can_leave_domain == 1) {
return $url;
}

$mainurl = remove_file_from_url($mainurl);

if ($strip_sessids == 1) {
$url = remove_sessid($url);
}
//only urls in staying in the starting domain/directory are followed
$url = convert_url($url);
if (strstr($url, $mainurl) == false) {
return '';
} else
return $url;
}

function save_keywords($wordarray, $link_id, $domain) {
global $mysql_table_prefix, $all_keywords;
reset($wordarray);
while ($thisword = each($wordarray)) {
$word = $thisword[1][1];
$wordmd5 = substr(md5($word), 0, 1);
$weight = $thisword[1][2];
if (strlen($word)<= 30) {
$keyword_id = $all_keywords[$word];
if ($keyword_id == "") {
mysql_query("insert into ".$mysql_table_prefix."keywords (keyword) values ('$word')");
if (mysql_errno() == 1062) {
$result = mysql_query("select keyword_ID from ".$mysql_table_prefix."keywords where keyword='$word'");
echo mysql_error();
$row = mysql_fetch_row($result);
$keyword_id = $row[0];
} else{
$keyword_id = mysql_insert_id();
$all_keywords[$word] = $keyword_id;
echo mysql_error();
}
}
$inserts[$wordmd5] .= ",($link_id, $keyword_id, $weight, $domain)";
}
}

for ($i=0;$i<=15; $i++) {
$char = dechex($i);
$values= substr($inserts[$char], 1);
if ($values!="") {
$query = "insert into ".$mysql_table_prefix."link_keyword$char (link_id, keyword_id, weight, domain) values $values";
mysql_query($query);
echo mysql_error();
}


}
}

function get_head_data($file) {
$headdata = "";

preg_match("@<head[^>]*>(.*?)<\/head>@si",$file, $regs);

$headdata = $regs[1];

$description = "";
$robots = "";
$keywords = "";
$base = "";
$res = Array ();
if ($headdata != "") {
preg_match("/<meta +name *=[\"']?robots[\"']? *content=[\"']?([^<>'\"]+)[\"']?/i", $headdata, $res);
if (isset ($res)) {
$robots = $res[1];
}

preg_match("/<meta +name *=[\"']?description[\"']? *content=[\"']?([^<>'\"]+)[\"']?/i", $headdata, $res);
if (isset ($res)) {
$description = $res[1];
}

preg_match("/<meta +name *=[\"']?keywords[\"']? *content=[\"']?([^<>'\"]+)[\"']?/i", $headdata, $res);
if (isset ($res)) {
$keywords = $res[1];
}
// e.g. <base href="http://www.consil.co.uk/index.php" />
preg_match("/<base +href *= *[\"']?([^<>'\"]+)[\"']?/i", $headdata, $res);
if (isset ($res)) {
$base = $res[1];
}
$keywords = preg_replace("/[, ]+/", " ", $keywords);
$robots = explode(",", strtolower($robots));
$nofollow = 0;
$noindex = 0;
foreach ($robots as $x) {
if (trim($x) == "noindex") {
$noindex = 1;
}
if (trim($x) == "nofollow") {
$nofollow = 1;
}
}
$data['description'] = addslashes($description);
$data['keywords'] = addslashes($keywords);
$data['nofollow'] = $nofollow;
$data['noindex'] = $noindex;
$data['base'] = $base;
}
return $data;
}

function clean_file($file, $url, $type) {
global $entities, $index_host, $index_meta_keywords;

$urlparts = parse_url($url);
$host = $urlparts['host'];
//remove filename from path
$path = preg_replace('/([^\/]+)$/', "", $urlparts['path']);
$file = preg_replace("/<link rel[^<>]*>/i", " ", $file);
$file = preg_replace("@<!--sphider_noindex-->.*?<!--\/sphider_noindex-->@si", " ",$file);
$file = preg_replace("@<!--.*?-->@si", " ",$file);
$file = preg_replace("@<script[^>]*?>.*?</script>@si", " ",$file);
$headdata = get_head_data($file);
$regs = Array ();
if (preg_match("@<title *>(.*?)<\/title*>@si", $file, $regs)) {
$title = trim($regs[1]);
$file = str_replace($regs[0], "", $file);
} else if ($type == 'pdf' || $type == 'doc') { //the title of a non-html file is its first few words
$title = substr($file, 0, strrpos(substr($file, 0, 40), " "));
}

$file = preg_replace("@<style[^>]*>.*?<\/style>@si", " ", $file);

//create spaces between tags, so that removing tags doesnt concatenate strings
$file = preg_replace("/<[\w ]+>/", "\\0 ", $file);
$file = preg_replace("/<\/[\w ]+>/", "\\0 ", $file);
$file = strip_tags($file);
$file = preg_replace("/&nbsp;/", " ", $file);

$fulltext = $file;
$file .= " ".$title;
if ($index_host == 1) {
$file = $file." ".$host." ".$path;
}
if ($index_meta_keywords == 1) {
$file = $file." ".$headdata['keywords'];
}


//replace codes with ascii chars
$file = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $file);
$file = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $file);
$file = strtolower($file);
reset($entities);
while ($char = each($entities)) {
$file = preg_replace("/".$char[0]."/i", $char[1], $file);
}
$file = preg_replace("/&[a-z]{1,6};/", " ", $file);
$file = preg_replace("/[\*\^\+\?\\\.\[\]\^\$\|\{\)\(\}~!\"\/@#£$%&=`´;><:,]+/", " ", $file);
$file = preg_replace("/\s+/", " ", $file);
$data['fulltext'] = addslashes($fulltext);
$data['content'] = addslashes($file);
$data['title'] = addslashes($title);
$data['description'] = $headdata['description'];
$data['keywords'] = $headdata['keywords'];
$data['host'] = $host;
$data['path'] = $path;
$data['nofollow'] = $headdata['nofollow'];
$data['noindex'] = $headdata['noindex'];
$data['base'] = $headdata['base'];

return $data;

}

function calc_weights($wordarray, $title, $host, $path, $keywords) {
global $index_host, $index_meta_keywords;
$hostarray = unique_array(explode(" ", preg_replace("/[^[:alnum:]-]+/i", " ", strtolower($host))));
$patharray = unique_array(explode(" ", preg_replace("/[^[:alnum:]-]+/i", " ", strtolower($path))));
$titlearray = unique_array(explode(" ", preg_replace("/[^[:alnum:]-]+/i", " ", strtolower($title))));
$keywordsarray = unique_array(explode(" ", preg_replace("/[^[:alnum:]-]+/i", " ", strtolower($keywords))));
$path_depth = countSubstrs($path, "/");

while (list ($wid, $word) = each($wordarray)) {
$word_in_path = 0;
$word_in_domain = 0;
$word_in_title = 0;
$meta_keyword = 0;
if ($index_host == 1) {
while (list ($id, $path) = each($patharray)) {
if ($path[1] == $word[1]) {
$word_in_path = 1;
break;
}
}
reset($patharray);

while (list ($id, $host) = each($hostarray)) {
if ($host[1] == $word[1]) {
$word_in_domain = 1;
break;
}
}
reset($hostarray);
}

if ($index_meta_keywords == 1) {
while (list ($id, $keyword) = each($keywordsarray)) {
if ($keyword[1] == $word[1]) {
$meta_keyword = 1;
break;
}
}
reset($keywordsarray);
}
while (list ($id, $tit) = each($titlearray)) {
if ($tit[1] == $word[1]) {
$word_in_title = 1;
break;
}
}
reset($titlearray);

$wordarray[$wid][2] = (int) (calc_weight($wordarray[$wid][2], $word_in_title, $word_in_domain, $word_in_path, $path_depth, $meta_keyword));
}
reset($wordarray);
return $wordarray;
}

function isDuplicateMD5($md5sum) {
global $mysql_table_prefix,$table_prefix;
if(NULL==$mysql_table_prefix)$mysql_table_prefix = $table_prefix.'sph_';
$result = mysql_query("select link_id from ".$mysql_table_prefix."links where md5sum='$md5sum'");
echo mysql_error();
if (mysql_num_rows($result) > 0) {
return true;
}
return false;
}

function check_include($link, $inc, $not_inc) {
$url_inc = Array ();
$url_not_inc = Array ();
if ($inc != "") {
$url_inc = explode("\n", $inc);
}
if ($not_inc != "") {
$url_not_inc = explode("\n", $not_inc);
}
$oklinks = Array ();

$include = true;
foreach ($url_not_inc as $str) {
$str = trim($str);
if ($str != "") {
if (substr($str, 0, 1) == '*') {
// regex rule: preg_match() needs delimiters around the user-supplied pattern
if (preg_match("~".str_replace("~", "\\~", substr($str, 1))."~i", $link)) {
$include = false;
break;
}
} else {
if (!(strpos($link, $str) === false)) {
$include = false;
break;
}
}
}
}
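if ($include && $inc != "") {
$include = false;
foreach ($url_inc as $str) {
$str = trim($str);
if ($str != "") {
if (substr($str, 0, 1) == '*') {
// regex rule: preg_match() needs delimiters around the user-supplied pattern
if (preg_match("~".str_replace("~", "\\~", substr($str, 1))."~i", $link)) {
$include = true;
break; // only one enclosing loop, so "break 2" would be a fatal error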
}
} else {
if (strpos($link, $str) !== false) {
$include = true;
break;
}
}
}
}
}
return $include;
}

function check_for_removal($url) {
global $mysql_table_prefix,$table_prefix;
global $command_line;
if(NULL==$mysql_table_prefix)$mysql_table_prefix = $table_prefix.'sph_';
$result = mysql_query("select link_id, visible from ".$mysql_table_prefix."links where url='$url'");
echo mysql_error();
if (mysql_num_rows($result) > 0) {
$row = mysql_fetch_row($result);
$link_id = $row[0];
$visible = $row[1];
if ($visible > 0) {
$visible --;
mysql_query("update ".$mysql_table_prefix."links set visible=$visible where link_id=$link_id");
echo mysql_error();
} else {
mysql_query("delete from ".$mysql_table_prefix."links where link_id=$link_id");
echo mysql_error();
for ($i=0;$i<=15; $i++) {
$char = dechex($i);
mysql_query("delete from ".$mysql_table_prefix."link_keyword$char where link_id=$link_id");
echo mysql_error();
}
printStandardReport('pageRemoved',$command_line);
}
}
}

function convert_url($url) {
$url = str_replace("&amp;", "&", $url);
$url = str_replace(" ", "%20", $url);
return $url;
}

function extract_text($contents, $source_type) {
global $tmp_dir, $pdftotext_path, $catdoc_path, $xls2csv_path, $catppt_path;

$temp_file = "tmp_file";
$filename = $tmp_dir."/".$temp_file ;
if (!$handle = fopen($filename, 'w')) {
die ("Cannot open file $filename");
}

if (fwrite($handle, $contents) === FALSE) {
die ("Cannot write to file $filename");
}

fclose($handle);
if ($source_type == 'pdf') {
$command = $pdftotext_path." $filename -";
$a = exec($command,$result, $retval);
} else if ($source_type == 'doc') {
$command = $catdoc_path." $filename";
$a = exec($command,$result, $retval);
} else if ($source_type == 'xls') {
$command = $xls2csv_path." $filename";
$a = exec($command,$result, $retval);
} else if ($source_type == 'ppt') {
$command = $catppt_path." $filename";
$a = exec($command,$result, $retval);
}

unlink ($filename);
return implode(' ', $result);

}

//function to calculate the weight of pages
function calc_weight ($words_in_page, $word_in_title, $word_in_domain, $word_in_path, $path_depth, $meta_keyword) {
global $title_weight, $domain_weight, $path_weight,$meta_weight;
$weight = ($words_in_page + $word_in_title * $title_weight +
$word_in_domain * $domain_weight +
$word_in_path * $path_weight + $meta_keyword * $meta_weight) *10 / (0.8 +0.2*$path_depth);

return $weight;
}

function remove_sessid($url) {
return preg_replace("/(\?|&)(PHPSESSID|JSESSIONID|ASPSESSIONID|sid)=[0-9a-zA-Z]+$/", "", $url);
}
?>
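
A note for anyone converting the remaining ereg calls by hand: preg patterns need delimiters, and eregi() was case-insensitive, so its preg equivalent takes the "i" modifier. The following is only a sketch of that conversion for two patterns like the ones in url_status() above. The sample header lines and the variable names $statusLine, $headerLine, $fullCode, $codeClass and $type are made up for the example:

```php
<?php
// Sketch only; the sample header lines are hypothetical.
$statusLine = "HTTP/1.1 200 OK";
$headerLine = "Content-Type: text/html; charset=utf-8";

// ereg("HTTP/[0-9.]+ (([0-9])[0-9]{2})", ...) has no delimiters.
// In preg, "#" is used here so the "/" inside the pattern is not
// mistaken for the closing delimiter.
$regs = array();
preg_match("#HTTP/[0-9.]+ (([0-9])[0-9]{2})#", $statusLine, $regs);
$fullCode  = $regs[1];  // full three-digit status code
$codeClass = $regs[2];  // first digit of the status code

// eregi() ignored case, so the preg version gets the "i" modifier.
$type = array();
preg_match("#Content-Type: *([a-z/.-]*)#i", $headerLine, $type);

echo $fullCode, " ", $codeClass, " ", $type[1], "\n";
```

With "/" as the delimiter, both patterns would have to escape every slash they contain, which is why a different delimiter is chosen here.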
Re: Sphider PHP5.3.0 and ereg function
November 04, 2009 12:59AM
And for INCLUDE/searchfuncs.php, replace the contents with this. (Be careful: the forum software turns some character sequences into emoticons!)

<?php
/*******************************************
* Sphider Version 1.3.x
* This program is licensed under the GNU GPL.
* By Ando Saabas ando(a t)cs.ioc.ee
********************************************/

error_reporting(E_ALL ^ E_NOTICE);

function swap_max (&$arr, $start, $domain) {
$pos = $start;
$maxweight = $arr[$pos]['weight'];
for ($i = $start; $i< count($arr); $i++) {
if ($arr[$i]['domain'] == $domain) {
$pos = $i;
$maxweight = $arr[$i]['weight'];
break;
}
if ($arr[$i]['weight'] > $maxweight) {
$pos = $i;
$maxweight = $arr[$i]['weight'];
}
}
$temp = $arr[$start];
$arr[$start] = $arr[$pos];
$arr[$pos] = $temp;
}

function sort_with_domains (&$arr) {
$domain = -1;
for ($i = 0; $i< count($arr)-1; $i++) {
swap_max($arr, $i, $domain);
$domain = $arr[$i]['domain'];
}
}

function cmp($a, $b) {
if ($a['weight'] == $b['weight'])
return 0;

return ($a['weight'] > $b['weight']) ? -1 : 1;
}

function addmarks($a) {
$a = preg_replace("/[ ]+/", " ", $a);
$a = str_replace(" +", "+", $a);
$a = str_replace(" ", "+", $a);
return $a;
}

function makeboollist($a) {
global $entities, $stem_words;
while ($char = each($entities)) {
$a = preg_replace("/".$char[0]."/", $char[1], $a);
}
$a = trim($a);

$a = preg_replace("/"."&quot;"."/", "\"", $a);
$returnWords = array();
//get all phrases
$regs = Array();
while (preg_match("/([-]?)\"([^\"]+)\"/i", $a, $regs)) {
if ($regs[1] == '') {
$returnWords['+s'][] = $regs[2];
$returnWords['hilight'][] = $regs[2];
} else {
$returnWords['-s'][] = $regs[2];
}
$a = str_replace($regs[0], "", $a);
}
$a = strtolower(preg_replace("/"."[ ]+"."/", " ", $a));
// $a = remove_accents($a);
$a = trim($a);
$words = explode(' ', $a);
if ($a=="") {
$limit = 0;
} else {
$limit = count($words);
}


$k = 0;
//get all words (both include and exlude)
$includeWords = array();
while ($k < $limit) {
if (substr($words[$k], 0, 1) == '+') {
$includeWords[] = substr($words[$k], 1);
if (!ignoreWord(substr($words[$k], 1))) {
$returnWords['hilight'][] = substr($words[$k], 1);
if ($stem_words == 1) {
$returnWords['hilight'][] = stem(substr($words[$k], 1));
}
}
} else if (substr($words[$k], 0, 1) == '-') {
$returnWords['-'][] = substr($words[$k], 1);
} else {
$includeWords[] = $words[$k];
if (!ignoreWord($words[$k])) {
$returnWords['hilight'][] = $words[$k];
if ($stem_words == 1) {
$returnWords['hilight'][] = stem($words[$k]);
}
}
}
$k++;
}
//add words from phrases to includes
if (isset($returnWords['+s'])) {
foreach ($returnWords['+s'] as $phrase) {
$phrase = strtolower(preg_replace("/[ ]+/", " ", $phrase));
$phrase = trim($phrase);
$temparr = explode(' ', $phrase);
foreach ($temparr as $w)
$includeWords[] = $w;
}
}

foreach ($includeWords as $word) {
if (!($word =='')) {
if (ignoreWord($word)) {

$returnWords['ignore'][] = $word;
} else {
$returnWords['+'][] = $word;
}
}

}
return $returnWords;

}

function ignoreword($word) {
global $common;
global $min_word_length;
global $index_numbers;
if ($index_numbers == 1) {
$pattern = "[a-z0-9]+";
} else {
$pattern = "[a-z]+";
}
if (strlen($word) < $min_word_length || (!preg_match("/".$pattern."/", remove_accents($word))) || ($common[$word] == 1)) {
return 1;
} else {
return 0;
}
}

function search($searchstr, $category, $start, $per_page, $type, $domain) {
global $length_of_link_desc,$mysql_table_prefix, $show_meta_description, $merge_site_results, $stem_words, $did_you_mean_enabled ;

$possible_to_find = 1;
$result = mysql_query("select domain_id from ".$mysql_table_prefix."domains where domain = '$domain'");
if (mysql_num_rows($result)> 0) {
$thisrow = mysql_fetch_array($result);
$domain_qry = "and domain = ".$thisrow[0];
} else {
$domain_qry = "";
}

//find all sites that should not be included in the result
if (count($searchstr['+']) == 0) {
return null;
}
$wordarray = $searchstr['-'];
$notlist = array();
$not_words = 0;
while ($not_words < count($wordarray)) {
if ($stem_words == 1) {
$searchword = addslashes(stem($wordarray[$not_words]));
} else {
$searchword = addslashes($wordarray[$not_words]);
}
$wordmd5 = substr(md5($searchword), 0, 1);

$query1 = "SELECT link_id from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword'";

$result = mysql_query($query1);

while ($row = mysql_fetch_row($result)) {
$notlist[$not_words]['id'][$row[0]] = 1;
}
$not_words++;
}


//find all sites containing the search phrase
$wordarray = $searchstr['+s'];
$phrase_words = 0;
while ($phrase_words < count($wordarray)) {

$searchword = addslashes($wordarray[$phrase_words]);
$query1 = "SELECT link_id from ".$mysql_table_prefix."links where fulltxt like '% $searchword%'";
echo mysql_error();
$result = mysql_query($query1);
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
$possible_to_find = 0;
break;
}
while ($row = mysql_fetch_row($result)) {
$phraselist[$phrase_words]['id'][$row[0]] = 1;
}
$phrase_words++;
}


if (($category> 0) && $possible_to_find==1) {
$allcats = get_cats($category);
$catlist = implode(",", $allcats);
$query1 = "select link_id from ".$mysql_table_prefix."links, ".$mysql_table_prefix."sites, ".$mysql_table_prefix."categories, ".$mysql_table_prefix."site_category where ".$mysql_table_prefix."links.site_id = ".$mysql_table_prefix."sites.site_id and ".$mysql_table_prefix."sites.site_id = ".$mysql_table_prefix."site_category.site_id and ".$mysql_table_prefix."site_category.category_id in ($catlist)";
$result = mysql_query($query1);
echo mysql_error();
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
$possible_to_find = 0;
}
while ($row = mysql_fetch_row($result)) {
$category_list[$row[0]] = 1;
}
}


//find all sites that include the search word
$wordarray = $searchstr['+'];
$words = 0;
$starttime = getmicrotime();
while (($words < count($wordarray)) && $possible_to_find == 1) {
if ($stem_words == 1) {
$searchword = addslashes(stem($wordarray[$words]));
} else {
$searchword = addslashes($wordarray[$words]);
}
$wordmd5 = substr(md5($searchword), 0, 1);
$query1 = "SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword' $domain_qry order by weight desc";
echo mysql_error();
$result = mysql_query($query1);
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
if ($type != "or") {
$possible_to_find = 0;
break;
}
}
if ($type == "or") {
$indx = 0;
} else {
$indx = $words;
}

while ($row = mysql_fetch_row($result)) {
$linklist[$indx]['id'][] = $row[0];
$domains[$row[0]] = $row[2];
$linklist[$indx]['weight'][$row[0]] = $row[1];
}
$words++;
}


if ($type == "or") {
$words = 1;
}
$result_array_full = Array();

if ($possible_to_find !=0) {
if ($words == 1 && $not_words == 0 && $category < 1) { //if there is only one search word, we already have the result
$result_array_full = $linklist[0]['weight'];
} else { //otherwise build an intersection of all the results
$j= 1;
$min = 0;
while ($j < $words) {
if (count($linklist[$min]['id']) > count($linklist[$j]['id'])) {
$min = $j;
}
$j++;
}

$j = 0;


$temp_array = $linklist[$min]['id'];
$count = 0;
while ($j < count($temp_array)) {
$k = 0; //and word counter
$n = 0; //not word counter
$o = 0; //phrase word counter
$weight = 1;
$break = 0;
while ($k < $words && $break== 0) {
if ($linklist[$k]['weight'][$temp_array[$j]] > 0) {
$weight = $weight + $linklist[$k]['weight'][$temp_array[$j]];
} else {
$break = 1;
}
$k++;
}
while ($n < $not_words && $break== 0) {
if ($notlist[$n]['id'][$temp_array[$j]] > 0) {
$break = 1;
}
$n++;
}

while ($o < $phrase_words && $break== 0) {
if ($phraselist[$o]['id'][$temp_array[$j]] != 1) { // index by the phrase counter $o, not the not-word counter $n
$break = 1;
}
$o++;
}
if ($break== 0 && $category > 0 && $category_list[$temp_array[$j]] != 1) {
$break = 1;
}

if ($break == 0) {
$result_array_full[$temp_array[$j]] = $weight;
$count ++;
}
$j++;
}
}
}
$end = getmicrotime()- $starttime;


if ((count($result_array_full) == 0 || $possible_to_find == 0) && $did_you_mean_enabled == 1) {
reset ($searchstr['+']);
foreach ($searchstr['+'] as $word) {
$word = addslashes($word);
$result = mysql_query("select keyword from ".$mysql_table_prefix."keywords where soundex(keyword) = soundex('$word')");
$max_distance = 100;
$near_word ="";
while ($row=mysql_fetch_row($result)) {

$distance = levenshtein($row[0], $word);
if ($distance < $max_distance && $distance <4) {
$max_distance = $distance;
$near_word = $row[0];
}
}

if ($near_word != "" && $word != $near_word) {
$near_words[$word] = $near_word;
}

}
$res['did_you_mean'] = $near_words;
return $res;
}
if (count($result_array_full) == 0) {
return null;
}
arsort ($result_array_full);


if ($merge_site_results == 1 && $domain_qry == "") {
while (list($key, $value) = each($result_array_full)) {
if (!isset($domains_to_show[$domains[$key]])) {
$result_array_temp[$key] = $value;
$domains_to_show[$domains[$key]] = 1;
} else if ($domains_to_show[$domains[$key]] == 1) {
$domains_to_show[$domains[$key]] = Array ($key => $value);
}
}
} else {
$result_array_temp = $result_array_full;
}


while (list($key, $value) = each ($result_array_temp)) {
$result_array[$key] = $value;
if (isset ($domains_to_show[$domains[$key]]) && $domains_to_show[$domains[$key]] != 1) {
list ($k, $v) = each($domains_to_show[$domains[$key]]);
$result_array[$k] = $v;
}
}

$results = count($result_array);

$keys = array_keys($result_array);
$maxweight = $result_array[$keys[0]];


for ($i = ($start -1)*$per_page; $i <min($results, ($start -1)*$per_page + $per_page) ; $i++) {
$in[] = $keys[$i];

}
if (!is_array($in)) {
$res['results'] = $results;
return $res;
}

$inlist = implode(",", $in);


if ($length_of_link_desc == 0) {
$fulltxt = "fulltxt";
} else {
$fulltxt = "substring(fulltxt, 1, $length_of_link_desc)";
}

$query1 = "SELECT distinct link_id, url, title, description, $fulltxt, size FROM ".$mysql_table_prefix."links WHERE link_id in ($inlist)";

$result = mysql_query($query1);
echo mysql_error();

$i = 0;
while ($row = mysql_fetch_row($result)) {
$res[$i]['title'] = $row[2];
$res[$i]['url'] = $row[1];
if ($row[3] != null && $show_meta_description == 1)
$res[$i]['fulltxt'] = $row[3];
else
$res[$i]['fulltxt'] = $row[4];
$res[$i]['size'] = $row[5];
$res[$i]['weight'] = $result_array[$row[0]];
$dom_result = mysql_query("select domain from ".$mysql_table_prefix."domains where domain_id='".$domains[$row[0]]."'");
$dom_row = mysql_fetch_row($dom_result);
$res[$i]['domain'] = $dom_row[0];
$i++;
}



if ($merge_site_results && $domain_qry == "") {
sort_with_domains($res);
} else {
usort($res, "cmp");
}
echo mysql_error();
$res['maxweight'] = $maxweight;
$res['results'] = $results;
return $res;
/**/
}

function get_search_results($query, $start, $category, $searchtype, $results, $domain) {
global $sph_messages, $results_per_page,
$links_to_next,
$show_query_scores,
$mysql_table_prefix,
$desc_length;
if ($results != "") {
$results_per_page = $results;
}

if ($searchtype == "phrase") {
$query=str_replace('"','',$query);
$query = "\"".$query."\"";
}

$starttime = getmicrotime();
// catch " if only one time entered
if (substr_count($query,'"')==1){
$query=str_replace('"','',$query);
}
$words = makeboollist($query);
$ignorewords = $words['ignore'];


$full_result['ignore_words'] = $words['ignore'];

if ($start==0)
$start=1;
$result = search($words, $category, $start, $results_per_page, $searchtype, $domain);
$query= stripslashes($query);

$entitiesQuery = htmlspecialchars($query);
$full_result['ent_query'] = $entitiesQuery;

$endtime = getmicrotime() - $starttime;
$rows = $result['results'];
$time = round($endtime*100)/100;


$full_result['time'] = $time;

$did_you_mean = "";


if (isset($result['did_you_mean'])) {
$did_you_mean_b=$entitiesQuery;
$did_you_mean=$entitiesQuery;
while (list($key, $val) = each($result['did_you_mean'])) {
if ($key != $val) {
$did_you_mean_b = str_replace($key, "<b>$val</b>", $did_you_mean_b);
$did_you_mean = str_replace($key, "$val", $did_you_mean);
}
}
}

$full_result['did_you_mean'] = $did_you_mean;
$full_result['did_you_mean_b'] = $did_you_mean_b;

$matchword = $sph_messages["matches"];
if ($rows == 1) {
$matchword= $sph_messages["match"];
}

$num_of_results = count($result) - 2;



$full_result['num_of_results'] = $num_of_results;


if ($start < 2)
saveToLog(addslashes($query), $time, $rows);
$from = ($start-1) * $results_per_page+1;
$to = min(($start)*$results_per_page, $rows);


$full_result['from'] = $from;
$full_result['to'] = $to;
$full_result['total_results'] = $rows;

if ($rows>0) {
$maxweight = $result['maxweight'];
$i = 0;
while ($i < $num_of_results && $i < $results_per_page) {
$title = $result[$i]['title'];
$url = $result[$i]['url'];
$fulltxt = $result[$i]['fulltxt'];
$page_size = $result[$i]['size'];
$domain = $result[$i]['domain'];
if ($page_size!="")
$page_size = number_format($page_size, 1)."kb";


$txtlen = strlen($fulltxt);
if ($txtlen > $desc_length) {
$places = array();
foreach($words['hilight'] as $word) {
$tmp = strtolower($fulltxt);
$found_in = strpos($tmp, $word);
$sum = -strlen($word);
while (!($found_in =='')) {
$pos = $found_in+strlen($word);
$sum += $pos; //FIX!!
$tmp = substr($tmp, $pos);
$places[] = $sum;
$found_in = strpos($tmp, $word);

}
}
sort($places);
$x = 0;
$begin = 0;
$end = 0;
while(list($id, $place) = each($places)) {
while ($places[$id + $x] - $place < $desc_length && $x+$id < count($places) && $place < strlen($fulltxt) -$desc_length) {
$x++;
$begin = $id;
$end = $id + $x;
}
}

$begin_pos = max(0, $places[$begin] - 30);
$fulltxt = substr($fulltxt, $begin_pos, $desc_length);

if ($places[$begin] > 0) {
$begin_pos = strpos($fulltxt, " ");
}
$fulltxt = substr($fulltxt, $begin_pos, $desc_length);
$fulltxt = substr($fulltxt, 0, strrpos($fulltxt, " "));
$fulltxt = $fulltxt;
}

$weight = number_format($result[$i]['weight']/$maxweight*100, 2);
if ($title=='')
$title = $sph_messages["Untitled"];
$regs = Array();

if (strlen($title) > 80) {
$title = substr($title, 0,76)."...";
}
foreach($words['hilight'] as $change) {
while (@eregi("[^\>](".$change.")[^\<]", " ".$title." ", $regs)) {
$title = eregi_replace($regs[1], "<b>".$regs[1]."</b>", $title);
}

while (@eregi("[^\>](".$change.")[^\<]", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/", "<b>".$regs[1]."</b>", $fulltxt);
}
$url2 = $url;
while (@eregi("[^\>](".$change.")[^\<]", $url2, $regs)) {
$url2 = eregi_replace($regs[1], "<b>".$regs[1]."</b>", $url2);
}
}


$num = $from + $i;

$full_result['qry_results'][$i]['num'] = $num;
$full_result['qry_results'][$i]['weight'] = $weight;
$full_result['qry_results'][$i]['url'] = $url;
$full_result['qry_results'][$i]['title'] = $title;
$full_result['qry_results'][$i]['fulltxt'] = $fulltxt;
$full_result['qry_results'][$i]['url2'] = $url2;
$full_result['qry_results'][$i]['page_size'] = $page_size;
$full_result['qry_results'][$i]['domain_name'] = $domain;
$i++;
}
}



$pages = ceil($rows / $results_per_page);
$full_result['pages'] = $pages;
$prev = $start - 1;
$full_result['prev'] = $prev;
$next = $start + 1;
$full_result['next'] = $next;
$full_result['start'] = $start;
$full_result['query'] = $entitiesQuery;

if ($from <= $to) {

$firstpage = $start - $links_to_next;
if ($firstpage < 1) $firstpage = 1;
$lastpage = $start + $links_to_next;
if ($lastpage > $pages) $lastpage = $pages;

for ($x=$firstpage; $x<=$lastpage; $x++)
$full_result['other_pages'][] = $x;

}

return $full_result;

}


?>
Re: Sphider PHP5.3.0 and ereg function
November 04, 2009 01:01AM
Replace each smiley in the code above with a plain ), not : )
Re: Sphider PHP5.3.0 and ereg function
November 04, 2009 02:50AM
here is a NEW VERSION of searchfuncs.php in INCLUDES

<?php
/*******************************************
* Sphider Version 1.3.x
* This program is licensed under the GNU GPL.
* By Ando Saabas ando(a t)cs.ioc.ee
********************************************/

error_reporting(E_ALL ^ E_NOTICE);

function swap_max (&$arr, $start, $domain) {
$pos = $start;
$maxweight = $arr[$pos]['weight'];
for ($i = $start; $i< count($arr); $i++) {
if ($arr[$i]['domain'] == $domain) {
$pos = $i;
$maxweight = $arr[$i]['weight'];
break;
}
if ($arr[$i]['weight'] > $maxweight) {
$pos = $i;
$maxweight = $arr[$i]['weight'];
}
}
$temp = $arr[$start];
$arr[$start] = $arr[$pos];
$arr[$pos] = $temp;
}

function sort_with_domains (&$arr) {
$domain = -1;
for ($i = 0; $i< count($arr)-1; $i++) {
swap_max($arr, $i, $domain);
$domain = $arr[$i]['domain'];
}
}

function cmp($a, $b) {
if ($a['weight'] == $b['weight'])
return 0;

return ($a['weight'] > $b['weight']) ? -1 : 1;
}

function addmarks($a) {
$a = preg_replace("/"."[ ]+"."/", " ", $a);
$a = str_replace(" +", "+", $a);
$a = str_replace(" ", "+", $a);
return $a;
}

function makeboollist($a) {
global $entities, $stem_words;
while ($char = each($entities)) {
$a = preg_replace("/".$char[0]."/", $char[1], $a);
}
$a = trim($a);

$a = preg_replace("/"."&quot;"."/", "\"", $a);
$returnWords = array();
//get all phrases
$regs = Array();
while (preg_match("/([-]?)\"([^\"]+)\"/i", $a, $regs)) {
if ($regs[1] == '') {
$returnWords['+s'][] = $regs[2];
$returnWords['hilight'][] = $regs[2];
} else {
$returnWords['-s'][] = $regs[2];
}
$a = str_replace($regs[0], "", $a);
}
$a = strtolower(preg_replace("/"."[ ]+"."/", " ", $a));
// $a = remove_accents($a);
$a = trim($a);
$words = explode(' ', $a);
if ($a=="") {
$limit = 0;
} else {
$limit = count($words);
}


$k = 0;
//get all words (both include and exlude)
$includeWords = array();
while ($k < $limit) {
if (substr($words[$k], 0, 1) == '+') {
$includeWords[] = substr($words[$k], 1);
if (!ignoreWord(substr($words[$k], 1))) {
$returnWords['hilight'][] = substr($words[$k], 1);
if ($stem_words == 1) {
$returnWords['hilight'][] = stem(substr($words[$k], 1));
}
}
} else if (substr($words[$k], 0, 1) == '-') {
$returnWords['-'][] = substr($words[$k], 1);
} else {
$includeWords[] = $words[$k];
if (!ignoreWord($words[$k])) {
$returnWords['hilight'][] = $words[$k];
if ($stem_words == 1) {
$returnWords['hilight'][] = stem($words[$k]);
}
}
}
$k++;
}
//add words from phrases to includes
if (isset($returnWords['+s'])) {
foreach ($returnWords['+s'] as $phrase) {
$phrase = strtolower(preg_replace("/"."[ ]+"."/", " ", $phrase));
$phrase = trim($phrase);
$temparr = explode(' ', $phrase);
foreach ($temparr as $w)
$includeWords[] = $w;
}
}

foreach ($includeWords as $word) {
if (!($word =='')) {
if (ignoreWord($word)) {

$returnWords['ignore'][] = $word;
} else {
$returnWords['+'][] = $word;
}
}

}
return $returnWords;

}

function ignoreword($word) {
global $common;
global $min_word_length;
global $index_numbers;
if ($index_numbers == 1) {
$pattern = "[a-z0-9]+";
} else {
$pattern = "[a-z]+";
}
if (strlen($word) < $min_word_length || (!preg_match("/".$pattern."/", remove_accents($word))) || ($common[$word] == 1)) {
return 1;
} else {
return 0;
}
}

function search($searchstr, $category, $start, $per_page, $type, $domain) {
global $length_of_link_desc,$mysql_table_prefix, $show_meta_description, $merge_site_results, $stem_words, $did_you_mean_enabled ;

$possible_to_find = 1;
$result = mysql_query("select domain_id from ".$mysql_table_prefix."domains where domain = '$domain'");
if (mysql_num_rows($result)> 0) {
$thisrow = mysql_fetch_array($result);
$domain_qry = "and domain = ".$thisrow[0];
} else {
$domain_qry = "";
}

//find all sites that should not be included in the result
if (count($searchstr['+']) == 0) {
return null;
}
$wordarray = $searchstr['-'];
$notlist = array();
$not_words = 0;
while ($not_words < count($wordarray)) {
if ($stem_words == 1) {
$searchword = addslashes(stem($wordarray[$not_words]));
} else {
$searchword = addslashes($wordarray[$not_words]);
}
$wordmd5 = substr(md5($searchword), 0, 1);

$query1 = "SELECT link_id from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword'";

$result = mysql_query($query1);

while ($row = mysql_fetch_row($result)) {
$notlist[$not_words]['id'][$row[0]] = 1;
}
$not_words++;
}


//find all sites containing the search phrase
$wordarray = $searchstr['+s'];
$phrase_words = 0;
while ($phrase_words < count($wordarray)) {

$searchword = addslashes($wordarray[$phrase_words]);
$query1 = "SELECT link_id from ".$mysql_table_prefix."links where fulltxt like '% $searchword%'";
echo mysql_error();
$result = mysql_query($query1);
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
$possible_to_find = 0;
break;
}
while ($row = mysql_fetch_row($result)) {
$phraselist[$phrase_words]['id'][$row[0]] = 1;
}
$phrase_words++;
}


if (($category> 0) && $possible_to_find==1) {
$allcats = get_cats($category);
$catlist = implode(",", $allcats);
$query1 = "select link_id from ".$mysql_table_prefix."links, ".$mysql_table_prefix."sites, ".$mysql_table_prefix."categories, ".$mysql_table_prefix."site_category where ".$mysql_table_prefix."links.site_id = ".$mysql_table_prefix."sites.site_id and ".$mysql_table_prefix."sites.site_id = ".$mysql_table_prefix."site_category.site_id and ".$mysql_table_prefix."site_category.category_id in ($catlist)";
$result = mysql_query($query1);
echo mysql_error();
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
$possible_to_find = 0;
}
while ($row = mysql_fetch_row($result)) {
$category_list[$row[0]] = 1;
}
}


//find all sites that include the search word
$wordarray = $searchstr['+'];
$words = 0;
$starttime = getmicrotime();
while (($words < count($wordarray)) && $possible_to_find == 1) {
if ($stem_words == 1) {
$searchword = addslashes(stem($wordarray[$words]));
} else {
$searchword = addslashes($wordarray[$words]);
}
$wordmd5 = substr(md5($searchword), 0, 1);
$query1 = "SELECT distinct link_id, weight, domain from ".$mysql_table_prefix."link_keyword$wordmd5, ".$mysql_table_prefix."keywords where ".$mysql_table_prefix."link_keyword$wordmd5.keyword_id= ".$mysql_table_prefix."keywords.keyword_id and keyword='$searchword' $domain_qry order by weight desc";
echo mysql_error();
$result = mysql_query($query1);
$num_rows = mysql_num_rows($result);
if ($num_rows == 0) {
if ($type != "or") {
$possible_to_find = 0;
break;
}
}
if ($type == "or") {
$indx = 0;
} else {
$indx = $words;
}

while ($row = mysql_fetch_row($result)) {
$linklist[$indx]['id'][] = $row[0];
$domains[$row[0]] = $row[2];
$linklist[$indx]['weight'][$row[0]] = $row[1];
}
$words++;
}


if ($type == "or") {
$words = 1;
}
$result_array_full = Array();

if ($possible_to_find !=0) {
if ($words == 1 && $not_words == 0 && $category < 1) { //if there is only one search word, we already have the result
$result_array_full = $linklist[0]['weight'];
} else { //otherwise build an intersection of all the results
$j= 1;
$min = 0;
while ($j < $words) {
if (count($linklist[$min]['id']) > count($linklist[$j]['id'])) {
$min = $j;
}
$j++;
}

$j = 0;


$temp_array = $linklist[$min]['id'];
$count = 0;
while ($j < count($temp_array)) {
$k = 0; //and word counter
$n = 0; //not word counter
$o = 0; //phrase word counter
$weight = 1;
$break = 0;
while ($k < $words && $break== 0) {
if ($linklist[$k]['weight'][$temp_array[$j]] > 0) {
$weight = $weight + $linklist[$k]['weight'][$temp_array[$j]];
} else {
$break = 1;
}
$k++;
}
while ($n < $not_words && $break== 0) {
if ($notlist[$n]['id'][$temp_array[$j]] > 0) {
$break = 1;
}
$n++;
}

while ($o < $phrase_words && $break== 0) {
if ($phraselist[$o]['id'][$temp_array[$j]] != 1) {
$break = 1;
}
$o++;
}
if ($break== 0 && $category > 0 && $category_list[$temp_array[$j]] != 1) {
$break = 1;
}

if ($break == 0) {
$result_array_full[$temp_array[$j]] = $weight;
$count ++;
}
$j++;
}
}
}
$end = getmicrotime()- $starttime;


if ((count($result_array_full) == 0 || $possible_to_find == 0) && $did_you_mean_enabled == 1) {
reset ($searchstr['+']);
foreach ($searchstr['+'] as $word) {
$word = addslashes($word);
$result = mysql_query("select keyword from ".$mysql_table_prefix."keywords where soundex(keyword) = soundex('$word')");
$max_distance = 100;
$near_word ="";
while ($row=mysql_fetch_row($result)) {

$distance = levenshtein($row[0], $word);
if ($distance < $max_distance && $distance <4) {
$max_distance = $distance;
$near_word = $row[0];
}
}

if ($near_word != "" && $word != $near_word) {
$near_words[$word] = $near_word;
}

}
$res['did_you_mean'] = $near_words;
return $res;
}
if (count($result_array_full) == 0) {
return null;
}
arsort ($result_array_full);


if ($merge_site_results == 1 && $domain_qry == "") {
while (list($key, $value) = each($result_array_full)) {
if (!isset($domains_to_show[$domains[$key]])) {
$result_array_temp[$key] = $value;
$domains_to_show[$domains[$key]] = 1;
} else if ($domains_to_show[$domains[$key]] == 1) {
$domains_to_show[$domains[$key]] = Array ($key => $value);
}
}
} else {
$result_array_temp = $result_array_full;
}


while (list($key, $value) = each ($result_array_temp)) {
$result_array[$key] = $value;
if (isset ($domains_to_show[$domains[$key]]) && $domains_to_show[$domains[$key]] != 1) {
list ($k, $v) = each($domains_to_show[$domains[$key]]);
$result_array[$k] = $v;
}
}

$results = count($result_array);

$keys = array_keys($result_array);
$maxweight = $result_array[$keys[0]];


for ($i = ($start -1)*$per_page; $i <min($results, ($start -1)*$per_page + $per_page) ; $i++) {
$in[] = $keys[$i];

}
if (!is_array($in)) {
$res['results'] = $results;
return $res;
}

$inlist = implode(",", $in);


if ($length_of_link_desc == 0) {
$fulltxt = "fulltxt";
} else {
$fulltxt = "substring(fulltxt, 1, $length_of_link_desc)";
}

$query1 = "SELECT distinct link_id, url, title, description, $fulltxt, size FROM ".$mysql_table_prefix."links WHERE link_id in ($inlist)";

$result = mysql_query($query1);
echo mysql_error();

$i = 0;
while ($row = mysql_fetch_row($result)) {
$res[$i]['title'] = $row[2];
$res[$i]['url'] = $row[1];
if ($row[3] != null && $show_meta_description == 1)
$res[$i]['fulltxt'] = $row[3];
else
$res[$i]['fulltxt'] = $row[4];
$res[$i]['size'] = $row[5];
$res[$i]['weight'] = $result_array[$row[0]];
$dom_result = mysql_query("select domain from ".$mysql_table_prefix."domains where domain_id='".$domains[$row[0]]."'");
$dom_row = mysql_fetch_row($dom_result);
$res[$i]['domain'] = $dom_row[0];
$i++;
}



if ($merge_site_results && $domain_qry == "") {
sort_with_domains($res);
} else {
usort($res, "cmp");
}
echo mysql_error();
$res['maxweight'] = $maxweight;
$res['results'] = $results;
return $res;
/**/
}

function get_search_results($query, $start, $category, $searchtype, $results, $domain) {
global $sph_messages, $results_per_page,
$links_to_next,
$show_query_scores,
$mysql_table_prefix,
$desc_length;
if ($results != "") {
$results_per_page = $results;
}

if ($searchtype == "phrase") {
$query=str_replace('"','',$query);
$query = "\"".$query."\"";
}

$starttime = getmicrotime();
// catch " if only one time entered
if (substr_count($query,'"')==1){
$query=str_replace('"','',$query);
}
$words = makeboollist($query);
$ignorewords = $words['ignore'];


$full_result['ignore_words'] = $words['ignore'];

if ($start==0)
$start=1;
$result = search($words, $category, $start, $results_per_page, $searchtype, $domain);
$query= stripslashes($query);

$entitiesQuery = htmlspecialchars($query);
$full_result['ent_query'] = $entitiesQuery;

$endtime = getmicrotime() - $starttime;
$rows = $result['results'];
$time = round($endtime*100)/100;


$full_result['time'] = $time;

$did_you_mean = "";


if (isset($result['did_you_mean'])) {
$did_you_mean_b=$entitiesQuery;
$did_you_mean=$entitiesQuery;
while (list($key, $val) = each($result['did_you_mean'])) {
if ($key != $val) {
$did_you_mean_b = str_replace($key, "<b>$val</b>", $did_you_mean_b);
$did_you_mean = str_replace($key, "$val", $did_you_mean);
}
}
}

$full_result['did_you_mean'] = $did_you_mean;
$full_result['did_you_mean_b'] = $did_you_mean_b;

$matchword = $sph_messages["matches"];
if ($rows == 1) {
$matchword= $sph_messages["match"];
}

$num_of_results = count($result) - 2;



$full_result['num_of_results'] = $num_of_results;


if ($start < 2)
saveToLog(addslashes($query), $time, $rows);
$from = ($start-1) * $results_per_page+1;
$to = min(($start)*$results_per_page, $rows);


$full_result['from'] = $from;
$full_result['to'] = $to;
$full_result['total_results'] = $rows;

if ($rows>0) {
$maxweight = $result['maxweight'];
$i = 0;
while ($i < $num_of_results && $i < $results_per_page) {
$title = $result[$i]['title'];
$url = $result[$i]['url'];
$fulltxt = $result[$i]['fulltxt'];
$page_size = $result[$i]['size'];
$domain = $result[$i]['domain'];
if ($page_size!="")
$page_size = number_format($page_size, 1)."kb";


$txtlen = strlen($fulltxt);
if ($txtlen > $desc_length) {
$places = array();
foreach($words['hilight'] as $word) {
$tmp = strtolower($fulltxt);
$found_in = strpos($tmp, $word);
$sum = -strlen($word);
while (!($found_in =='')) {
$pos = $found_in+strlen($word);
$sum += $pos; //FIX!!
$tmp = substr($tmp, $pos);
$places[] = $sum;
$found_in = strpos($tmp, $word);

}
}
sort($places);
$x = 0;
$begin = 0;
$end = 0;
while(list($id, $place) = each($places)) {
while ($places[$id + $x] - $place < $desc_length && $x+$id < count($places) && $place < strlen($fulltxt) -$desc_length) {
$x++;
$begin = $id;
$end = $id + $x;
}
}

$begin_pos = max(0, $places[$begin] - 30);
$fulltxt = substr($fulltxt, $begin_pos, $desc_length);

if ($places[$begin] > 0) {
$begin_pos = strpos($fulltxt, " ");
}
$fulltxt = substr($fulltxt, $begin_pos, $desc_length);
$fulltxt = substr($fulltxt, 0, strrpos($fulltxt, " "));
$fulltxt = $fulltxt;
}

$weight = number_format($result[$i]['weight']/$maxweight*100, 2);
if ($title=='')
$title = $sph_messages["Untitled"];
$regs = Array();

if (strlen($title) > 80) {
$title = substr($title, 0,76)."...";
}
foreach($words['hilight'] as $change) {
while (@eregi("[^\>](".$change.")[^\<]", " ".$title." ", $regs)) {
$title = preg_replace("/".$regs[1]."/", "<b>".$regs[1]."</b>", $title);
}

while (@eregi("[^\>](".$change.")[^\<]", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/", "<b>".$regs[1]."</b>", $fulltxt);
}
$url2 = $url;
while (@eregi("[^\>](".$change.")[^\<]", $url2, $regs)) {
$url2 = preg_replace("/".$regs[1]."/", "<b>".$regs[1]."</b>", $url2);
}
}


$num = $from + $i;

$full_result['qry_results'][$i]['num'] = $num;
$full_result['qry_results'][$i]['weight'] = $weight;
$full_result['qry_results'][$i]['url'] = $url;
$full_result['qry_results'][$i]['title'] = $title;
$full_result['qry_results'][$i]['fulltxt'] = $fulltxt;
$full_result['qry_results'][$i]['url2'] = $url2;
$full_result['qry_results'][$i]['page_size'] = $page_size;
$full_result['qry_results'][$i]['domain_name'] = $domain;
$i++;
}
}



$pages = ceil($rows / $results_per_page);
$full_result['pages'] = $pages;
$prev = $start - 1;
$full_result['prev'] = $prev;
$next = $start + 1;
$full_result['next'] = $next;
$full_result['start'] = $start;
$full_result['query'] = $entitiesQuery;

if ($from <= $to) {

$firstpage = $start - $links_to_next;
if ($firstpage < 1) $firstpage = 1;
$lastpage = $start + $links_to_next;
if ($lastpage > $pages) $lastpage = $pages;

for ($x=$firstpage; $x<=$lastpage; $x++)
$full_result['other_pages'][] = $x;

}

return $full_result;

}


?>
t-p
Re: Sphider PHP5.3.0 and ereg function
November 04, 2009 06:20PM
Hi,

Thanks for sharing.

I just need some clarification about the smileys.

1. You say to replace it with this: )
2. A different post said to replace it with this: ;)
3. In the existing code, where the smiley appears, it looks like there is only this: )

Which is the right one? Do these smileys mean different things in different posts? I just want to make sure I understand it properly.

Thanks.

--tp
Re: Sphider PHP5.3.0 and ereg function
November 10, 2009 06:34PM
it works with mb_ prefix.

Thanks a lot. ^-^
t-p
Re: Sphider PHP5.3.0 and ereg function
November 11, 2009 02:57AM
garcon1986 Wrote:
-------------------------------------------------------
> it works with mb_ prefix.
>

Hi garcon1986,

For a layperson like me, please help by identifying exactly which file names and which lines need to be changed.

Thanks a lot.

--tpabla
Re: Sphider PHP5.3.0 and ereg function
November 14, 2009 08:25PM
Hello tpabla,

As far as I know, you need to change the files spider.php and spiderfuncs.php located in the admin directory: just add the "mb_" prefix to eregi, eregi_replace, ereg, etc. In PHP 5.3.0 the plain ereg functions are deprecated, while the mbstring versions such as mb_eregi are not.
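Here is a rough sketch of what that kind of change can look like for a highlighting call. The helper names highlight_mb() and highlight_pcre() are made up for illustration and are not part of Sphider. The first one uses the "mb_" prefix (it assumes the mbstring extension is loaded); the second one swaps in PCRE (preg_replace), which is a different technique from the mb_ prefix but also avoids the deprecated functions.

```php
<?php
// Sketch only: highlight_mb() and highlight_pcre() are hypothetical helpers,
// not functions that exist in Sphider.

// Deprecated since PHP 5.3.0:
//   $title = eregi_replace($word, "<b>".$word."</b>", $title);

// Variant 1: the "mb_" prefix approach (requires the mbstring extension).
// $word is used as a regular expression, exactly like the old call.
function highlight_mb($text, $word) {
    if ($word === '') {
        return $text;
    }
    return mb_eregi_replace($word, '<b>' . $word . '</b>', $text);
}

// Variant 2: PCRE. preg_quote() escapes regex metacharacters and the "/"
// delimiter; the "i" flag makes the match case-insensitive, and $0 in the
// replacement stands for the matched text, so its original case is kept.
function highlight_pcre($text, $word) {
    if ($word === '') {
        return $text;
    }
    return preg_replace('/' . preg_quote($word, '/') . '/i', '<b>$0</b>', $text);
}

echo highlight_pcre('Sphider Search Engine', 'search'), "\n";
// prints: Sphider <b>Search</b> Engine

if (function_exists('mb_eregi_replace')) {
    echo highlight_mb('Sphider Search Engine', 'search'), "\n";
}
```

The PCRE variant treats the search word as literal text, which is probably safer when the word comes straight from a user query, but that is a design choice and not something the original code does.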

Hope this helps.

Garcon1986