First word of page doesn't display in search results

Posted by AndrewB 
March 19, 2012 08:14AM
Hi All,

I found that under certain circumstances the first word of a page would not display in the search results. This happened even if the first word was a keyword.

I think I've found why this happens and have implemented a fix, but I'm not sure it's the best one. Any feedback is welcome.

Conditions

This happens when the page contains more than 30 characters of whitespace before the first word and the 30 characters immediately preceding that word are all tabs. If any of those 30 characters is a space, the problem doesn't occur.

Cause

In /include/searchfuncs.php, in the get_search_results function, there is a section of code (lines 557 to 565) that cuts the full text down to a suitable portion for display on the results page. If the keyword is within 30 characters of the beginning of the full text, nothing is cut from the start and all is well. If the keyword is more than 30 characters from the beginning, the full text is cut so that it starts 30 characters before the search word (lines 557 and 558). Line 561 then looks for the first space character and cuts everything before that point. This seems to be there to ensure half words aren't displayed in the results. The problem is that in the scenario described above the 30 characters before the word are all tabs, so the first space character comes after the first word, and the first word is dropped.
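To make the behaviour easier to see, here is a minimal sketch that reproduces it. This is not the actual Sphider code: the function excerpt_start and its parameters are made up for illustration, and it only mimics the steps as I've described them above (keep 30 characters before the keyword, then cut to the first space).

```php
<?php
// Hypothetical approximation of the excerpt-trimming step described above.
// This is NOT the actual Sphider source; names and structure are assumptions.
function excerpt_start(string $fulltext, string $keyword, int $context = 30): string
{
    $pos = stripos($fulltext, $keyword);
    if ($pos === false || $pos <= $context) {
        // Keyword is near the start (or absent): nothing is cut.
        return $fulltext;
    }
    // Keep only the $context characters before the keyword.
    $cut = substr($fulltext, $pos - $context);
    // Drop everything before the first space so no half word is shown.
    $space = strpos($cut, ' ');
    return $space === false ? $cut : substr($cut, $space);
}

// 40 tabs before the first word: the first space is AFTER "Welcome".
$tabs = str_repeat("\t", 40) . "Welcome to the site";
// 40 spaces before the first word: the first space is at offset 0.
$spaces = str_repeat(" ", 40) . "Welcome to the site";

$fromTabs = excerpt_start($tabs, "Welcome");
$fromSpaces = excerpt_start($spaces, "Welcome");

var_dump($fromTabs);           // the keyword "Welcome" is missing
var_dump(ltrim($fromSpaces));  // the keyword is kept
```

With leading tabs the search for a space skips straight past the first word, while with leading spaces the cut point is found before it.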

Fix

The whitespace discussed above is what is left after the HTML tags and other markup have been removed. I'm not sure there is any value in keeping all the whitespace before and after the full text, so my fix is to add a trim call in the clean_file function in /admin/spiderfuncs.php. I've added the two lines below (the comment and the trim call) at line 599.

$file = preg_replace("/ /", " ", $file);
// Remove whitespace at the start and end of $file
$file = trim($file);

$fulltext = $file;

Another fix could be to change get_search_results to handle tabs when looking for word separators. This would probably fix the problem for more scenarios. Yet another option would be to replace all tabs with spaces in the full text as tabs aren't required for displaying search results anyway.
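Here is a rough sketch of those two alternatives. Both function names are hypothetical and neither exists in Sphider; the first looks for any whitespace character rather than only a space, and the second collapses all whitespace (tabs included) to single spaces before the text is used.

```php
<?php
// Hypothetical helpers for illustration only; not part of Sphider.

// Option 1: cut at the first whitespace character of any kind
// (space, tab, newline), not just a literal space.
function cut_to_word_boundary(string $text): string
{
    if (preg_match('/\s/', $text, $m, PREG_OFFSET_CAPTURE)) {
        // $m[0][1] is the byte offset of the first whitespace character.
        return substr($text, $m[0][1]);
    }
    return $text;
}

// Option 2: replace every run of whitespace with a single space
// and strip it from both ends.
function normalize_whitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// 30 tabs followed by the first word, as in the problem scenario.
$cut = str_repeat("\t", 30) . "Welcome to the site";

$a = ltrim(cut_to_word_boundary($cut));
$b = normalize_whitespace($cut);
// A cut that starts mid-word still loses only the partial word.
$c = cut_to_word_boundary("lf\tword Welcome");
```

Option 1 keeps the change local to the results code, while option 2 would also shrink what gets stored, assuming nothing else depends on the original whitespace.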

Let me know your thoughts and if you think this is a real issue that needs fixing or if I've misunderstood something somewhere.

Regards,
Andrew