SOLVED Case insensitive search using non English charsets

Posted by test-sphider 
November 22, 2011 03:55PM
Hello guys,

I'm testing Sphider.
As you know, when you have the keyword "Book" on your web site and you search for "book", "BOOK" or "Book", they all match the same page and highlight "Book". My problem is:

I use the windows-1254 charset. My MySQL table charset is latin5_turkish_ci and my HTML pages are windows-1254. For example, I have "AĞRI" in one of my page titles. When I search for "ağrı", it does not match "AĞRI"; it only matches the lowercase "ağrı" and, as a result, highlights only "ağrı". Another example:
I have "ÇANAKKALE" in one of my pages. When I search for "çanakkale", it says "did you mean Çanakkal" instead of "did you mean Çanakkale" and, as a result, returns no hits.

1- How can we tell the script that "ğ" and "Ğ", "ç" and "Ç", "ş" and "Ş", and "ı" and "I" are the same characters in different cases, so that it searches case-insensitively, as it does for the other, non-special characters?

Maybe by editing "commonfuncs.php", in the array starting at line 78:

$entities = array
(
"&amp" => "&",
"&apos" => "'",
"Þ" => "Ş",
"ß" => "ß",
"à" => "à",
"á" => "á",............
)


Thanks in advance...



Edited 2 time(s). Last edit at 12/22/2011 06:09PM by test-sphider.
Re: SOLVED Case insensitive search using non English charsets
December 22, 2011 06:07PM
Hi guys,

Finally I solved my issue, and I am sure many people have had the same one. Long story short, here is what I did for Turkish characters:

In commonfuncs.php, I removed the Turkish accented characters from the part that removes accents. Then here comes the trick: in the array at line 78, which starts with "$entities", I added the following characters:



$entities = array
(
"Ç" => "c",
"ç" => "c",
"Ğ" => "g",
"ğ" => "g",
"İ" => "i",
"ı" => "i",
"Ö" => "o",
"ö" => "o",
"Ş" => "s",
"ş" => "s",
"Ü" => "u",
"ü" => "u",
// ... the entries already present in the original array stay here, unchanged ...
);



Then delete your database and do a clean index (hint: you don't need to delete the query log).
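
A minimal sketch of why the clean index matters: the fold only helps if the indexed words and the search query go through the same mapping. The function name `fold_turkish` is hypothetical and not part of Sphider, and the sketch assumes PHP 7 or later, a source file saved as UTF-8, and the default "C" locale. With a single-byte charset such as windows-1254, the keys would have to be in that encoding instead.

```php
<?php
// Hypothetical helper, NOT a Sphider function: folds Turkish letters to ASCII.
function fold_turkish(string $text): string
{
    // Same pairs as the $entities additions above.
    $map = array(
        "Ç" => "c", "ç" => "c",
        "Ğ" => "g", "ğ" => "g",
        "İ" => "i", "ı" => "i",
        "Ö" => "o", "ö" => "o",
        "Ş" => "s", "ş" => "s",
        "Ü" => "u", "ü" => "u",
    );
    // strtr() with an array replaces every key it finds and never rescans
    // its own output; strtolower() then lowercases the plain ASCII letters.
    return strtolower(strtr($text, $map));
}

// The word as stored at index time and the word typed by the user
// end up as the same folded string, so they can match.
$indexed = fold_turkish("ÇANAKKALE");
$query   = fold_turkish("çanakkale");
var_dump($indexed === $query); // bool(true)
```

Words indexed before the change were stored without this fold, which is presumably why a fresh index is needed for old pages to match.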

Worked for me..

One issue that remains: it does not highlight the accented words even though it finds them.
Hope one of you can solve this.
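
One possible direction for the highlighting, as a rough sketch only and not Sphider's actual highlighting code: fold the page text character by character, look for the folded query in the folded text, and wrap the original characters at the matching positions. The function name `highlight_folded` and the `<b>` wrapper are assumptions for illustration, and the sketch assumes UTF-8 input and PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT a Sphider function.
function highlight_folded(string $text, string $query): string
{
    $map = array(
        "Ç" => "c", "ç" => "c", "Ğ" => "g", "ğ" => "g",
        "İ" => "i", "ı" => "i", "Ö" => "o", "ö" => "o",
        "Ş" => "s", "ş" => "s", "Ü" => "u", "ü" => "u",
    );
    $fold = function (string $s) use ($map): string {
        return strtolower(strtr($s, $map));
    };

    // Split into UTF-8 characters so original and folded positions line up 1:1.
    $chars  = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $needle = preg_split('//u', $fold($query), -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $needle === false || count($needle) === 0) {
        return $text; // invalid UTF-8 or empty query: leave the text untouched
    }
    $hay = array_map($fold, $chars);

    $n = count($needle);
    $total = count($chars);
    $out = '';
    $i = 0;
    while ($i < $total) {
        if (array_slice($hay, $i, $n) === $needle) {
            // Emit the ORIGINAL characters, so "AĞRI" keeps its accents.
            $out .= '<b>' . implode('', array_slice($chars, $i, $n)) . '</b>';
            $i += $n;
        } else {
            $out .= $chars[$i];
            $i++;
        }
    }
    return $out;
}

echo highlight_folded("Doğu AĞRI dağı", "ağrı"), "\n";
// prints: Doğu <b>AĞRI</b> dağı
```

In a real page the text would also need HTML escaping before the tags are added. This naive scan compares a slice at every position, which is fine for short result snippets but not tuned for long documents.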



Edited 1 time(s). Last edit at 12/22/2011 06:07PM by test-sphider.
Re: SOLVED Case insensitive search using non English charsets
November 05, 2013 12:01PM
I know this issue dates back a few years. I'm wondering if you were able to figure out how to highlight the accented characters in the search results? I'm using the latest version of Sphider that was just released in June 2013, so I would have thought this issue would have been resolved. Any ideas? Thanks.
Tec
Re: SOLVED Case insensitive search using non English charsets
November 05, 2013 09:22PM
http://www.sphider.eu/forum/read.php?2,10598,10599#msg-10599