Problem with French characters
March 24, 2016 04:12PM
After testing, the search engine still fails to find some special characters such as "é", "à", and "è".

When my search auto-suggests words containing them, "testé" is displayed as "test<?>". I have applied utf8-encode to some parts of the source code, but to my surprise the auto-suggestion part still has the same problem. Is there a way to fix it?
Re: Problem with French characters
March 24, 2016 06:14PM
The original Sphider is based on the ISO-8859-1 charset. Consequently, content from URLs that do not use this charset will be indexed incorrectly, and searches for some terms may fail afterwards.

To avoid further problems, your search engine should handle utf-8 throughout.
Please have a look at the improvements that 'rap' has since added to the original Sphider.
If you are also looking for many more features than pure text indexing, have a look at Sphider-plus as well.

Re: Problem with French characters
March 25, 2016 08:51PM
Looking closer at the issue, the problem DOES exist in the suggest feature. It affects more than just French characters and results from a mistranslation of utf-8 characters in the database. I will be looking for a fix. There is a mismatch between the characters in the database and the characters displayed. Conversely, searching for the displayed suggestion will find no results.

UPDATE: The code is not the problem, the data is! New installations do not experience the problem.

Explanation: The database tables for the mod specify a utf-8 default collation. The ORIGINAL 1.3.6 database had an iso default. The Sphider mod alters the tables to utf-8 when the database already exists. Tables created for a new installation automatically default to utf-8.

A problem will arise when an EXISTING set of tables is UPGRADED to utf-8. The structure is changed, but the underlying data is NOT. Any new data will be utf-8 data in a utf-8 structure, but OLD data will still be iso data but contained in a utf-8 structure.
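The mismatch described above can be reproduced outside the database. The following is only a minimal sketch in Python, not Sphider code: the sample word and the variable names are assumptions chosen for illustration, and the exact behaviour inside MySQL also depends on the connection charset. It shows why legacy iso bytes read as utf-8 produce a replacement character, and what a row-by-row conversion would have to do.

```python
# Illustration only: the word and variable names are assumptions, not Sphider code.
word = "testé"

# The same word stored under the two encodings discussed in this thread.
iso_bytes = word.encode("iso-8859-1")   # "é" is the single byte 0xE9
utf8_bytes = word.encode("utf-8")       # "é" is the two bytes 0xC3 0xA9

# Old iso data read as if it were utf-8: the lone 0xE9 byte is not a
# valid utf-8 sequence, so it is shown as the replacement character.
shown = iso_bytes.decode("utf-8", errors="replace")

# What a manual conversion has to do for each old value:
# interpret the bytes as iso-8859-1, then store them again as utf-8.
repaired = iso_bytes.decode("iso-8859-1").encode("utf-8")

if __name__ == "__main__":
    print(repr(shown))
    print(repaired == utf8_bytes)
```

The reverse effect follows from the same bytes: a search term typed as utf-8 never matches the stored iso bytes, which is consistent with the suggestion finding no results.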

Simply re-indexing a site will not solve the problem, because Sphider does not update a page that has not changed. One can either manually go through each table, row by row, and update the data to utf-8 (Ugh! No thanks!), or truncate the tables and re-index the site. Naturally, a backup is advised, because some faulty data is better than no data! The tables to truncate are keywords, links, and link_keyword0 through link_keywordf.
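For anyone who prefers not to type the eighteen table names by hand, here is a small Python sketch that only prints the TRUNCATE statements for the tables listed above. It does not connect to any database. The `TABLE_PREFIX` value and the function names are assumptions for illustration; set the prefix to whatever your installation uses, and take a backup before running the output against your database.

```python
# Hypothetical helper: prints SQL only, it does not touch the database.
TABLE_PREFIX = ""  # assumption: change this if your tables use a prefix


def tables_to_truncate(prefix=""):
    """Return keywords, links, and link_keyword0 through link_keywordf."""
    link_keyword = [f"link_keyword{digit}" for digit in "0123456789abcdef"]
    return [prefix + name for name in ["keywords", "links"] + link_keyword]


def truncate_statements(prefix=""):
    """Build one TRUNCATE statement per table."""
    return [f"TRUNCATE TABLE `{table}`;" for table in tables_to_truncate(prefix)]


if __name__ == "__main__":
    for statement in truncate_statements(TABLE_PREFIX):
        print(statement)
```

After running the statements, re-index the site so that all rows are written fresh as utf-8.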

Edited 2 time(s). Last edit at 03/27/2016 05:14PM by rap.
Re: Problem with French characters
March 26, 2016 11:47AM
For Sphider-plus this issue is solved, because utf-8 is fully supported for all languages.
