

spezial chars in db ??? ..

Posted by rohaase 
March 14, 2007 10:58PM

I am testing Sphider (Windows XP SP2, MySQL 5.0.27 ...).
First of all: it runs well, and I had no problems installing it.

But now I am trying to index my homepage. It is a site about
electrical instruments in German, so I have some
words like Wärmebild-Kamera (infrared camera). The data is
inserted correctly into my database. But when I run a
test search with 'wä' and 'Phrase Search', I get no results. I
think this happens because German characters like äöüÄÖÜß
are translated into HTML entities such as &auml; ...

Is there a way to map the data, either in the DB or in
the search field?

Is it expected that these words are not found?

Other words are found without any problems: Kabelfinder, for example,
is found when I type 'kabel' into the search field.

Does anybody have an idea what I can do?

Thanks for your help ...
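
One way to picture the mapping asked about above is to normalize both the indexed text and the search term before comparing them. The following is only a sketch, not Sphider's actual code: `normalize_for_search` is a hypothetical helper, and it assumes PHP 7.2 or later with the mbstring extension and UTF-8 as the working encoding.

```php
<?php
// Hypothetical helper (not part of Sphider): bring indexed text and search
// terms into one form (UTF-8, entities decoded, lower case) before comparing.
function normalize_for_search(string $text): string
{
    // Turn "&auml;" or "&#228;" into the literal character "ä".
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // Multibyte-aware lower-casing, so "Ä" and "ä" compare equal.
    return mb_strtolower($decoded, 'UTF-8');
}

$indexed = normalize_for_search('W&auml;rmebild-Kamera'); // as stored with an entity
$query   = normalize_for_search("w\u{00E4}");             // "wä" as typed by the user

// True when the normalized query matches the start of the normalized word.
var_dump(mb_strpos($indexed, $query, 0, 'UTF-8') === 0);
```

Applying the same function at index time and at query time is what matters here; which exact normal form is chosen is secondary.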
Re: spezial chars in db ??? ..
March 16, 2007 01:32PM

As I just migrated Sphider into ModX CMS (modxcms.com), I think I know what the problem is.

Do you use UTF-8 encoding?

Edited 1 time(s). Last edit at 03/16/2007 01:33PM by cbieser.
Re: spezial chars in db ??? ..
March 20, 2007 12:51PM
Do you mean in the database? (I use latin1_swedish_ci.)

Sorry for the late answer, but I was away from home for a few days!

Thanks for your help.
Re: spezial chars in db ??? ..
March 21, 2007 11:24AM
I have the same problem on a Spanish site with special characters such as "á". I also thought this could be an encoding problem, but I've been inspecting the Sphider database (MySQL, latin1 encoded) and I've seen that the content of the web page is stored exactly as it appears in the page source. For example, an accented character like "á" is stored as "á", and an HTML entity like "&aacute;" is stored as "&aacute;".

There are three points where the encoding is defined:
1) In the source of the web pages of the site (ISO-8859-1)
2) In the database (latin1)
3) In the source of the results page (search_results in templates/standard folder)

I think that cbieser points in the right direction, but there is a problem:
1) It would be necessary to change the encoding of all the web pages of the site to UTF-8.
2) It would be necessary to change the encoding of the database from latin1 to UTF-8.
3) It would be necessary to change the encoding of the results page.
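
To illustrate why all three places have to agree, here is a small sketch of what happens to "á" when latin1 bytes are treated as UTF-8, and how an explicit conversion fixes it. It assumes PHP with the mbstring extension; the variable names are only illustrative.

```php
<?php
// "á" as a single ISO-8859-1 byte (0xE1), the way a latin1 page or table holds it.
$latin1 = "\xE1";

// The lone byte 0xE1 is not a valid UTF-8 sequence, so a UTF-8 results page
// or connection cannot display it as "á".
var_dump(mb_check_encoding($latin1, 'UTF-8'));

// Convert explicitly from the known source encoding to UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// In UTF-8, "á" is the two bytes c3 a1.
var_dump(bin2hex($utf8));
```

The conversion only works when the source encoding is known, which is why a single declared encoding for pages, database and results template is the simplest setup.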

ISO-8859-1 is a widely accepted standard in Europe, so I don't think these changes are an option.

Would it be possible to add a language-dependent module to Sphider that performs an internal translation?

This could be done in several ways, but I would like to propose one: storing the special characters as their corresponding ASCII value with a prefix to identify them.

For example, the character "á" would be stored as "<your prefix>160", the HTML entity "&aacute;" would be stored as "<your prefix>160", and the numeric character reference "&#225;" would also be translated to "<your prefix>160" (they are all the same character, "á").
This requires building the corresponding translation tables with associated pairs of columns like "HTML entity <-> ASCII", "UTF-8 <-> ASCII", and so on, and controlling the storing and retrieving of the affected words at run time.
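
A minimal sketch of this idea, under some assumptions: the prefix `~c` and the function `encode_special_chars` are made up for illustration, the input is taken to be UTF-8, and PHP 7.2 or later with mbstring is required. Instead of hand-built translation tables it relies on PHP's built-in entity table, and it emits the Unicode code point, which for "á" is 225 (the same number as in "&#225;") rather than the 160 quoted above.

```php
<?php
// Hypothetical prefix marking an encoded special character in the index.
const SPECIAL_PREFIX = '~c';

// Replace every non-ASCII character (given literally, as a named entity, or
// as a numeric reference) with SPECIAL_PREFIX followed by its code point.
function encode_special_chars(string $utf8Text): string
{
    // First collapse entities such as "&aacute;" and "&#225;" into the character itself.
    $decoded = html_entity_decode($utf8Text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Then rewrite each character outside the ASCII range (/u: treat input as UTF-8).
    $encoded = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return SPECIAL_PREFIX . mb_ord($m[0], 'UTF-8');
        },
        $decoded
    );

    // preg_replace_callback returns null on invalid UTF-8; fall back to the decoded text.
    return $encoded ?? $decoded;
}

// All three spellings of "á" end up as the same stored token.
var_dump(encode_special_chars("\u{00E1}"));
var_dump(encode_special_chars('&aacute;'));
var_dump(encode_special_chars('&#225;'));
```

The same function would have to be applied to the search term as well. A real scheme would also need a terminator after the number, since a digit following the special character would otherwise run into the code.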
I'm sorry if this is not a sophisticated proposal. I'm not an expert in encoding issues; I would only like to shed some light on this problem.

Thank you,
Anonymous User
Re: spezial chars in db ??? ..
March 28, 2007 06:34PM
I use it in Portuguese with no problems, and with plenty of accented characters ("ão", etc.). Of course, MySQL and Sphider are configured for the Portuguese language.
Re: spezial chars in db ??? ..
April 10, 2007 05:24AM
First, I love this script. I've been using it a couple of years now and it runs great. Thanks to everyone involved.

Now for my completely unreasonable request. I use Sphider to maintain a central search engine for about 80 Hungarian-language sites. Things were great when everybody used ISO-8859-2 or Windows Central European, but now a few larger sites are using CMS packages that only come with Hungarian language files and databases in UTF-8. Results from those few sites come back as gobbledygook.

Could anything be done about this? My database stores its data as latin1 (ISO-8859-1) with Swedish collation.
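
One possible approach for such a mix of sites, sketched under assumptions: every fetched page is converted to UTF-8 before indexing, and pages that are not valid UTF-8 are taken to be in a legacy encoding. `to_utf8` is a hypothetical helper, not a Sphider function, and it assumes PHP with mbstring and a database that stores UTF-8.

```php
<?php
// Hypothetical helper: return page content as UTF-8, whatever the site sent.
// $fallback is the legacy encoding assumed when the bytes are not valid UTF-8.
function to_utf8(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already UTF-8 (pure ASCII also passes this check)
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$legacy = "\xF5";     // "ő" as one ISO-8859-2 byte
$modern = "\xC5\x91"; // "ő" as UTF-8

// Both inputs should end up as the same UTF-8 string.
var_dump(to_utf8($legacy) === to_utf8($modern));
```

The validity check is a heuristic. When a site declares its charset in the HTTP headers or in a meta tag, that declaration is a better guide than guessing from the bytes.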
Re: spezial chars in db ??? ..
April 11, 2007 07:36PM
The main problem is that Sphider fetches everything using latin1.

I've made the following changes (adding an Accept-Charset: utf-8 header to both requests):
searchfuncs.php Line 27
$request = "GET $path HTTP/1.0\r\nHost: $host$portq\r\nAccept: $all\r\nAccept-Charset: utf-8\r\nAccept-Encoding: identity\r\nUser-Agent: $user_agent\r\n\r\n";
searchfuncs.php Line 94
$request = "HEAD $path HTTP/1.1\r\nHost: $host$portq\r\nAccept: $all\r\nAccept-Charset: utf-8\r\nAccept-Encoding: identity\r\nUser-Agent: $user_agent\r\n\r\n";
My database uses utf8_general_ci.
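
Note that Accept-Charset only states a preference, and a server may still answer in another encoding. A complementary step, sketched here as an assumption rather than as part of the change above, is to read the charset the server actually declares in its Content-Type header. `charset_from_headers` is a hypothetical helper.

```php
<?php
// Hypothetical helper: extract the charset declared in the response headers,
// or null when the server does not declare one.
function charset_from_headers(string $rawHeaders): ?string
{
    // Look for "charset=..." on the Content-Type line, case-insensitively.
    if (preg_match('/^Content-Type:[^\r\n]*charset="?([A-Za-z0-9._-]+)/mi', $rawHeaders, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=ISO-8859-1\r\n\r\n";
var_dump(charset_from_headers($headers));
```

The returned name could then serve as the source encoding when converting the page body to UTF-8 before it is stored.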

Edited 1 time(s). Last edit at 04/11/2007 07:38PM by cbieser.