<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>spezial chars in db ??? ..</title>
        <description> Hi,

I am testing Sphider (Windows XP SP2, MySQL 5.027 ...).
First of all: it runs well, and installation went without problems.

Now I am trying to index my homepage. It is a site for
electrical instruments, written in German, so it contains
words like Wärmebild-Kamera (infrared camera). The data is
inserted correctly into my database, but when I run a
test search for 'wä' with 'Phrase Search' I get no results. I
think this happens because German characters like äöüÄÖÜß
are translated into &amp;amp;auml; ...

Is there a way to map the data in the DB or in
the search field?

Is it expected that these words are not found?

Other words (e.g. Kabelfinder) are found without any problems
when I type 'kabel' into the search field.

Does anybody have an idea what I can do?

Thanks for your help ...</description>
        <link>http://www.sphider.eu/forum/read.php?2,17,17#msg-17</link>
        <lastBuildDate>Thu, 23 May 2013 21:03:53 +0300</lastBuildDate>
        <generator>Phorum 5.2.10</generator>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,218#msg-218</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,218#msg-218</link>
            <description><![CDATA[ The main problem is that Sphider fetches everything using latin1.<br />
<br />
I made the following changes:<br />
searchfuncs.php Line 27<pre class="bbcode">$request = &quot;GET $path HTTP/1.0\r\nHost: $host$portq\r\nAccept: $all\r\nAccept-Charset: utf-8\r\nAccept-Encoding: identity\r\nUser-Agent: $user_agent\r\n\r\n&quot;;</pre>
searchfuncs.php Line 94<pre class="bbcode">$request = &quot;HEAD $path HTTP/1.1\r\nHost: $host$portq\r\nAccept: $all\r\nAccept-Charset: utf-8\r\nAccept-Encoding: identity\r\nUser-Agent: $user_agent\r\n\r\n&quot;;</pre>
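<br />
A minimal sketch of a complementary step, not part of Sphider: Accept-Charset is only a request, so a server may still answer in latin1. The hypothetical helper to_utf8() below assumes PHP with the mbstring extension. It reads the charset from the Content-Type header value, assumes ISO-8859-1 when none is given, and converts the body to UTF-8. Unknown charset names are not handled here.<pre class="bbcode">&lt;?php
// Hypothetical helper, not part of Sphider: convert a fetched page body to UTF-8.
function to_utf8($body, $contentType)
{
    // Assumption: fall back to latin1 when the server names no charset.
    $charset = 'ISO-8859-1';
    if (preg_match('/charset=[&quot;\']?([A-Za-z0-9._-]+)/i', $contentType, $m)) {
        $charset = strtoupper($m[1]);
    }
    if ($charset === 'UTF-8') {
        return $body; // already in the target encoding
    }
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// &quot;W\xE4rmebild&quot; is &quot;Wärmebild&quot; in latin1 (byte 0xE4 is ä).
echo to_utf8(&quot;W\xE4rmebild&quot;, 'text/html; charset=ISO-8859-1'), &quot;\n&quot;;</pre>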
My database uses the utf8_general_ci collation.]]></description>
            <dc:creator>cbieser</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Wed, 11 Apr 2007 22:36:53 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,209#msg-209</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,209#msg-209</link>
            <description><![CDATA[ First, I love this script. I've been using it a couple of years now and it runs great. Thanks to everyone involved.<br />
<br />
Now for my completely unreasonable request. I use Sphider to maintain a central search engine for about 80 Hungarian-language sites. Things were great when everybody used ISO-8859-2 or Windows Central European, but now a few larger sites use CMS packages that only come with Hungarian language files and databases in UTF-8. Results from those few sites come back as gobbledygook.<br />
<br />
Could anything be done about this? My database uses latin1_swedish_ci (ISO-8859-1).]]></description>
            <dc:creator>mburp</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Tue, 10 Apr 2007 08:24:28 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,96#msg-96</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,96#msg-96</link>
            <description><![CDATA[ I use it in Portuguese, with no problem and plenty of ?, ?, ?, ?o, etc. Of course MySQL and Sphider are configured for the Portuguese language.]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Wed, 28 Mar 2007 21:34:02 +0300</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,41#msg-41</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,41#msg-41</link>
            <description><![CDATA[ Hi,<br />
I have the same problem on a Spanish site with special characters such as &quot;á&quot; and other accented letters. I also thought this could be an encoding problem, but I inspected the Sphider database (MySQL, latin1 encoded) and saw that the page content is stored exactly as it appears in the page source. For example, an accented character like &quot;á&quot; is stored as &quot;á&quot;, and an HTML entity like &quot;&amp;aacute;&quot; is stored as &quot;&amp;aacute;&quot;.<br />
<br />
There are three points where the encoding is defined:<br />
1) In the source of the web pages of the site (ISO-8859-1)<br />
2) In the database (latin1)<br />
3) In the source of the results page (search_results in templates/standard folder)<br />
<br />
I think cbieser is pointing in the right direction, but there is a problem: <br />
1) All the web pages of the site would have to be re-encoded as UTF-8.<br />
2) The database encoding would have to be changed from latin1 to UTF-8.<br />
3) The encoding of the results page would have to be changed.<br />
<br />
ISO-8859-1 is a widely accepted standard in Europe, so I don't think these changes are a realistic option.<br />
<br />
Would it be possible to add a language-dependent module to Sphider that performs an internal translation?<br />
<br />
This could be done in several ways, but I would like to propose one: storing each special character as its character code (ISO-8859-1) with a prefix to identify it.<br />
<br />
For example, the character &quot;á&quot; would be stored as &quot;&lt;your prefix&gt;225&quot;, the HTML entity &quot;&amp;aacute;&quot; as &quot;&lt;your prefix&gt;225&quot;, and the numeric character reference &quot;&amp;#225;&quot; also as &quot;&lt;your prefix&gt;225&quot; (they are all the same character, &quot;á&quot;, whose ISO-8859-1 code is 225).<br />
This requires building the corresponding translation tables with paired columns like &quot;HTML entity &lt;-&gt; code&quot;, &quot;UTF-8 &lt;-&gt; code&quot;, and so on, and handling the storing and retrieving of the affected words at run time.<br />
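<br />
As a rough sketch of this idea, assuming PHP with the mbstring extension: instead of a custom prefix and hand-built translation tables, the hypothetical helper normalize_term() below maps all three spellings to one UTF-8 form using the built-in functions mb_convert_encoding() and html_entity_decode(). The function name and the sample word are illustrative only. This swaps UTF-8 in for the prefixed code and is not how Sphider itself works.<pre class="bbcode">&lt;?php
// Hypothetical helper: reduce the different spellings of one character to a single UTF-8 form.
function normalize_term($text, $sourceCharset)
{
    // 1) Bring raw bytes to UTF-8 first (latin1 0xE1 becomes the two bytes 0xC3 0xA1).
    if (strtoupper($sourceCharset) !== 'UTF-8') {
        $text = mb_convert_encoding($text, 'UTF-8', $sourceCharset);
    }
    // 2) Decode named and numeric entities into the same UTF-8 character.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // 3) Lowercase with multibyte awareness so upper- and lowercase accented letters match.
    return mb_strtolower($text, 'UTF-8');
}

$a = normalize_term(&quot;\xE1rbol&quot;, 'ISO-8859-1');      // raw latin1 byte
$b = normalize_term('&amp;aacute;rbol', 'ISO-8859-1');  // named HTML entity
$c = normalize_term('&amp;#225;rbol', 'ISO-8859-1');    // numeric character reference
var_dump($a === $b);
var_dump($b === $c);</pre>
Applying the same function to the search term before querying would make the stored words and the query comparable.<br />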
I'm sorry if this is not a sophisticated proposal. I'm not an expert in encoding issues; I only wanted to shed some light on this problem.<br />
<br />
Thank you,]]></description>
            <dc:creator>settozero</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Wed, 21 Mar 2007 13:24:24 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,35#msg-35</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,35#msg-35</link>
            <description><![CDATA[ Do you mean in the database? (I use latin1_swedish_ci.)<br />
<br />
Sorry for the late answer; I was away from home for a few days.<br />
<br />
Thanks for your help.]]></description>
            <dc:creator>rohaase</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Tue, 20 Mar 2007 14:51:25 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,21#msg-21</guid>
            <title>Re: spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,21#msg-21</link>
            <description><![CDATA[ Hi!<br />
<br />
As I have just migrated Sphider into ModX CMS (modxcms.com), I think I know what the problem is.<br />
<br />
Do you use UTF-8 encoding?]]></description>
            <dc:creator>cbieser</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Fri, 16 Mar 2007 15:32:49 +0200</pubDate>
        </item>
        <item>
            <guid>http://www.sphider.eu/forum/read.php?2,17,17#msg-17</guid>
            <title>spezial chars in db ??? ..</title>
            <link>http://www.sphider.eu/forum/read.php?2,17,17#msg-17</link>
            <description><![CDATA[ Hi, <br />
<br />
I am testing Sphider (Windows XP SP2, MySQL 5.027 ...).<br />
First of all: it runs well, and installation went without problems.<br />
<br />
Now I am trying to index my homepage. It is a site for <br />
electrical instruments, written in German, so it contains<br />
words like Wärmebild-Kamera (infrared camera). The data is<br />
inserted correctly into my database, but when I run a <br />
test search for 'wä' with 'Phrase Search' I get no results. I <br />
think this happens because German characters like äöüÄÖÜß <br />
are translated into &amp;auml; ...<br />
<br />
Is there a way to map the data in the DB or in <br />
the search field? <br />
<br />
Is it expected that these words are not found?<br />
<br />
Other words (e.g. Kabelfinder) are found without any problems <br />
when I type 'kabel' into the search field.<br />
<br />
Does anybody have an idea what I can do?<br />
<br />
Thanks for your help ...]]></description>
            <dc:creator>rohaase</dc:creator>
            <category>Sphider Support</category>
            <pubDate>Thu, 15 Mar 2007 00:58:30 +0200</pubDate>
        </item>
    </channel>
</rss>
