|
Sphider-utf - utf-8 version of sphider here September 24, 2011 03:19PM |
Registered: 1 year ago Posts: 7 |
Sphider can now index multi-language sites; at least it indexes cp-1251 and UTF-8 pages in Russian pretty well. There may be some errors and it may fail to index some pages, but in general it works fine.
1. Sphider now uses:
- UTF-8, so you must change your database collation to utf8_general_ci
- the MySQLi class to interact with MySQL
- multi-byte string functions, so it works correctly with UTF-8
2. FIXED the "MySQL server has gone away" error.
3. Because of PHP limitations and the current indexing algorithm, Sphider ignores pages larger than 1 megabyte.
4. Some changes were made to the database, so make sure to use sql/upgrade_to_1.4.sql to update your db.
5. Sphider now auto-detects the site codepage.
And many other changes, so many I can't remember them all.
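For anyone wondering what the size limit and the codepage handling could look like in practice, here is a minimal sketch. It is not taken from the Sphider-utf source: the function name page_to_utf8, the constant MAX_PAGE_BYTES, and the "valid UTF-8, otherwise assume Windows-1251" rule are assumptions made for illustration, and the real detection may work differently. It needs the mbstring extension.

```php
<?php
// Hypothetical helper, NOT from Sphider-utf: normalize a fetched page to UTF-8.
const MAX_PAGE_BYTES = 1048576; // the 1 megabyte limit mentioned in the post

function page_to_utf8(string $raw, string $fallback = 'Windows-1251'): ?string
{
    // strlen() counts bytes, which is what a size limit needs.
    if (strlen($raw) > MAX_PAGE_BYTES) {
        return null; // too large: the page is skipped
    }
    // Input that is already valid UTF-8 is kept unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Assumption: anything else is in the fallback codepage.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" encoded as cp-1251
$utf8 = page_to_utf8($cp1251);
echo $utf8, "\n";                       // Привет
echo strlen($utf8), " bytes, ", mb_strlen($utf8, 'UTF-8'), " characters\n"; // 12 bytes, 6 characters
```

The last line also shows why the multi-byte functions matter: strlen() reports 12 for a six-letter Russian word, while mb_strlen() reports 6.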

|
Re: Sphider-utf - utf-8 version of sphider here October 07, 2011 05:08PM |
Registered: 1 year ago Posts: 1 |
|
Re: Sphider-utf - utf-8 version of sphider here October 10, 2011 11:07AM |
Registered: 1 year ago Posts: 2 |
|
Re: Sphider-utf - utf-8 version of sphider here October 10, 2011 05:23PM |
Registered: 1 year ago Posts: 2 |
|
Re: Sphider-utf - utf-8 version of sphider here February 02, 2012 01:34PM |
Registered: 1 year ago Posts: 2 |
|
Re: Sphider-utf - utf-8 version of sphider here April 01, 2012 08:50PM |
Registered: 1 year ago Posts: 31 |
|
Re: Sphider-utf - utf-8 version of sphider here June 08, 2012 06:21PM |
Registered: 11 months ago Posts: 1 |
|
Re: Sphider-utf - utf-8 version of sphider here August 24, 2012 09:24PM |
Registered: 9 months ago Posts: 3 |
[mysqld]
.........
skip-character-set-client-handshake
collation-server=utf8_unicode_ci
character-set-server=utf8
..........
AddDefaultCharset UTF-8
default_charset="UTF-8"
$pdftotext_path='/usr/bin/pdftotext';
.htmlentities($word)
.htmlentities($word, ENT_NOQUOTES, "UTF-8")
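A short sketch of why the explicit "UTF-8" argument matters. PHP versions before 5.4 defaulted htmlentities() to ISO-8859-1, which treats every byte of a multi-byte character as a separate Latin-1 character. The sample word below is made up for illustration, and the script assumes the file itself is saved as UTF-8.

```php
<?php
$word = 'слово <b>'; // sample UTF-8 text, chosen for illustration only

// Explicit UTF-8: only the markup characters are escaped.
$good = htmlentities($word, ENT_NOQUOTES, 'UTF-8');
echo $good, "\n"; // слово &lt;b&gt;

// Forcing ISO-8859-1 imitates the old default: each byte of the
// Cyrillic letters is handled on its own and the word is mangled.
$bad = htmlentities($word, ENT_NOQUOTES, 'ISO-8859-1');
var_dump($bad === $good); // bool(false)
```

On a current PHP the two-argument form may already behave correctly, but passing the charset keeps the output the same across versions.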
either as .pdf files using a PDF converter (such as CutePDF), or as "Web page" format files (.htm) exported from MS Word.