Tec, if the numbers are always of the same form, for example dd dddddd ddd, it'd be better to use a regexp (preg_match?) instead of forcing a point to replace a space. That way, if the regexp doesn't find the right form, the text is treated as a normal phrase. by GeekMan - Sphider Support
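A minimal sketch of that idea, assuming the form is exactly two, six and three digits separated by single spaces. The function name normalize_number() and the choice to join the groups with points are assumptions made for illustration, not Sphider code.

```php
<?php
// Hypothetical helper (not part of Sphider): recognise "dd dddddd ddd"
// and only then replace the spaces with points.
function normalize_number(string $text): string
{
    // Assumed form: 2 digits, space, 6 digits, space, 3 digits.
    if (preg_match('/^(\d{2}) (\d{6}) (\d{3})$/', $text, $m)) {
        // Right form found: join the three groups with points.
        return $m[1] . '.' . $m[2] . '.' . $m[3];
    }
    // Anything else is left untouched and treated as a normal phrase.
    return $text;
}

echo normalize_number('12 345678 901'), "\n";
echo normalize_number('hello big world'), "\n";
```

Only input that matches the whole pattern is changed, so ordinary phrases containing spaces pass through as they are.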
If I'm not wrong, only Sphider-plus deals with UTF-8. You can download it there: by GeekMan - Sphider Support
Won't it also follow external domains? That could be a problem ^^ by GeekMan - Sphider Support
Ah, OK. Thank you Tec ^^ by GeekMan - Sphider Support
So, why not strip off (is that the correct expression?) blank lines? ^^ by GeekMan - Sphider Support
No plug-in, just plain JavaScript in the search engine should be enough to do that. Look at how WYSIWYG editors like TinyMCE work. by GeekMan - Sphider Mods
Hmm, perhaps add an md5 comparison. I mean, search for a record with the exact same md5, in which case the page is not added. by GeekMan - Sphider Support
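The md5 check could look roughly like this. It is only a sketch: the array $knownHashes stands in for an md5 column in the index table, and the function name add_page_if_new() is hypothetical.

```php
<?php
// Hypothetical sketch: $knownHashes plays the role of the md5 column
// that would normally live in the database.
function add_page_if_new(array &$knownHashes, string $url, string $content): bool
{
    $hash = md5($content);
    if (isset($knownHashes[$hash])) {
        // A record with the exact same md5 exists: do not add the page.
        return false;
    }
    // First time this content is seen: remember it.
    $knownHashes[$hash] = $url;
    return true;
}

$hashes = array();
var_dump(add_page_if_new($hashes, 'http://example.com/a', 'same body'));
var_dump(add_page_if_new($hashes, 'http://example.com/b', 'same body'));
```

With a real table the lookup would be a SELECT on the md5 field before the INSERT.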
Hmm, I can only (perhaps) help you with a few things. First, your subdomains should be like That's a visitor's point of view and a webmaster's too ^^ Second, you can create a category for each subdomain and/or add them as new domains in the admin panel of Sphider. When people want to search in a specific subdomain, they'll just have to select it ^^ by GeekMan - Sphider Mods
It doesn't index database content, it indexes web pages. If there are no pages and/or no links to those pages, Sphider, or any crawler, will never be able to index them. by GeekMan - Sphider Support
>The variable $data is defined in file .../admin/spiderfuncs.php
>function getFileContents() as:
>$data = null;
And that is the problem: it is defined inside a function, not outside. So, when PHP gets inside index_url(), $data is not detected as defined because it is out of the scope of this function. By the way, in spiderfuncs.php, line 296, where does $urlparts come from in unset... by GeekMan - Sphider Support
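A small sketch of the scope problem. The three function names below are made up for the demonstration and do not reproduce the real getFileContents() or index_url().

```php
<?php
// Hypothetical stand-in: $data exists only while this function runs.
function get_file_contents_demo(): void
{
    $data = array('nofollow' => 0); // local variable, gone after return
}

// Hypothetical stand-in: a different function, so a different scope.
function index_url_demo(): bool
{
    // The $data created above is not visible here.
    return isset($data);
}

// One possible fix: hand the array over explicitly as a parameter.
function index_url_fixed(array $data): bool
{
    return isset($data['nofollow']);
}

get_file_contents_demo();
var_dump(index_url_demo());
var_dump(index_url_fixed(array('nofollow' => 0)));
```

Returning the array from the first function and passing it to the second avoids relying on globals.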
Never try this on my website or the websites of many webmasters I know. Your crawler (name and IP) would quickly end up in the ban list and its data would be transmitted to the list maintainer. When a webmaster forbids crawlers, he/she has a reason. Two of mine are configured to do so as they are still in test mode and I don't want anybody to come except a few people who help me debug them. by GeekMan - Sphider Mods
Hi Tec. There is a big problem with the $data array. First, you use it in several functions without initializing it (like many other variables). Each array should be initialized before use. Second, in index_url() you use it for the condition if ($data['nofollow'] != 1) { ... } but it is not yet initialized at that point. It is initialized only after the call of clean_file() in $data =... by GeekMan - Sphider Support
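One way to initialize the array before the condition is evaluated, as a sketch. The helper names and the keys other than 'nofollow' are assumptions, and the merged array merely imitates whatever a parsing step would return.

```php
<?php
// Hypothetical defaults: only 'nofollow' comes from the post,
// the other keys are assumed for the example.
function default_page_data(): array
{
    return array('nofollow' => 0, 'noindex' => 0, 'content' => '');
}

// The condition from the post, now safe because the key always exists.
function should_follow(array $data): bool
{
    return $data['nofollow'] != 1;
}

$data = default_page_data();        // initialized before any use
var_dump(should_follow($data));

// Values found later (here a made-up parser result) override the defaults.
$data = array_merge($data, array('nofollow' => 1));
var_dump(should_follow($data));
```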
I think you have a problem, because I spider more than 2000 links in a very short time and the CPU load is low. My laptop is a Core2 Duo T9300 @ 2.50GHz with 4 GB RAM. by GeekMan - Sphider Support
The best would be to give us the address of your website so we can check if there is something wrong with it ^^ by GeekMan - Sphider Support
Tec, I think you are mistaken about what surfart wants ^^ From what I understand, he wants the search results to show only one result per domain, not to index only one link. by GeekMan - Sphider Support
I don't know what Tec plans for the future ^^ He could rewrite the database calls using an abstraction class. For now you'll have to rewrite them yourself... as I'll do, because I prefer MySQLi ^^ by GeekMan - Sphider Support
Well, it's possible that your MySQL server doesn't accept persistent connections. Try replacing mysql_pconnect() with mysql_connect(). by GeekMan - Sphider Support
Hi Paul. Are you using Sphider or Sphider-plus? From what I see it shouldn't be a problem with Sphider-plus... I also use rewrite rules but not short links (without .xxx). by GeekMan - Sphider Support
@JamesF: this may be a problem only if you launch the indexing from a command line, not from the admin panel. From the command line, assuming you have only one website (or you need to index them all), you can use this: php spider.php -all -f So $_SERVER['argv'] is set and $_SERVER['argc'] equals 3 (the script name plus the two options). by GeekMan - Sphider Support
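On the command line PHP stores the script name as the first entry of the argument list, so a call such as php spider.php -all -f yields three entries. The sketch below uses a hypothetical helper, describe_args(), that takes the list as a parameter so it can be tried without a real command-line call; in a script the list would come from $_SERVER['argv'].

```php
<?php
// Hypothetical helper: summarises an argument list shaped like $_SERVER['argv'].
function describe_args(array $argv): array
{
    return array(
        'argc'    => count($argv),          // what $_SERVER['argc'] would hold
        'script'  => $argv[0] ?? null,      // first entry is the script name
        'options' => array_slice($argv, 1), // everything after the script name
    );
}

// Simulates: php spider.php -all -f
print_r(describe_args(array('spider.php', '-all', '-f')));
```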
From what I know, you can't name fields in a regexp-generated array. What you can do is this (not tested): Replace: preg_match_all("/href\s*=\s*[\'\"]?([+:%\/\?~=&;\\\(\),._a-zA-Z0-9-]*)(#[.a-zA-Z0-9-]*)?[\'\" ]?(\s*rel\s*=\s*[\'\"]?(nofollow)[\'\"]?)?/i", $file, $regs, PREG_SET_ORDER); with something like this: preg_match_all("/href\s*=\s*[\'\"... by GeekMan - Sphider Support
For me it was a fresh install. By the way, I have been able to index your website without any problem, cashbrown. by GeekMan - Sphider Support
In your case, wouldn't it be better to change your preferred charset in your admin panel and invoke 'Erase & Re-index'? by GeekMan - Sphider Support
Hmm, I'm not sure it will work. The default value for COLLATE seems to be latin1_swedish_ci. "ci" means case insensitive, if I'm not mistaken. So, if you want to make a search case sensitive, you'll have to either convert all the tables/fields to xxxxx_bin or use: WHERE REGEXP BINARY \"^".iconv("ISO-8859-1","UTF-8",$search)."$\" (The call to i... by GeekMan - Sphider Support
Hi Tec. $wordarray = unique_array(explode(" ", $data['content'])); The problem with this kind of split function is that it doesn't really create an array of words, because punctuation stays attached to the word it is next to. A much better way is this: $texte = html_entity_decode($data['content'],ENT_QUOTES,'UTF-8'); $wordarray = unique_array(preg_split("/[\s[:punct:][:spa... by GeekMan - Sphider Support
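A sketch of splitting on whitespace and punctuation together. The pattern here is an assumption and not the truncated one from the post, and PHP's built-in array_unique() is used in place of Sphider's unique_array(); the function name split_words() is hypothetical.

```php
<?php
// Hypothetical helper: decode entities, then split on runs of
// whitespace and/or punctuation so words come out clean.
function split_words(string $content): array
{
    $text = html_entity_decode($content, ENT_QUOTES, 'UTF-8');
    // [\s[:punct:]]+ treats any run of spaces and punctuation as one separator;
    // PREG_SPLIT_NO_EMPTY drops empty pieces at the start or end.
    $words = preg_split('/[\s[:punct:]]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Remove duplicates and reindex from 0.
    return array_values(array_unique($words));
}

print_r(split_words('Hello, world! Hello &amp; goodbye.'));
```

With explode(" ", ...) the same input would keep "Hello," and "world!" with their punctuation attached.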
Hi cashbrown. In the Sphider-plus administrator tools/Configuration Settings, go to Index Log Settings and uncheck: Enable real-time output of logging data. It will prevent Sphider from opening a new tab/window. by GeekMan - Sphider Support