

not all content indexed on htm page

Posted by happy4U 
May 13, 2016 03:39AM
I recently created a basic HTML page to hold mostly text content, i.e. mostly information contained within paragraph tags. Unfortunately, whenever I index these pages, the content of the paragraphs is not indexed. I very much want Sphider to index the paragraph content, the same way it indexes the content of a .txt file, so that users can run phrase searches. Is this how Sphider is supposed to work? If not, what might I have done wrong? If so, is there a setting or code adjustment (in the HTML or in Sphider) that would get it to index the paragraphs?

Below is an example of the code in the body of the .htm page. The page has five more divs like this one; only the paragraph content and the IDs change.


<div class="container2" id="container2-1">
<h4 id="h4-1">Description:</h4>
<p id="p-1">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
</div>
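One quick way to confirm that the paragraph text is reachable by an HTML parser at all is to run the snippet through a tag-stripping extractor. The following is a minimal sketch using Python's standard-library `html.parser` module; it is not Sphider's own (PHP) parsing code, and the names `ParagraphTextExtractor` and `paragraph_text` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphTextExtractor(HTMLParser):
    """Collect the text inside each <p> element, ignoring the tags themselves."""

    def __init__(self) -> None:
        super().__init__()
        self.in_paragraph = False
        self.paragraphs: List[str] = []
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            self.paragraphs.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text nodes (including those inside inline tags such as <b>) are kept
        # only while we are inside a <p> element.
        if self.in_paragraph:
            self._buffer.append(data)


def paragraph_text(html: str) -> List[str]:
    """Return the text of every <p> element in the given HTML."""
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


SAMPLE = """
<div class="container2" id="container2-1">
<h4 id="h4-1">Description:</h4>
<p id="p-1">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
</div>
"""

if __name__ == "__main__":
    for text in paragraph_text(SAMPLE):
        print(text)
```

If this prints the Lorem ipsum sentence, the markup itself is not hiding the text, which points the investigation toward the indexer settings or the search query instead.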

Thanks in advance for any help.
Re: not all content indexed on htm page
May 17, 2016 09:21AM
Check your log file to be sure the page is actually being indexed. Perhaps it isn't being found or has insufficient content.
Be sure the content doesn't lie between noindex tags.
Be sure word stemming is NOT checked in settings.
Check your minimum word length in settings.
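The noindex and minimum-word-length checks above can be approximated offline with a rough sketch. Assumptions: Sphider's exclusion markers are the HTML comments `<!--sphider_noindex-->` and `<!--/sphider_noindex-->`, and tags are stripped with a naive regular expression. `indexable_words` is a hypothetical helper, not part of Sphider, and it does not model word stemming or stop words.

```python
import re
from typing import List

# Assumption: Sphider skips text between these HTML comment markers.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*sphider_noindex\s*-->.*?<!--\s*/sphider_noindex\s*-->",
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")  # naive tag stripper, good enough for a quick check
WORD = re.compile(r"\w+")


def indexable_words(html: str, min_word_length: int = 3) -> List[str]:
    """Roughly approximate which words an indexer would keep from a page.

    Hypothetical helper: drops noindex blocks, strips tags, lower-cases words,
    and discards words shorter than min_word_length. Stemming and stop words
    are not modelled.
    """
    visible = NOINDEX_BLOCK.sub(" ", html)
    text = TAG.sub(" ", visible)
    return [w.lower() for w in WORD.findall(text) if len(w) >= min_word_length]


if __name__ == "__main__":
    page = (
        '<p id="p-1">Lorem ipsum dolor sit amet</p>'
        "<!--sphider_noindex--><p>hidden text</p><!--/sphider_noindex-->"
    )
    print(indexable_words(page, min_word_length=2))
```

With a minimum word length of 2, the sample keeps the five Lorem ipsum words and drops everything between the noindex markers.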
Re: not all content indexed on htm page
May 18, 2016 09:59PM
Thanks for the reply!

The log indicates that the pages are being indexed. There are no noindex tags in the HTML. Minimum word length is set to 2. Word stemming was enabled, but I disabled it, cleared the links and keywords, and then reindexed; search still returns nothing.

If I copy the text from the paragraphs into a .txt file, index that, and then run a phrase search for a snippet of text from it, Sphider returns the .txt file. So Sphider is working in that sense; it just doesn't find the same text inside an .htm page, and that applies to every search mode (AND, OR, and phrase), not only phrase search.

I can get AND and OR searches to work if I copy the contents of one of the paragraphs into the meta keywords tag in the page header, but phrase search still fails. I assume that is because Sphider treats each keyword as an individual word and does not store the words together in sequence, as it does when it extracts text from a .txt or PDF file.
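The behaviour described above (AND/OR matching words from the meta keywords while phrase search does not) is consistent with a search engine that answers word queries from a word-level index but answers phrase queries from stored page text. Below is a toy Python model of that idea under that assumption; it is not Sphider's actual database schema or search code, and `ToyIndex`, `add_keywords_only` and the other names are hypothetical.

```python
import re
from typing import Dict, List, Set

WORD = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    return [w.lower() for w in WORD.findall(text)]


class ToyIndex:
    """Word-level inverted index plus a stored copy of each page's body text."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}  # word -> pages containing it
        self.full_text: Dict[str, str] = {}      # page -> normalized body text

    def add(self, page: str, text: str) -> None:
        # Body text feeds both the word index and the stored text.
        words = tokenize(text)
        self.full_text[page] = " ".join(words)
        for word in words:
            self.postings.setdefault(word, set()).add(page)

    def add_keywords_only(self, page: str, keywords: str) -> None:
        # Keywords reach the word index but not the stored text used for phrases.
        for word in tokenize(keywords):
            self.postings.setdefault(word, set()).add(page)

    def search_and(self, query: str) -> Set[str]:
        sets = [self.postings.get(w, set()) for w in tokenize(query)]
        return set.intersection(*sets) if sets else set()

    def search_or(self, query: str) -> Set[str]:
        return set().union(*(self.postings.get(w, set()) for w in tokenize(query)))

    def search_phrase(self, query: str) -> Set[str]:
        # The postings only say which pages contain each word, not whether the
        # words are adjacent and in order, so phrases are checked in stored text.
        phrase = " ".join(tokenize(query))
        return {
            page for page, text in self.full_text.items()
            if f" {phrase} " in f" {text} "
        }


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add("notes.txt", "Sed do eiusmod tempor incididunt ut labore.")
    idx.add_keywords_only("page.htm", "sed do eiusmod tempor incididunt")
    print(sorted(idx.search_and("tempor incididunt")))
    print(sorted(idx.search_phrase("tempor incididunt")))
```

In this model the AND search finds both pages, while the phrase search finds only `notes.txt`, because the keywords never reached the stored text. Under that assumption, copying paragraphs into meta keywords can only help word searches; the real fix is getting the body text itself indexed.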
Re: not all content indexed on htm page
July 05, 2016 12:04PM
Thanks for sharing!