Won't index a .nsf page

Posted by asw22pilot
Won't index a .nsf page
May 02, 2007 06:32AM
I'm trying to index the links on this page:
http://www.asic.gov.au/asic/asic.nsf/byheadline/April+2007+media+and+information+releases?openDocument

The spider output comes up as:
"Spidering http://www.asic.gov.au/asic/asic_pub.nsf/byheadline/April+1999?opendocument
1. Retrieving: http://www.asic.gov.au/asic/asic_pub.nsf/byheadline/April+1999?opendocument at 23:39:27.
Not text or html"

I think the .nsf page suffix is throwing it off.

The underlying page source is standard HTML (an excerpt is pasted below).
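
The "Not text or html" message suggests the spider may be deciding by the Content-Type response header rather than by the URL suffix, although the thread does not confirm this. A minimal sketch for checking what the server actually sends is below. The helper name `find_content_type` is hypothetical and not part of Sphider; it only pulls the media type out of a list of raw header lines.

```php
<?php
// Hypothetical helper (not part of Sphider): return the lowercased media
// type from a list of raw response header lines, or null if none is found.
function find_content_type(array $headerLines): ?string
{
    $found = null;
    foreach ($headerLines as $line) {
        // Header names are case-insensitive. Keep the last match so the
        // final response wins if redirects produced several header sets.
        if (preg_match('/^Content-Type:\s*([^;\s]+)/i', $line, $m)) {
            $found = strtolower($m[1]);
        }
    }
    return $found;
}
```

To try it against the live page, pass it the array returned by PHP's built-in `get_headers($url)`, after checking that the call did not return `false`. If the result is not `text/html` or `text/plain`, that would be consistent with the message above.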

Any clues?

Thanks

Al

<a href="/asic/ASIC.NSF/byHeadline/2007%20media%20and%20information%20releases">2007 releases</a>
</div>
<div class="page-options">
<a href="#" onClick="resizeDown();return false;"><img src="/asic/rwpgslib.nsf/icon_textSizeDown.gif" width="23" height="20" alt="decrease text size" title="decrease text size"></a>
<a href="#" onClick="resizeUp();return false;"><img src="/asic/rwpgslib.nsf/icon_textSizeUp.gif" width="23" height="20" alt="increase text size" title="increase text size"></a>
<a href="#" onclick="javascript:window.print();" title="print page" rel="nofollow"><img src="/asic/rwpgslib.nsf/icon_printPage.gif" width="23" height="20" alt="print page" title="print page"></a>
</div>
<a name="skip" id="skip"></a>
<b><font face="Arial" size="4">07-112 Gabrial Pennicott arrested in Canada</font></b><br><br>
<i><font face="Arial" size="2">Monday 30 April 2007</font></i><br><br>
<br>
<font face="Arial">Mr Gabrial Neil Pennicott, formerly of Melbourne, Victoria, has been arrested in British Colombia, Canada. </font><br>
<br>
<font face="Arial">Mr Pennicott was arrested on 26 April 2007 as a result of the Australian Government making a request to the Canadian authorities to issue a provisional arrest warrant. The provisional arrest warrant was issued for 47 corporate-related charges laid by ASIC. </font><br>
<br>
<font face="Arial">ASIC&#8217;s investigation and the subsequent charges against Mr Pennicott relate to the circumstances surrounding the promotion of investment opportunities to investors, the transfer of funds raised by such investments between companies controlled by Mr Pennicott and the repayment of the investments. </font><br>
<br>
<font face="Arial">ASIC&#8217;s investigation is continuing.</font> <div class="page-options">
<a href="#" onClick="resizeDown();return false;"><img src="/asic/rwpgslib.nsf/icon_textSizeDown.gif" width="23" height="20" alt="decrease text size" title="decrease text size"></a>
<a href="#" onClick="resizeUp();return false;"><img src="/asic/rwpgslib.nsf/icon_textSizeUp.gif" width="23" height="20" alt="increase text size" title="increase text size"></a>
<a href="#" onclick="javascript:window.print();" title="print page" rel="nofollow"><img src="/asic/rwpgslib.nsf/icon_printPage.gif" width="23" height="20" alt="print page" title="print page"></a>
</div>
</div> <!-- body-content -->
</div> <!-- body-container -->
<div id="footer-container">
<div id="footer-content">
<ul>
<li class="first"><a href="/asic/asic.nsf/byheadline/Hints+for+using+our+website+asic+version?openDocument ">using this site</a></li>
<li><a href="/asic/asic.nsf/byheadline/Site+map?openDocument
">site map</a></li>
<li><a href="/asic/asic.nsf/byheadline/Copyright+%26+linking+to+our+websites?openDocument">copyright</a></li>
<li><a href="/asic/asic.nsf/byheadline/Privacy?openDocument
">privacy</a></li>
<li><a href="/asic/asic.nsf/byheadline/Accessibility?openDocument">accessibility</a></li>
<li><span>Last updated: 04/30/2007</span></li>
</ul>
</div> <!-- footer-content -->
</div> <!-- footer-container -->



Edited 2 time(s). Last edit at 05/02/2007 06:41AM by asw22pilot.
Re: Won't index a .nsf page
May 03, 2007 05:35AM
Edit the file admin/spiderfuncs.php, near line 153.

You will see code like this:
				} else if (($regs[1] == 'application/msword' || $regs[1] == 'application/vnd.ms-word') && $index_doc == 1) {
					$status['content'] = 'doc';
					$status['state'] = 'ok';
				} else if (($regs[1] == 'application/excel' || $regs[1] == 'application/vnd.ms-excel') && $index_xls == 1) {
					$status['content'] = 'xls';
					$status['state'] = 'ok';
				} else if (($regs[1] == 'application/mspowerpoint' || $regs[1] == 'application/vnd.ms-powerpoint') && $index_ppt == 1) {
					$status['content'] = 'ppt';
					$status['state'] = 'ok';

So you need to add one more else-if branch, like this:
				} else if ($regs[1] == 'text/html; charset=ISO-8859-1') {
					$status['content'] = 'text';
					$status['state'] = 'ok';
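
An exact comparison against 'text/html; charset=ISO-8859-1' only matches when the header value is precisely that string; a different charset, spacing, or letter case would fall through. A minimal sketch of comparing only the bare media type follows. It assumes the value being tested may carry parameters after a semicolon, which the thread does not confirm for `$regs[1]`, and the names `base_media_type` and `is_html_or_text` are hypothetical, not Sphider functions.

```php
<?php
// Hypothetical helper: reduce a Content-Type value to its bare media type,
// e.g. "Text/HTML; charset=ISO-8859-1" becomes "text/html".
function base_media_type(string $contentType): string
{
    // Everything after the first ";" is a parameter list; drop it.
    $parts = explode(';', $contentType, 2);
    return strtolower(trim($parts[0]));
}

// Hypothetical helper: true when the value describes HTML or plain text,
// whatever charset parameter (if any) follows the media type.
function is_html_or_text(string $contentType): bool
{
    return in_array(base_media_type($contentType), ['text/html', 'text/plain'], true);
}
```

With helpers like these, the added branch could test `is_html_or_text($regs[1])` instead of one fixed string. Whether that alone resolves the problem depends on what the spider captures into `$regs[1]` in the first place.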

If this does not work, let me know.

Diego Medina
Web Developer (http://www.fmpwizard.com)
Re: Won't index a .nsf page
May 03, 2007 09:54PM
Thanks

Let me try this; I will let you know.

Regards

Al
Re: Won't index a .nsf page
May 08, 2007 04:24AM
I added the additional else-if branch.

It did not work for me :(

Any other clues?

Thanks

Al