
list all pages from a domain

Posted by pl_90 
list all pages from a domain
September 27, 2007 03:31PM
Hello,

I would like to have a function so that users can list all pages of a domain, like Google's site:domain.de.

Can anyone help?



Edited 1 time(s). Last edit at 09/30/2007 06:04PM by pl_90.
Re: list all pages from a domain
September 30, 2007 06:04PM
No idea?
rec
Re: list all pages from a domain
September 30, 2007 09:43PM
Like this?

Sphider Site Search Example

[www.sphider.eu]
Re: list all pages from a domain
October 01, 2007 01:32PM
Hi, thanks for your reply.
No, I mean like this: [www.google.com] winking smiley


Greetings,

Dawid
Re: list all pages from a domain
October 01, 2007 11:14PM
Who can write the function?
Tec
Re: list all pages from a domain
October 04, 2007 03:45PM
Why do you need a special mod for that? Before indexing, enable 'Index words in domain name and url path' in admin/settings. If you then enter 'www.abc.de' as the search string, you will automatically get all pages of that domain.
Also, Sphider as delivered by Ando is more intelligent than Google here: you don't have to enter site:www.abc.de.
There is only one problem: the wildcard * is missing! Otherwise you could also search for subdomains of www.abc.de. A search query like www.*.abc.de would be helpful. A wildcard could be a useful future option . . .
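Tec's wildcard idea could be approximated on the SQL side by turning `*` into the MySQL LIKE wildcard `%`. A minimal sketch, assuming a hypothetical helper called `wildcard_to_like()` that is not part of Sphider: it escapes LIKE's own metacharacters (`\`, `%`, `_`) first, so only the user's `*` acts as a wildcard. The resulting pattern would still have to pass through `mysql_real_escape_string()` before it goes into a query.

```php
<?php
// Hypothetical helper (not part of Sphider): turn a user pattern such as
// "www.*.abc.de" into a MySQL LIKE pattern. LIKE's own metacharacters are
// escaped first so that only "*" behaves as a wildcard.
function wildcard_to_like($pattern)
{
    // Escape the backslash first, then LIKE's "%" and "_" wildcards.
    $escaped = str_replace(array('\\', '%', '_'), array('\\\\', '\\%', '\\_'), $pattern);
    // "*" becomes "%": any run of characters, including none.
    return str_replace('*', '%', $escaped);
}

echo wildcard_to_like('www.*.abc.de'), "\n";  // www.%.abc.de
```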
To all of you: Don't start a new thread asking for that. At present I'm too busy.

Tec
Tec
Re: list all pages from a domain
October 04, 2007 05:43PM
Re: list all pages from a domain
October 04, 2007 08:32PM
But it seems it doesn't work if we enter something like [www.sphider.eu] in the search form... How can we enable this?

Thank you
Tec
Re: list all pages from a domain
October 05, 2007 01:35AM
In .../search.php search for:

switch ($search) {
case 1:

if (!isset($results)) {
$results = "";
}

Below this, additionally insert:

$pos = strpos(strtolower($query), "www");           // position of the first "www", or false
if ($pos !== false) $query = substr($query, $pos);  // drop everything before it, e.g. "http://"
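The inserted lines cut off everything before the first "www", so a query like 'http://www.abc.de' is reduced to 'www.abc.de'. The same step as a small standalone function (the name `strip_before_www()` is only for illustration) makes the role of the strict comparison visible: `strpos()` returns 0 when the query already starts with "www" and false when "www" is absent, so testing with `!== false` keeps those two cases apart.

```php
<?php
// Illustrative helper: keep the query from the first "www" onwards,
// matching case-insensitively like the snippet above.
function strip_before_www($query)
{
    $pos = strpos(strtolower($query), 'www');
    // strpos() returns false when "www" is absent and 0 when the query
    // starts with it, so compare strictly against false.
    if ($pos !== false) {
        $query = substr($query, $pos);
    }
    return $query;
}

echo strip_before_www('http://www.abc.de'), "\n";  // www.abc.de
```

As the next post points out, this only helps when the user actually types "www".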


Happy coding

Tec
Re: list all pages from a domain
October 12, 2007 01:31PM
Thank you Tec,

But it seems to work only when the link has www in front of it. If the user enters something like [mydomain.com] it doesn't work, and it doesn't work either if there is a "/" at the end...

Can you please help me? Thank you very much
Tec
Re: list all pages from a domain
October 12, 2007 07:15PM
You have hit another Sphider limitation we have suffered from for several centuries. As / and : are not stored when indexing, [abc.de] will not be stored and can't be searched afterwards. We could also bypass these limitations (see .../admin/spiderfuncs.php, line 619), but I think we would then run into another trap. To be ready for future contingencies, it would be more intelligent to write a special mod, as requested by pl_90 in the first post of this thread.
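Since ':' and '/' are not indexed, one partial workaround on the query side is to reduce whatever the user types to a bare host name before searching. A minimal sketch with a hypothetical `normalize_domain_query()` (not part of Sphider): it adds an 'http://' prefix only when no scheme is present, because PHP's `parse_url()` only reports a host when the URL has a scheme, and then lower-cases the host. It does not change the indexer, so the limitation in spiderfuncs.php still applies.

```php
<?php
// Hypothetical helper: reduce "http://mydomain.com/", "mydomain.com/" or
// "MyDomain.com" to a lower-case host name such as "mydomain.com".
function normalize_domain_query($query)
{
    $query = trim($query);
    // parse_url() only reports a host when a scheme is present.
    $with_scheme = preg_match('#^[a-z][a-z0-9+.-]*://#i', $query) ? $query : 'http://' . $query;
    $host = parse_url($with_scheme, PHP_URL_HOST);
    // Fall back to the trimmed input if no host could be extracted.
    return (is_string($host) && $host !== '') ? strtolower($host) : strtolower(rtrim($query, '/'));
}

echo normalize_domain_query('http://mydomain.com/'), "\n";  // mydomain.com
```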

Tec
Tec
Re: list all pages from a domain
October 15, 2007 11:56PM
This is an example of a mod that searches for all pages belonging to a domain. Start your search query with 'site:' followed by the domain you want to check. Something like 'site:[www.abc.de]' will work if this domain was indexed before. This mod is also designed for lazy people like me, so you may search for 'site:abc.de' or even 'site:abc'. The mod searches all links in Sphider's link table, not the stored keywords. The search output has the same look and feel as Sphider's usual search results.

Known limitations:
1. The result output always uses the <meta> description to characterize a site; the corresponding setting in admin settings is ignored. This is deliberate, because here we are not searching for keywords in a text. It is also why nothing needs to be highlighted.
2. All results are displayed on one page. The 'Default results per page' selection in admin settings is ignored.
It seems I became lazy.


In search.php search for:

$search_results = get_search_results($query, $start, $category, $type, $results, $domain);

Above this line, additionally insert:

$pos = strstr(strtolower($query), "site:");
if ($pos) include ("$include_dir/search_links.php");
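The hook above uses `strstr()`, which fires whenever 'site:' appears anywhere in the query, even in the middle of a normal keyword search. If only queries that begin with 'site:' should be routed to search_links.php, a prefix test is stricter. A sketch with a hypothetical `site_query_target()` that returns the text after the prefix, or null for a normal keyword search; `stripos()` keeps the test case-insensitive, like the `strtolower()` call in the hook.

```php
<?php
// Hypothetical helper: return the domain part of a "site:" query, or null
// when the query should go to the normal keyword search instead.
function site_query_target($query)
{
    $query = trim($query);
    if (stripos($query, 'site:') !== 0) {
        return null;  // "site:" is missing or not at the start
    }
    return trim(substr($query, strlen('site:')));
}

var_dump(site_query_target('site:www.abc.de'));   // string(10) "www.abc.de"
var_dump(site_query_target('sphider site:abc'));  // NULL
```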


In the .../include/ folder, create a new file called search_links.php with the following content:

<?php
$starttime = getmicrotime();
$notitle = "No meta title available for this site";
$nodes = "No meta description available for this site";
$query = strtolower($query);
$pos = strpos($query, ":");
$urlquery = strip_tags(trim(substr($query, $pos + 1)));  // everything after "site:"
$sqlquery = mysql_real_escape_string($urlquery);          // escape user input before it goes into SQL
// Get all links of this URL
$res = mysql_query("select * from ".$mysql_table_prefix."links where url like '%$sqlquery%'");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) print "<br><div id =\"result_report\">The search \"$urlquery\" didn't match any stored URL</div>";

if ($num_rows > 0) {
    $endtime = getmicrotime() - $starttime;
    $time = round($endtime*100)/100;
    print "<br><div id =\"result_report\">Displaying $num_rows URL results for \"$urlquery\" ($time seconds)</div><br><br>";
    for ($i=0; $i<$num_rows; $i++) {
        $url2 = mysql_result($res, $i, "url");
        $title = mysql_result($res, $i, "title");
        $description = mysql_result($res, $i, "description");
        $page_size = mysql_result($res, $i, "size");

?>
<b><?php print $i+1?>.</b>
<a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a><br/>
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
<div class="url"><?php print $url2?> - <?php print $page_size?> kB<br><br></div>

<?php
    }
}
die ('');

?>


Happy coding

Tec
Re: list all pages from a domain
October 16, 2007 01:40AM
Hmmm, it doesn't work for me when I type site:www.google.com; the query just becomes www.google.com and is searched as such. It doesn't list all the pages of the site, though. Thank you
Tec
Re: list all pages from a domain
October 17, 2007 01:42AM
Strange. Did you index www.google.com before? This mod only displays domains and their pages after you have indexed them. Have a look at Sphider's link table (".$mysql_table_prefix."links). Do you see any 'www.google.com' links there?
Also note that something like
[images.google.com].......
will not be found if you search for 'site:www.google.com'. Try the lazy way: 'site:google.com'. If this is your problem, I will have to modify my mod. Please let me know.
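The reason is that the mod does a substring search, and 'images.google.com' does not contain 'www.google.com'. If the mod were ever changed to compare host names instead, a check along these lines would accept a domain together with all of its subdomains. A sketch only: `host_in_domain()` is a hypothetical name, and treating a leading 'www.' on the query as optional is an assumption about what users mean.

```php
<?php
// Hypothetical helper: does $host equal $domain or lie below it?
// A leading "www." on the queried domain is ignored (assumption).
function host_in_domain($host, $domain)
{
    $host = strtolower($host);
    $domain = strtolower($domain);
    if (strpos($domain, 'www.') === 0) {
        $domain = substr($domain, 4);
    }
    if ($host === $domain) {
        return true;
    }
    // Require a dot boundary so "notgoogle.com" does not match "google.com".
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

var_dump(host_in_domain('images.google.com', 'www.google.com'));  // bool(true)
```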

Tec
Re: list all pages from a domain
October 17, 2007 02:16AM
Hi;

Yes, I have indexed www.google.com; I tried indexing other sites too, but the results are the same... I don't know why.

Thank you
Tec
Re: list all pages from a domain
October 17, 2007 09:23AM
Have a look at Sphider's link table (".$mysql_table_prefix."links). Do you see any 'www.google.com' links there?

Tec
Re: list all pages from a domain
October 17, 2007 04:50PM
Yeah, it doesn't work for me either sad smiley
Tec
Re: list all pages from a domain
October 17, 2007 06:54PM
Yeah, it works for me!

If you don't give me more information, I can't help you!

Did you place:
$pos = strstr(strtolower($query), "site:");
if ($pos) include ("$include_dir/search_links.php");

A B O V E

$search_results = get_search_results($query, $star.. . . ?

Have a look at Sphider's link table. Do you see any corresponding links there?

Try the lazy way: 'site:abc.de'. If this is your problem, I will have to modify my mod. Please let me know.

Use var_dump() to see where the mod fails.

Tec
Re: list all pages from a domain
October 17, 2007 08:47PM
Yes, I did everything as you told us...

The links were also in the database. When I search for www.google.com, I get every page from Google. But when I try site:www.google.com, it just queries the keyword www.google.com

[localhost]

so it lists all the sites that contain the keyword google.com, not all the pages from Google.

Thank you
Tec
Re: list all pages from a domain
October 18, 2007 06:56PM
Version 2

Advantages over the first version:

1. Finds all sites now.
If you search for 'site:www.abc.de', sites like 'www.subsite.abc.de' will also be listed (if they were indexed before as a link of 'www.abc.de').

2. Multiple matches are now offered for selection.
Assume you indexed these as separate URLs:
[www.abc.de]
[www.my-abc.de]
[mysite.abc.de]
If your search query is now 'site:abc.de', the mod will ask you which abc you want to list.
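The multiple choice comes from the SQL condition `url like '%abc.de%'`, which is a plain substring match. Mimicked in PHP (the function name `matching_sites()` and the sample URLs with an 'http://' prefix are illustrative assumptions), all three example sites contain 'abc.de', so the mod has to ask which one to list, while an unrelated site is filtered out.

```php
<?php
// Hypothetical helper mirroring "url like '%query%'": keep every indexed
// site URL that contains the query, ignoring case as MySQL's usual
// case-insensitive (_ci) collations do.
function matching_sites(array $site_urls, $query)
{
    $matches = array();
    foreach ($site_urls as $url) {
        if (stripos($url, $query) !== false) {
            $matches[] = $url;
        }
    }
    return $matches;
}

$sites = array('http://www.abc.de/', 'http://www.my-abc.de/', 'http://mysite.abc.de/', 'http://www.xyz.de/');
$found = matching_sites($sites, 'abc.de');
echo count($found) > 1 ? "Multiple choice:\n" . implode("\n", $found) . "\n" : "Single site\n";
```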

In order to get this, replace the content of .../include/search_links.php with the following:

<?php
$starttime = getmicrotime();
$notitle = "No meta title available for this site";
$nodes = "No meta description available for this site";
$query = strtolower($query);
$pos = strpos($query, ":");
$urlquery = strip_tags(trim(substr($query, $pos + 1)));  // everything after "site:"
$sqlquery = mysql_real_escape_string($urlquery);          // escape user input before it goes into SQL

// Search for URLs that were already indexed.
$res = mysql_query("select * from ".$mysql_table_prefix."sites where url like '%$sqlquery%' AND indexdate != ''");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) { // Nothing found
    print "<br><div id =\"result_report\">The site search \"$urlquery\" didn't match any indexed URL</div>";
    die('');
}
if ($num_rows > 1) { // Multiple choice
    print "<br><br><b><font color=\"red\">Multiple choice. Please select one domain: </font></b><br><br>";
    for ($i=0; $i<$num_rows; $i++) {
        $url2 = mysql_result($res, $i, "url");
        $indexdate = mysql_result($res, $i, "indexdate");

?>
<b><?php print $i+1?>.</b>
<a href="./search.php?query=site:<?php print $url2?>&search=1" class="title"><?php print $url2 ?></a><a class="description"><?php print "&nbsp;&nbsp;&nbsp;indexed: $indexdate<br><br>"?></a>
<?php
    }
    die('');
}

// Exactly one site matched: get all links of this site.
$site_id = mysql_result($res, 0, "site_id");  // row 0, column "site_id"
$res = mysql_query("select * from ".$mysql_table_prefix."links where site_id = '$site_id'");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) print "<br><div id =\"result_report\">The search \"$urlquery\" didn't match any indexed links</div>";
if ($num_rows > 0) { // Display header row and all results
    $endtime = getmicrotime() - $starttime;
    $time = round($endtime*100)/100;
    print "<br><div id =\"result_report\">Displaying $num_rows site results for \"$urlquery\" ($time seconds)</div>";

    for ($i=0; $i<$num_rows; $i++) {
        $url2 = mysql_result($res, $i, "url");
        $title = mysql_result($res, $i, "title");
        $description = mysql_result($res, $i, "description");
        $page_size = mysql_result($res, $i, "size");

?>
<b><?php print $i+1?>.</b>
<a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a><br/>
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
<div class="url"><?php print $url2?> - <?php print $page_size?> kB<br><br></div>

<?php
    }
}
die ('');
?>


Happy coding

Tec
dbi
Re: list all pages from a domain
October 18, 2007 08:57PM
Hey, it works very well for me! The only problem is that if I list my site, ALL 28k pages I have show up! Can I do anything to list only the default number of results per page?
Tec
Re: list all pages from a domain
October 18, 2007 11:07PM
Not yet, because I was too lazy to implement that. I will try to include it in version 3. Give me some time. . .

Tec
dbi
Re: list all pages from a domain
October 19, 2007 12:22AM
All right.

I'm not good with PHP, but I'll try to take a look at the code and compare it with others... But don't expect much from me with PHP! lol
Re: list all pages from a domain
October 19, 2007 02:01AM
Thank you, it works; however, it is not ideal that all pages are listed without page navigation sad smiley

I also know now why it didn't work for me in the first place: because of this mod [www.sphider.eu] sad smiley

Edit: one more problem I see: the links don't work when all the pages are listed...



Edited 1 time(s). Last edit at 10/19/2007 02:26AM by hawkeye.
Re: list all pages from a domain
October 19, 2007 02:54AM
I just changed this: <a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a>

I changed $url to $url2 and then it works. Thank you! I hope you can add the navigation soon.
Tec
Re: list all pages from a domain
October 19, 2007 10:18AM
hawkeye:
Thanks for pointing out $url > $url2. My bug. It will be corrected in version 3, which I will try to build over the weekend.
Once upon a time we had Saturday night fever. Now we do have Sphider . . .
Tec



Edited 1 time(s). Last edit at 10/19/2007 02:39PM by Tec.
Tec
Re: list all pages from a domain
October 22, 2007 12:48PM
Version 3

All features of version 2 remain valid. Advantages over the second version:

1. The setting 'Show 10 / 20 / 50 results per page' is now followed.

2. The link bug ($url should be $url2) is fixed. Thanks to hawkeye.

3. On request, the mod will now create a sitemap.xml file. To trigger this, add 'xml:' right after 'site:' in your search query. So if, besides the listing, you also want to create a sitemap of that domain, enter as search query:
site:xml:[www.abc.de]
or even the lazy version:
site:xml:abc.de
Based on the site URL as registered in Sphider's admin section, the XML files are named individually, like 'sitemap_www.abc.de.xml', and stored in the .../admin/ folder. Invoking it again overwrites an existing sitemap.
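The page arithmetic behind item 1 reduces to a few lines. A sketch with a hypothetical `page_bounds()` helper: `$start` is the 1-based result page, as in the mod's `&start=` links; the returned `from` is the first 0-based row and `to` is one past the last row, matching the `for ($i=$from; $i<$to; $i++)` loop in search_links.php. The sketch also clamps `$start` into the valid range, so a missing or zero page index cannot produce a negative row offset.

```php
<?php
// Hypothetical helper: rows to show on result page $start (1-based).
function page_bounds($num_rows, $per_page, $start)
{
    $pages = max(1, (int) ceil($num_rows / $per_page));
    $start = min(max(1, (int) $start), $pages);  // keep the page index in range
    $from  = ($start - 1) * $per_page;           // first row, 0-based
    $to    = min($from + $per_page, $num_rows);  // one past the last row
    return array('pages' => $pages, 'start' => $start, 'from' => $from, 'to' => $to);
}

$b = page_bounds(25, 10, 3);  // $b['from'] is 20 and $b['to'] is 25: rows 21-25 of 25
```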

In order to get this, replace the content of .../include/search_links.php with the following:

<?php
$starttime = getmicrotime();
$notitle = "No meta title available for this site";
$nodes = "No meta description available for this site";
$query = strtolower($query);
$pos = strpos($query, ":");
$urlquery = strip_tags(trim(substr($query, $pos + 1)));  // everything after "site:"
$makexml = '';
if (strpos($urlquery, "xml:") === 0) { // If we should create a sitemap
    $makexml = 'ok';
    $urlquery = trim(substr($urlquery, 4));
}
$sqlquery = mysql_real_escape_string($urlquery);          // escape user input before it goes into SQL

// Search for URLs that were already indexed.
$res = mysql_query("select * from ".$mysql_table_prefix."sites where url like '%$sqlquery%' AND indexdate != ''");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) { // Nothing found
    print "<br><div id =\"result_report\">The site search \"$urlquery\" didn't match any indexed URL</div>";
    die('');
}
if ($num_rows > 1) { // Multiple choice
    print "<br><br><b><font color=\"red\">Multiple choice. Please select one domain: </font></b><br><br>";
    for ($i=0; $i<$num_rows; $i++) {
        $url2 = mysql_result($res, $i, "url");
        $indexdate = mysql_result($res, $i, "indexdate");

?>
<b><?php print $i+1?>.</b>
<a href="./search.php?query=site:<?php if ($makexml == 'ok') print 'xml:'; print $url2?>&search=1" class="title"><?php print $url2 ?></a><a class="description"><?php print "&nbsp;&nbsp;&nbsp;indexed: $indexdate<br><br>"?></a>
<?php
    }
    die('');
}

// Exactly one site matched: get all links of this site.
$site_id = mysql_result($res, 0, "site_id");  // row 0, column "site_id"
$res = mysql_query("select * from ".$mysql_table_prefix."links where site_id = '$site_id'");
echo mysql_error();
$num_rows = mysql_num_rows($res);
$pages = 0;

if ($num_rows == 0) print "<br><div id =\"result_report\">The search \"$urlquery\" didn't match any indexed links</div>";
if ($num_rows > 0) {

    if ($makexml == 'ok') {
        include ("create_sitemap.php"); // Optional: create sitemap.xml here
    }

    // Prepare header and all results for listing
    $start = isset($start) ? max(1, (int) $start) : 1;   // 1-based result page, never 0 or negative
    $pages = (int) ceil($num_rows / $results_per_page);  // Count of required pages
    if ($start > $pages) $start = $pages;
    $from = ($start - 1) * $results_per_page;            // First row of actual page (0-based)
    $to = min($from + $results_per_page, $num_rows);     // One past the last row of actual page

    $endtime = getmicrotime() - $starttime;
    $time = round($endtime*100)/100;
    $fromm = $from + 1;

    // Display header
    print "<br><div id =\"result_report\">Displaying sites $fromm - $to from $num_rows results for \"$urlquery\" ($time seconds)</div>";
    // Display actual rows for this result page
    for ($i=$from; $i<$to; $i++) {
        $url2 = mysql_result($res, $i, "url");
        $title = mysql_result($res, $i, "title");
        $description = mysql_result($res, $i, "description");
        $page_size = mysql_result($res, $i, "size");

?>
<b><?php print $i+1?>.</b>
<a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a><br/>
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
<div class="url"><?php print $url2?> - <?php print $page_size?> kB<br><br></div>

<?php
    }
}

if ($pages > 1) { // If we have more than 1 result page
?>
<div id="other_pages">
<?php print "Result page: <b>$start</b> from $pages&nbsp;&nbsp;&nbsp;&nbsp;"; ?>
<?php
    if ($start > 1) { // Display 'First'
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=1"><?php print "First"?></a>&nbsp;&nbsp;&nbsp;
<?php
        if ($start > 5) { // Display '-5'
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=<?php print $start-5 ?>"><?php print "- 5"?></a>&nbsp;&nbsp;&nbsp;
<?php
        }
    }
    if ($start > 1) { // Display 'Previous'
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=<?php print $start-1 ?>"><?php print "Previous"?></a>&nbsp;&nbsp;&nbsp;
<?php
    }
    if ($start < $pages) { // Display 'Next' (only if a later page exists)
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=<?php print $start+1 ?>"><?php print "Next"?></a>&nbsp;&nbsp;&nbsp;
<?php
        if ($pages - $start > 5) { // Display '+5'
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=<?php print $start+5 ?>"><?php print "+ 5"?></a>&nbsp;&nbsp;&nbsp;
<?php
        }
    }
    if ($start < $pages) { // Display 'Last'
?>
<a href="./search.php?query=site:<?php print $urlquery?>&search=1&start=<?php print $pages ?>"><?php print "Last"?></a>
<?php
    }
?>
</div>
<?php
}
die ('');
?>
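The navigation block shows 'First' and '- 5', 'Previous', 'Next' and '+ 5', and 'Last' depending on the current page. The same decisions collected in one hypothetical function, `nav_links()`, which maps each visible link to the page it points to. In this sketch 'Next' is shown whenever `$start < $pages`; that rule is an assumption about the intent, chosen so 'Next' never points past the last page.

```php
<?php
// Hypothetical helper: which navigation links to show for result page
// $start of $pages, mapped to the page number each link points to.
function nav_links($start, $pages)
{
    $links = array();
    if ($start > 1) {
        $links['First'] = 1;
        if ($start > 5) {
            $links['- 5'] = $start - 5;
        }
        $links['Previous'] = $start - 1;
    }
    if ($start < $pages) {
        $links['Next'] = $start + 1;
        if ($pages - $start > 5) {
            $links['+ 5'] = $start + 5;
        }
        $links['Last'] = $pages;
    }
    return $links;
}

print_r(nav_links(1, 3));  // only "Next" (page 2) and "Last" (page 3)
```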


Also create a new file called create_sitemap.php in the .../include/ folder with the following content:


<?php
$changefreq = "monthly"; // Individualize this sitemap.xml variable
$priority = "0.50";      // Individualize this sitemap.xml variable

// Below only change something, if you are sure to remain compatible to [www.sitemaps.org]
$date = date("Y-m-d");
$time = date("H:i:s");   // 24-hour clock
$lastmod = date("c");    // W3C Datetime with the server's own UTC offset: YYYY-MM-DDThh:mm:ss+hh:mm
$xmlmax = 50000;         // Max URLs count
$xmlmaxsize = 10485760;  // Max. filesize = 10 MB
$version = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ;
$urlset = "<urlset xmlns=\"[www.sitemaps.org]\" xmlns:xsi=\"[www.w3.org]\" xsi:schemaLocation=\"[www.google.com] [www.sitemaps.org]\">";
$copyright = "<!-- Generated by Sphider with additional module created by Tec (v.1.0 rev.2) -->" ;
$update = "<!-- Last update of this sitemap: $date / $time -->" ;
$all_links = '';
$length = 0;
$link_count = 0;

$xml_rows = min($num_rows, $xmlmax);  // Don't exceed max URLs count
for ($i=0; $i<$xml_rows; $i++) {
    if ($length >= $xmlmaxsize-1000) break;  // Don't exceed max. filesize
    // XML defines only five named entities, so escape with htmlspecialchars(), not htmlentities()
    $link = htmlspecialchars(mysql_result($res, $i, "url"), ENT_QUOTES);
    // Create individual rows for XML-file
    $all_links .= "<url><loc>$link</loc><lastmod>$lastmod</lastmod><changefreq>$changefreq</changefreq><priority>$priority</priority></url>\n";
    $length = strlen($all_links);
    $link_count++;
}

// Create filename and open file
$name = parse_url($urlquery);
if (empty($name['host'])) $name = parse_url(mysql_result($res, 0, "url")); // lazy queries like "abc.de" carry no host: use the first indexed link
$hostname = isset($name['host']) ? $name['host'] : '';

if ($hostname == 'localhost') { // If we run a localhost system extract the domain
    $pathname = isset($name['path']) ? $name['path'] : '';  // Get path, domain and filename
    $pos = strpos($pathname, "/", 1);         // Extract domain from path and forget first / by +1 offset
    $pathname = substr($pathname, $pos + 1);  // Suppress /localhost/
    $pos = strrpos($pathname, "/");
    if ($pos) $pathname = substr($pathname, 0, $pos);  // If exists, suppress filename and suffix
    $filename = "./admin/sitemap_localhost_$pathname.xml";
} else { // If we run in the wild
    $filename = "./admin/sitemap_$hostname.xml";
}
if (!$handle = fopen($filename, "w")) {
    print "Unable to open $filename";
    die ('');
}

// Now write all to XML-file
if (!fwrite($handle, "$version\n$urlset\n$copyright\n$update\n$all_links</urlset>\n")) {
    print "Unable to write to $filename";
    die ('');
}
fclose($handle);

// sitemap.xml done! Now final printout
$kblength = round($length / 1000, 1);
if ($num_rows > $xmlmax) print "<br><br><font color=\"red\"><b>Attention: Max. URL count for sitemap is reached. Sitemap is restricted to $xmlmax URLs from a total amount of $num_rows sites.</b></font><br>";
if ($length >= ($xmlmaxsize-1000)) print "<br><br><font color=\"red\"><b>Attention: Max. filesize for sitemap is reached. Sitemap is restricted to $kblength kB.</b></font><br>";

print "<br>Created: <font color=\"blue\"> $filename</font>&nbsp;&nbsp;with $link_count URLs.&nbsp;&nbsp;Filesize: $kblength kB<br><br>";
?>
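Each line of the sitemap holds one `<url>` element. A hedged sketch of building a single entry with a hypothetical `sitemap_entry()`: it escapes the URL with `htmlspecialchars()` and `ENT_QUOTES`, because the sitemaps.org protocol requires entity-escaping of &, ', ", < and >, while `htmlentities()` can also emit HTML-only named entities (such as `&eacute;`) that XML does not define. The `<lastmod>` value is passed in; `date('c')` would give a W3C Datetime string with the server's own UTC offset.

```php
<?php
// Hypothetical helper: one <url> element for a sitemap.xml file.
function sitemap_entry($url, $lastmod, $changefreq = 'monthly', $priority = '0.50')
{
    // ENT_QUOTES also escapes single quotes, as the sitemap protocol asks.
    $loc = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return "<url><loc>$loc</loc><lastmod>$lastmod</lastmod>"
         . "<changefreq>$changefreq</changefreq><priority>$priority</priority></url>\n";
}

// The "&" in the query string is written as "&amp;" inside <loc>.
echo sitemap_entry('http://www.abc.de/page.php?id=1&lang=de', '2007-10-22');
```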


Happy coding

Tec
Re: list all pages from a domain
October 22, 2007 01:04PM
Thank you, it is great, but I got this error:

Warning: mysql_result() [function.mysql-result]: Unable to jump to row -1 on MySQL result index 17 in /localhost/include/search_links.php on line 77
Tec
Re: list all pages from a domain
October 22, 2007 05:09PM
That error message is very strange. Line 77 of search_links.php is only a "print" line:
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
This has nothing to do with MySQL.
The path in the error message also seems strange to me: /localhost/include/...
I would expect something like /localhost/sphider/include/...

I copied the whole file search_links.php back from this forum post and it runs well on my system. So the published code did not suffer from a copy-and-paste problem.

Tec



Edited 2 time(s). Last edit at 10/22/2007 05:14PM by Tec.
Re: list all pages from a domain
October 22, 2007 06:48PM
I didn't create create_sitemap.php; could that be causing this problem?

Thank you