I am attempting to crawl music.myspace.com, but they use tokens in the URL.
A token does not stick to the user via a session, but changes on certain page views.
Any idea how to set up Sphider to ignore the tokens, so that it does not crawl the same page again and again?
Myspace works fine if you request the page without the token.
I have attempted to limit the crawl to only include tokens which start with E, etc., but because the tokens change, the spider only crawls part of the site.
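
To make the idea concrete, here is a rough sketch of what I mean by "ignoring the token". It is written in Python rather than Sphider's PHP, and it is not Sphider code. The parameter name `MyToken` and the helper names `strip_token` and `dedupe` are assumptions made up for the illustration; the real parameter name should be checked against the actual URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed name of the token query parameter; check it against real URLs.
TOKEN_PARAM = "MyToken"


def strip_token(url, token_param=TOKEN_PARAM):
    """Return the URL with the token query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the token (case-insensitive match).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != token_param.lower()
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def dedupe(urls):
    """Return token-free URLs in first-seen order, each one only once."""
    seen = set()
    unique = []
    for url in urls:
        canonical = strip_token(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique
```

If the spider compared and stored URLs only after a step like `strip_token`, two links to the same page with different tokens would count as one page.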
Basically, what I am trying to do is build an index of friendids for musicians only on Myspace (I am trying not to pick up all profile pages).
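
For the index itself, something like the following is what I have in mind, again as a Python sketch and not anything Sphider provides. It assumes the ID appears as a numeric `friendid` query parameter in the crawled links; that parameter name and the function name `extract_friend_id` are assumptions for the example.

```python
from urllib.parse import urlsplit, parse_qs


def extract_friend_id(url, param="friendid"):
    """Return the numeric friend ID from a URL's query string, or None."""
    query = parse_qs(urlsplit(url).query)
    for key, values in query.items():
        # Match the parameter name case-insensitively and accept digits only.
        if key.lower() == param and values[0].isdigit():
            return int(values[0])
    return None
```

Collecting the non-None results into a set would give one entry per musician, regardless of how many tokenized links pointed at the same profile.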
Any suggestions would be great.
Pete