shadow.6174 Posted November 1, 2017

Oh, a fun find while crawling the old forums: if you change the _lang_ attribute of the `` tag to "pirate", you get a fun surprise in the header xD (you can do that through the Element Inspector, available in any modern browser).
Silmar Alech.4305 Posted November 1, 2017

The old forum contains 4 dynamic views of the content, so a simple mirroring crawler will create 3-4 copies of the same content. The first view is the normal forum everyone sees when opening the main page and browsing to forums, topics and posts. The second view is the permalink on each post, which redirects into the first view. The 3rd view is the list of posts by a member ("See all messages by ..."), which gives a listing of all posts of that member. The 4th view consists of the many hand-crafted links that members inserted into their posts to reference other posts. For older posts, these links may point nowhere if the referenced forums were archived or deleted, or they are redirected to an archived thread if one is still available.

If you exclude all but the first view from mirroring to create one static view, most of the links will point nowhere. Therefore, every link has to be postprocessed so that it points into the first view.

I don't know if there will ever be a need to put my archive online, since we already have one that is reasonably complete, but it was great fun to figure all this out and create a proper static view of the dynamic content. I also learnt a lot about DOM and simpleXML. Invaluable.

After a few tries with pre-made crawlers, I started a customized one especially for this forum. First, scan the main page (and a few selected Wayback Machine archives of the main page) for categories and forums. Second, scan the found forums for topics. Third, scan the found topics for posts. That is the only way to be sure to have a complete copy without holes. This includes building a database of post#-to-page# assignments, which postprocessing needs to correctly map permalinks into the static view, as well as the "next/previous" links found in ArenaNet posts that reference posts on other pages.
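For illustration, here is a minimal Python sketch of that three-stage scan. It assumes a `fetch(url) -> str` callable and made-up URL/markup patterns (`/forum/<name>`, `/topic/<id>?page=<n>`, `id="post-<id>"`); the old forum's real markup isn't described in this thread. It fills an in-memory SQLite table `post_page` with the post#-to-page# assignments. Regexes stand in for the DOM/simpleXML parsing mentioned above (a real HTML parser would be more robust), and forum-level pagination is left out for brevity.

```python
import re
import sqlite3
from typing import Callable, Iterable

# Hypothetical URL/markup patterns; the old forum's real scheme is not given here.
FORUM_RE = re.compile(r'href="(/forum/[\w-]+)"')          # forum links on a main page
TOPIC_RE = re.compile(r'href="/topic/(\d+)"')             # topic links on a forum page
PAGE_RE = re.compile(r'href="/topic/(\d+)\?page=(\d+)"')  # pagination links in a topic
POST_RE = re.compile(r'id="post-(\d+)"')                  # post anchors on a topic page


def crawl(fetch: Callable[[str], str], start_pages: Iterable[str]) -> sqlite3.Connection:
    """Scan main pages -> forums -> topics -> posts, recording each post's page."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE post_page (post_id INTEGER PRIMARY KEY, topic_id INTEGER, page INTEGER)"
    )

    # Stage 1: forums listed on the main page(s), e.g. live plus archived copies.
    forums = set()
    for url in start_pages:
        forums.update(FORUM_RE.findall(fetch(url)))

    # Stage 2: topics listed in each forum.
    topics = set()
    for forum in sorted(forums):
        topics.update(int(t) for t in TOPIC_RE.findall(fetch(forum)))

    # Stage 3: walk every page of every topic and note where each post lives.
    for topic_id in sorted(topics):
        pending, seen = [1], set()
        while pending:
            page = pending.pop()
            if page in seen:
                continue
            seen.add(page)
            html = fetch(f"/topic/{topic_id}?page={page}")
            for post_id in POST_RE.findall(html):
                db.execute(
                    "INSERT OR REPLACE INTO post_page VALUES (?, ?, ?)",
                    (int(post_id), topic_id, page),
                )
            # Follow pagination links so no page (and thus no post) is skipped.
            for tid, p in PAGE_RE.findall(html):
                if int(tid) == topic_id:
                    pending.append(int(p))
    db.commit()
    return db
```

Looking up where a permalinked post ended up is then a single query, e.g. `db.execute("SELECT topic_id, page FROM post_page WHERE post_id = ?", (102,))`.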
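The link postprocessing could look roughly like this hypothetical sketch: every `href`, whichever view it points into, is mapped to a static first-view file name, using a post-to-(topic, page) lookup such as the one a crawl like the above would build. The host `forum.example.com`, the `/post/<id>` permalink form and the `topic-<id>-page-<n>.html` file naming are assumptions for illustration only. Links to posts missing from the mirror (archived or deleted) and to unmirrored forum pages become `#`. Again, regexes replace the DOM handling a real implementation would use.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

FORUM_HOST = "forum.example.com"  # hypothetical host of the old forum
HREF_RE = re.compile(r'href="([^"]*)"')


def to_static(url: str, post_page: Dict[int, Tuple[int, int]]) -> Optional[str]:
    """Map a link from any view to its first-view static file, or None if dead."""
    parts = urlparse(url)
    if parts.netloc and parts.netloc != FORUM_HOST:
        return url  # external link: leave untouched
    # View 1: a topic page, possibly with ?page=N and a #post anchor.
    m = re.fullmatch(r"/topic/(\d+)", parts.path)
    if m:
        page = int(parse_qs(parts.query).get("page", ["1"])[0])
        frag = f"#{parts.fragment}" if parts.fragment else ""
        return f"topic-{m.group(1)}-page-{page}.html{frag}"
    # View 2: a permalink; resolve it via the post -> (topic, page) table.
    m = re.fullmatch(r"/post/(\d+)", parts.path)
    if m:
        loc = post_page.get(int(m.group(1)))
        if loc is None:
            return None  # post was archived or deleted
        topic_id, page = loc
        return f"topic-{topic_id}-page-{page}.html#post-{m.group(1)}"
    return None  # other forum paths (e.g. member post listings) are not mirrored


def rewrite_links(html: str, post_page: Dict[int, Tuple[int, int]]) -> str:
    """Rewrite every href in a page; dead links become '#'."""
    def repl(m: "re.Match[str]") -> str:
        target = to_static(m.group(1), post_page)
        new = target if target is not None else "#"
        return f'href="{new}"'
    return HREF_RE.sub(repl, html)
```

Hand-crafted links (the 4th view) need no special case here: they are just absolute or relative versions of the first two forms, which `urlparse` normalizes.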