Help

Quick Start

Type (or copy and paste) the URL of the page you want to validate into the text field on the form and press the "Check now" button.

Introduction

Calling/Linking to the Validator

You can link directly to the Validator home page, or you can call the Validator CGI program. The home page is currently http://archiveready.com/ (and will remain so for the foreseeable future), and the CGI program can be reached at
http://archiveready.com/check?url=http://yousite.com.
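
As a minimal sketch of calling the CGI program from a script, the Python snippet below builds the check address for a given page. The helper name build_check_url is hypothetical, and percent-encoding the url parameter is an assumption; the example above passes the address unencoded, and the service may accept either form.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above.
CHECK_ENDPOINT = "http://archiveready.com/check"


def build_check_url(page_url: str) -> str:
    """Return the Validator CGI address for page_url (hypothetical helper)."""
    # Assumption: the service accepts a percent-encoded "url" parameter.
    return CHECK_ENDPOINT + "?" + urlencode({"url": page_url})


if __name__ == "__main__":
    print(build_check_url("http://example.com/"))
```

The resulting string can be used as a link target or fetched with any HTTP client.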

What kinds of checks does ArchiveReady perform?

ArchiveReady checks several website attributes, such as:

  1. Hypertext validity and format (HTML, CSS validation),
  2. Page content structure,
  3. Compliance with open standards,
  4. Hyperlink validity,
  5. Robots.txt contents,
  6. Sitemap.xml contents and validity,
  7. RSS feed structure.
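
To give a rough idea of what checks 5 and 6 could look like, here is a small Python sketch using only the standard library. It is not ArchiveReady's actual implementation; the function names, the sample data, and the "ExampleBot" user agent are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample data for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/</loc></url>"
    "<url><loc>http://example.com/about</loc></url>"
    "</urlset>"
)


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return the <loc> entries of a sitemap.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    print(crawler_allowed(SAMPLE_ROBOTS, "ExampleBot", "http://example.com/private/a.html"))
    print(sitemap_urls(SAMPLE_SITEMAP))
```

A real check would fetch robots.txt and sitemap.xml from the site first; the sketch works on strings so it can be run without network access.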

What do people say about web archiving?

"Web archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public." (Source: http://en.wikipedia.org/wiki/Web_archiving)

"To enable the archive of a site by the Portuguese Web Archive, it is fundamental that the site presents a crawler-friendly homepage." (Source: Portuguese Web Archive Crawler)

"Heritrix (ExtractorJS) has trouble finding the links that are not hardcoded strings in javascript." (Source: Heritrix Known Issues)

"If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site." (Source: How Googlebot sees your webpages)

"The content of your robots.txt file tells search engine crawlers how they should visit your site." (Source: Google Webmaster Guidelines)
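
The remarks above about JavaScript-built links can be illustrated with a short Python sketch. It mimics a crawler that reads only the static HTML and does not execute scripts; the class and function names and the sample page are hypothetical, and real crawlers such as Heritrix are considerably more sophisticated.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str):
    """Return the links visible to a crawler that only parses the markup."""
    extractor = LinkExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.links


# Made-up page: one hardcoded link, one target assembled by a script.
SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<button onclick="go('news')">News</button>
<script>
  function go(section) { window.location = "/" + section + "/index.html"; }
</script>
</body></html>
"""

if __name__ == "__main__":
    print(static_links(SAMPLE_PAGE))
```

Only the hardcoded link is found; the news page is reachable solely through the script, so a crawler working this way would never discover it.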

What technologies are used to build ArchiveReady?

ArchiveReady is built with Python and several libraries, such as requests and Beautiful Soup. Nginx and uWSGI are also used. Everything is written in Vim.

Resources

Papers, projects, and initiatives relevant to web preservation.

ARCOMEM Project
ARCOMEM is an EU-funded research project. It is about memory institutions like archives, museums, and libraries in the age of the Social Web.
BlogForever Project
BlogForever will create a software platform capable of aggregating, preserving, managing and disseminating blogs.
International Internet Preservation Consortium
A global network of experts archiving the web for future generations.
LiWA Project
Living Web Archives
Memento Project
Memento: Adding time to the web
UK Web Archive Blog: How good is good enough? – Quality Assurance of harvested web resources
Quality Assurance is an important element of web archiving. It refers to the evaluation of harvested web resources which determines whether pre-defined quality standards are being attained.