Hi Iain,
Well, it's been a while, but the idea is not dead!
I have been wrestling with the file size/DPI question and have done a few experimental scans.
Obviously, I don't have access to anything like the professional kit that the Internet Archive has at its disposal. They have produced some really impressive PDFs with very good file sizes (30-40MB), and their quality is brilliant, with no ghosting or bleeding of content through from the other side of the page. They are awesome and I won't get close, but still... on to my own plans.
With my scanner, or indeed any scanner, there seems to be a trade-off between what's best for text and what's best for colour images. The PCW pages are a mixture of both, so I've tried to use a scanning resolution that doesn't do too bad a job of the images but still gives reasonable results when I OCR them.
I suppose another important consideration is what the scans will be used for. If they were meant for reprinting, they would need to be of a much higher quality than for web display. I don't imagine that the pages will ever be reprinted, but good quality is still nice. The problem is deciding on a good balance between quality and file size.
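As a rough illustration of why resolution dominates file size, here is a minimal Python sketch of the raw, uncompressed size of one scanned page. It assumes a roughly A4 page (8.27 x 11.69 in) and 24-bit colour (3 bytes per pixel); both are assumptions rather than measurements of the actual PCW pages, and real PDFs will be much smaller once compressed. The key point is that doubling the DPI quadruples the pixel count.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of a scanned page in bytes.

    Assumes 24-bit RGB (3 bytes per pixel) by default.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Assumed page size: roughly A4, in inches (not measured from PCW).
    width_in, height_in = 8.27, 11.69
    for dpi in (150, 300, 600):
        size_mib = raw_page_bytes(width_in, height_in, dpi) / (1024 * 1024)
        print(f"{dpi} DPI: about {size_mib:.1f} MiB per page, uncompressed")
```

Multiply by the number of pages in an issue and it is easy to see why an uncompressed 600 DPI file is impractical for a website.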
I have done a few scans, OCR'd them and saved them as reduced-size PDFs. (The 600 DPI uncompressed file is there as an example; it is not practical to have files of that size on my website!)
They are available here:
http://www.primrosebank.net/pcw/pcw_test.htm
As with many things, I guess that "beauty is in the eye of the beholder". It would be useful if a few folks could have a look at the scans on this page and let me have their opinions. People might have different views on the quality of the images and on the relative importance of accurate text recognition, so all comments are welcome.
Regards,
Dave