Jon D'Souza-Eva wrote: Sun Jun 09, 2024 10:52 pm
John Saunders wrote: Thu Jun 06, 2024 9:15 pm
Consider digitising it (in my ideal world every chess player would be a dab hand at using a scanner).
Do you have any hints on how to do that, John? I have a perfectly functional flatbed scanner which is OK for my needs (mostly scanning single A4 pages), but it's not much use for old magazines or books, because an open A4-sized magazine or a large book doesn't really fit on the platen. I've got quite a lot of old magazines and scarce books that I'd like to scan, if I could do it faster than one page every 15 minutes.
I have a Canon flatbed scanner which can scan A4 documents. It's a dedicated scanner rather than a three-in-one printer/scanner/copier. I can't tell you the exact model as I'm not at home at the moment; besides, it's about 15-20 years old and probably long out of production. It wasn't hugely expensive: maybe 100 quid when I bought it.
It comes with some fairly decent utility software, which I sometimes use for photos, but for scanning and OCR-ing documents I prefer ABBYY FineReader 12 (no doubt other brands are available). For BritBase I sometimes create readable PDFs to post online, but more often I simply copy the text generated by the OCR component into an HTML page. The text then has to be checked for OCR errors, of course, but it's pretty good apart from 'half' characters (½), which it cannot get right, even though the software is supposed to let you 'teach' it how to interpret such characters. This is annoying, as the material I scan for BritBase tends to be littered with them.
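Since ½ is the character the OCR keeps getting wrong, one option is a find-and-replace pass over the OCR text, outside the OCR software, before it goes into the HTML page. A minimal Python sketch, assuming the OCR misreads ½ in a consistent way: the strings in `MISREADS` are hypothetical examples and would need to be replaced with whatever your own output actually contains. It only substitutes a misread standing on its own or directly after a score's whole number (e.g. `21/2` for 2½), so ordinary words such as "Vienna" are left alone.

```python
import re

# Hypothetical OCR misreadings of the '½' character. Build this list by
# inspecting your own OCR output; every scanner/font/software mix differs.
MISREADS = ["1/2", "Vi", "V2"]

# Optional whole-number part of a score, then one misread, not embedded in
# a longer word or number (e.g. '21/2' -> '2½', but 'Vienna' is untouched).
_PATTERN = re.compile(
    r"(?<![A-Za-z/])(\d*)("
    + "|".join(re.escape(m) for m in MISREADS)
    + r")(?![A-Za-z0-9/])"
)


def fix_halves(text: str) -> str:
    """Replace the hypothetical misreads of '½' with the real character."""
    return _PATTERN.sub(lambda m: m.group(1) + "½", text)


if __name__ == "__main__":
    print(fix_halves("Smith 3Vi, Jones 21/2; Vienna game drawn 1/2-1/2"))
```

The pattern is deliberately conservative; anything it misses still gets caught at the proof-reading stage.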
I agree, it's not easy or quick to scan pages from particularly fat or tightly bound volumes, which you have to press down onto the platen so that the text nearest the 'gutter' scans properly. But for slimmer or looser A4 material I don't have many problems, and I can certainly do better than one scan every 15 minutes. The utility software lets you set a time gap between scans (say 20-30 seconds), during which you turn the sheet over or put the next page on the platen before the next scan triggers automatically. That way you can get a 40-50 page scan done in a reasonable time, and it doesn't really matter if you miss an iteration: the blank scan can be deleted later.

It doesn't matter which way up you scan a page, and the software also adjusts automatically if the page is slightly skewed on the platen. Foolscap pages (e.g. some old SCCU grading lists) are more of a problem and take longer. If you want to scan a copy of BCM, place the double-page spread on the platen rotated 90 degrees so that it fits, and scan: the software is clever enough to divide it into two A5 pages and correct the orientation.
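To put rough numbers on the timed-gap approach: if the gap dominates (the time the scan head itself takes is left as a hypothetical parameter you would measure on your own scanner), 40 pages at 20 seconds is about 13 minutes and 50 pages at 30 seconds is 25 minutes, against ten hours for 40 pages at one page every 15 minutes. A minimal Python sketch of that arithmetic:

```python
def batch_minutes(pages: int, gap_seconds: float, scan_seconds: float = 0.0) -> float:
    """Rough elapsed time, in minutes, for a timed batch scan.

    Each page costs one gap (time to turn the sheet) plus the scan itself;
    scan_seconds is a hypothetical figure you would time on your own scanner.
    """
    return pages * (gap_seconds + scan_seconds) / 60


if __name__ == "__main__":
    # The range quoted above: 40-50 pages with a 20-30 second gap.
    for pages, gap in [(40, 20), (50, 30)]:
        print(f"{pages} pages, {gap}s gap: {batch_minutes(pages, gap):.1f} min")
    # For comparison: one page every 15 minutes.
    print(f"40 pages at 15 min/page: {40 * 15 / 60:.1f} hours")
```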
In theory it should be possible to OCR-scan games in algebraic notation from publications and then copy and paste them straight into a new game window in ChessBase, HIARCS or SCID, but my advice is: don't bother! It works sometimes, but not often enough to be quicker or more accurate than manual input. For a start, if the notation uses figurines, it doesn't work at all without a lot of preparatory work: you have to 'teach' the software the symbols, and it never learns them well enough to be worth the bother. Even if the publication uses KQRBN, it's not much better: for example, you can't trust it not to mix up Re4 and Rc4, particularly if you are scanning old, typewritten, cheaply reproduced bulletins. An absolute non-starter, I'm afraid. Other OCR software, perhaps with a newer, snazzier scanner, might manage it, but not the hardware/software set-up I have.
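The Re4/Rc4 problem is worth spelling out. A syntax check on the OCR text can flag tokens that are plainly garbled, but both misreadings are perfectly valid algebraic notation, so no such check can tell them apart; and if both moves happen to be legal in the position, even a legality check on paste won't catch the error. The Python sketch below illustrates this with a simplified, hypothetical SAN pattern (an assumption for illustration, not the grammar ChessBase, HIARCS or SCID actually use).

```python
import re

# Simplified, hypothetical pattern for standard algebraic notation (SAN)
# with KQRBN piece letters. Alternatives, in order:
#   castling (O-O or O-O-O)
#   piece move with optional file/rank disambiguation and capture
#   pawn move with optional capture and promotion
# followed by an optional check (+) or mate (#) sign.
SAN = re.compile(
    r"(?:O-O(?:-O)?"
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?"
    r")[+#]?"
)


def looks_like_san(token: str) -> bool:
    """True if the whole token matches the simplified SAN pattern."""
    return SAN.fullmatch(token) is not None


if __name__ == "__main__":
    # 'Re4' and 'Rc4' both pass, so a syntax check cannot catch that mix-up;
    # only plainly garbled tokens such as 'R¢4' or 'Re$' are rejected.
    for token in ["Re4", "Rc4", "R¢4", "Re$", "exd5", "O-O-O", "e8=Q#"]:
        print(token, looks_like_san(token))
```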