Friday, 12 February 2010

Version 1.2

It's been very quiet on the VelOCRaptor front recently, as I have had to concentrate on another project to feed my family, and we are still waiting (patiently) for the much-vaunted next release of our OCRopus engine.

In the meantime, I finally got round to adding a checkbox that disables spell checking. It's our first preference setting!

To improve the quality of the output, I run the OCR'd text through the Mac spill chequer, replacing each misspelled word with its top suggestion. This works well for most documents, but can give hilariously bad results on others.

If you're reading a document that is not in your Mac's language, or that doesn't really contain words (someone recently sent me DNA sequence data; I hope they didn't need 100% accuracy before starting that gene therapy), try turning spell checking off to improve the results.
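To make the idea concrete, here is a minimal Python sketch of a "replace each unknown word with the top suggestion" pass with an on/off switch. It is not how VelOCRaptor calls the Mac spell checker: as an assumption for illustration, a tiny hypothetical word list (`VOCABULARY`) stands in for the system dictionary, and the standard library's `difflib.get_close_matches` stands in for the spell checker's suggestions. The `spell_check` flag plays the role of the new checkbox.

```python
import difflib
import re

# Hypothetical stand-in vocabulary; the real app relies on the Mac's system spell checker.
VOCABULARY = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]


def top_suggestion(word, vocabulary):
    """Return the single closest vocabulary word, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


def correct_text(text, vocabulary, spell_check=True):
    """Replace each unknown word with its top suggestion, unless spell_check is False."""
    if not spell_check:
        # Checkbox off: pass the OCR output through untouched.
        return text

    def fix(match):
        word = match.group(0)
        if word.lower() in vocabulary:
            return word  # known word, keep the original casing
        suggestion = top_suggestion(word, vocabulary)
        return suggestion if suggestion is not None else word

    # Only alphabetic runs are treated as words; punctuation and spaces are kept.
    return re.sub(r"[A-Za-z]+", fix, text)


print(correct_text("The brwn fox", VOCABULARY))
print(correct_text("The brwn fox", VOCABULARY, spell_check=False))
```

The weakness the post describes falls out directly: every "word" is forced towards the dictionary, so text in another language, or strings that aren't words at all, can be "corrected" into nonsense. Switching the flag off skips the pass entirely.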

3 comments:

  1. It is mid-October 2010, and I have just found your ver. 1.2 and am trying it out on some 'real-world' text of mine. I 'do' genealogy, and download pages from books.Google.com, which of course you can 'read' but not transfer the text out of. So I was seeing how VelOCRaptor stacked up against Abbyy Express (I am on OS X 10.6.4 on an iMac). The OCR was pretty poor on both, but Abbyy was about 40% better. In the 'olden days' of OCR, they would let the program 'learn' the alphabet if the typeface was 'unusual.' Some of my Books.Google pages are 'typescript' (you know, a typewriter font) and nothing really reads them very well, even though the individual characters are well separated, clear and distinct. A suggestion for VelOCRaptor to get higher accuracy: a 'supplemental' program to create a 'font alphabet', so the OCR would know what it was seeing after it was 'trained.' Except for the accuracy, I like your program very much: drag-and-drop files, etc. I hope you can keep developing it.

  2. Good to hear you're keeping it going. Doing a great job!

  3. No news in over a year... anything cooking?
