Friday, 11 September 2009

Version 1.1 Released

A fortnight after the release of Snow Leopard we have finally finished working through the issues and are very pleased to be able to publish VelOCRaptor v1.1. It is available for immediate download, and should auto-update from previous versions.

For the record, VelOCRaptor had 2 major issues under Snow Leopard.
  • Our PDFs rendered only part of the page - this was due to the 64-bitness of 10.6, and was the real show-stopper
  • When you came to save the PDF, it complained that 'The location of the document <>.pdf cannot be determined' and offered to let you Save As. This was caused by the way we attempted to have one document which started life as an image file and ended up as a PDF file. The bodge to make that work was fine on Leopard, but Snow Leopard turned out more finickity.

If you have been waiting for Snow Leopard support, we're sorry it took so long, and as compensation we're shipping a bonus Automator action. This makes it easy to use VelOCRaptor as part of your workflow - we hope it comes in handy.

Thursday, 27 August 2009

Snow Leopard

Sometimes, no, come to think of it, often, I believe that Apple is actively hostile to developers on its platform. So it is with releasing Snow Leopard early. It's all very well showing off, but this one single act is farting in the direction of the little guys like us. I'm stuck on holiday - Snow Leopard will be released tomorrow, and I simply don't know whether VelOCRaptor will work or not :-(

So if there are problems with VelOCRaptor and your shiny new OS, I'm really sorry. I, in common with most Mac shops I think, had planned to spend September making sure that everything was hunky-dory. I'll get around to fixing it as soon as I can - a lament that I think you're going to hear a lot in the next few days.

Wednesday, 29 July 2009

How Are We Doing?

I had previously posted about our first day's figures. Since the rush caused by the MacInTouch launch, and then a spike when we were on the front page of Apple's download site, things have settled down. I've been busy fixing a couple of key bugs, and enhancing Rococoa in time for Snow Leopard, so there has been very little publicity, and I can now report on our steady-state traffic. These figures are not precise; sometimes they are just gleaned from looking at the Analytics graphs and guesstimating, but they are figures, based largely on the 7 days beginning 22 July.

Google Analytics shows that our steady state is 50 visitors a day, with 7 downloads of the application. Looking at the server logs though, Analytics misses many downloads. 10 a day are referred from Apple downloads, 5 a day from MacUpdate, and 17 a day from our own downloads page. I don't know why Analytics misses those last, a very few are people retrying downloads, but most just seem to slip the JavaScript net. In total I think that we get 33 downloads a day.

Analytics also reveals that 40% of our traffic is direct, 10% comes from people searching for the term 'velocraptor', and our bounce rate is 44%. From this I'm forced to conclude that 44% of our traffic is people either typing 'velocraptor' into the location bar and finding us rather than dinosaurs, or a similar effect with Google's 'I'm Feeling Lucky'. If we discount all the bounces as people who should just learn to spell, then our real steady state is 33 visitors a day - this matches the sum of the pukka search terms and referrals, but does not include those people who download the app without touching the html.

Spookily then, our ratio of visits to downloads is 1, although these aren't all the same people! This blows the industry average of 28% to pieces, but it isn't all good news - my next post will cover registrations.

Tuesday, 21 July 2009

Is VelOCRaptor Good Enough?

From UserVoice - "I agree with Dyno wholeheartedly. Not to sound discouraging but it's really of no use in its current state. Even using the crispest font, it doesn't recognise half the number characters. And the PDF output is blurry. It's actually rather cheeky getting users to en masse as beta testers in this way. (More dubious practices to follow, no doubt)."

I don't mean to be defensive, but to say that VelOCRaptor is no use in its current state, and to accuse us of dubious practice by releasing it is a bit harsh. If you don't like the product, by all means don't use it, but please don't question our motives.

I would obviously like for the accuracy to be better, but 1.05 developers are not going to develop a world-beating OCR engine. The companies that have developed OCR engines are charging you $125 (FineReader) to $499 (OmniPage) for them, and their integration and usability is quite frankly substandard. I've tried to licence a world-class engine, but the company won't risk letting the technology ship in a product with the features and price that define VelOCRaptor.

So the OCRopus engine is the best that you can buy for under $100. I wish it was better, but it isn't, yet. I thought long and hard about whether to ship with the current engine and I came to the conclusion that it was better than nothing - which is after all the alternative at this price. We state up front that the accuracy isn't great, and we post an example showing its performance. We've released it as it is because to many people it is good enough to produce searchable PDFs and grab occasional text - would the world be better off if we hadn't?

Some people are delighted with VelOCRaptor, others disappointed, but we're not forcing anyone to buy it, and we can hardly be accused of misrepresenting the performance. Releasing a product is hard work and costly - I've had no income for 6 months now. If we don't charge money then we can't tell if there is a market - and we need to know that there is a market if we are to continue development, adding features that users are asking for, and pulling in new OCRopus releases so that it delights more people.

So whilst I apologise for the lack of performance, I am unapologetic about releasing VelOCRaptor. By releasing early and often we get the chance to see if this proto-bird can fly, and users have something that may be of some use. It's an open secret that, whilst I'd love you to licence VelOCRaptor, the current release will continue to function forever without a licence. The reason for that is that we want you to carry on using it until the engine works well enough. In the meantime, to quote Guy Kawasaki, we have embraced "Don't worry, be crappy".

Tuesday, 14 July 2009

Integration

Spurred by an email enquiry, I've added a page describing the various ways of integrating VelOCRaptor with other programs.

Friday, 10 July 2009

FineReader

Yesterday Abbyy announced the release of ABBYY FineReader Express Edition for Mac. At the risk of promoting our most credible competitor, I've just bought a copy, and its accuracy is very good. They've also obviously worked hard to make it simple to use compared to its previous incarnation. You can't try before you buy, but if you need accuracy and will pay 89 Euros (they won't show me the price in $), it's the one I'd go for at the moment.

Thursday, 9 July 2009

Memory Leak Fixed

After way too long trying to find a solution, VelOCRaptor now deals with multi-hundred page PDF documents, if you have the patience!

The issue turned out to be 2 leaks, one in the code that extracts images from PDF files to feed to the engine, and one in the code that writes PDFs from those images once the reading is done. Both were easy to diagnose, but hard to fix, as they were symptoms of bugs in Mac OS rather than my code. But nobody cares - you just want software that works - and now it does work a lot better.

[Edit] You can download the update (it is build 195) by using 'Check For Updates' on the VelOCRaptor menu

Wednesday, 8 July 2009

It's Quiet, Too Quiet

So what's been happening? Well a steady stream of reports of the same bug - we run out of memory if you try to read from PDF documents with hundreds of pages - and some great feedback and suggestions via UserVoice.

I was surprised when people first used VelOCRaptor on large PDF documents, but then I'd had my mind in the world of little scanners, and reckoned without the Internet. So people have been trying to push whole PDF books through the thing, and it breaks. It took very little work to find the source of the memory leak, but fixing it is another issue. Basically the RubyCocoa system we use for the guts of the PDF reading and writing isn't up to the job, and I'm having to re-write much of that code in Objective-C - the language of Mac OS X. It's irritating, but just a fact of programmer life, so I'm biting the bullet and getting on with it. Wish me luck.

Monday, 29 June 2009

Apple Downloads Staff Pick!

We've now been listed on Apple Downloads, and they were kind enough to make us a Staff Pick. As we're one of only 2 OCR products listed, and ReadIris hasn't been updated in over 2 years and doesn't provide a demo download, we are now officially uncloaked and ready to receive the pent-up demand of Mac users for Simple Affordable OCR.

Friday, 26 June 2009

A Promising Start

Thanks to everybody who has downloaded and tried VelOCRaptor, and in particular to those who have paid for a licence.

In a spirit of openness I thought that I'd share my first day (25th June)'s results with the world. For the record, it's not clear to me which timezone Google analytics uses, so these figures are for approximately 24 hours, but not necessarily in any particular timezone.

On the first day I publicised the release only through this blog, the release notes on the app's autoupdate, and MacInTouch. MacInTouch was published Thursday morning EDT, and my first licence was shipped at 13:56 GMT+1.

In the 24 hours that followed, VelOCRaptor.com had 510 visits from 459 visitors, 418 of them new to the site. 68% of the total was referred by MacInTouch.

MacInTouch was the only site to publish the donation page allowing people to licence by donation, rather than paying full-price. This page was visited by 193 people.

The application itself was downloaded 49 times. [Update 2009-07-29 - the server logs say 241 downloads]

In the 24 hours since the first licence was sent, I have received 45 donations and 2 full-price registrations. The smallest donation was $1, the largest $29. The average of donations and registrations was $9.95.

No doubt things will go quiet now, but again, a big thank you to everyone who's been a part of these numbers.

Thursday, 25 June 2009

VelOCRaptor 1.0 launched!

Like all the best software, VelOCRaptor was released at a little after midnight.

Frankly, apart from the licensing, there has been no change to the app for a couple of weeks now, so I hope that there are no surprises. The only real effect of declaring this build 1.0 is that we can start charging real money, and drumming up a little publicity.

I plan to submit our details to the Apple downloads site tomorrow. I wonder if anyone will be interested? I'm nervous that Apple downloads visitors may be less accommodating than our friendly bunch of early-adopters, but I can't put it off any longer.

Wednesday, 24 June 2009

So near and yet so far

I was all set to release VelOCRaptor 1.0 yesterday, but it was my wedding anniversary, so I didn't quite finish all the tasks before we went out to dinner and a movie (Last Chance Harvey, seeing as you ask).

I was just about to push the big green publicity button this evening when I received an email asking how to donate. As I wrote the reply, my mind went through the steps required to process the licence in the program, and I realised that it doesn't work! The licensing code had a whole bunch of tests, which pass, but relied on a folder which exists on my Mac because I created it by hand when I was prototyping the code, and won't exist on any other Mac.
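The trap described above, tests that pass only because a folder happens to exist on the developer's machine, can be sketched roughly as follows. This is an illustration, not the actual licensing code: `save_licence`, the file name `licence.txt` and the folder names are all hypothetical. The two ideas are that the code creates the folder itself rather than assuming it, and that the check runs against a brand-new temporary directory standing in for a clean Mac.

```python
import os
import tempfile


def save_licence(licence_text, folder):
    """Write the licence into folder, creating the folder if it is missing.

    Hypothetical helper: the name and the file layout are assumptions.
    """
    # Do not rely on the folder already existing; create it on demand.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "licence.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(licence_text)
    return path


# Exercise the code against a fresh directory, so nothing left over from
# prototyping on the developer's own machine can make the check pass.
with tempfile.TemporaryDirectory() as clean_home:
    folder = os.path.join(clean_home, "Application Support", "ExampleApp")
    saved = save_licence("example-licence-key", folder)
    print(os.path.isfile(saved))
```

A temporary directory is only a partial substitute for installing on a clean machine, but it does catch this particular class of hidden dependency.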

So I very nearly released version 1.0, charging $29 for a licence that doesn't work.

In the Windows world we solve these problems by installing software into clean virtual machines for testing, but in their wisdom Apple stop developers from doing this for Mac OS. So now I'm off to the iMac to just check that I can install the latest release...

Tuesday, 23 June 2009

Where does the time go?

I've been hard at work trying to integrate the latest version (0.4) of OCRopus with VelOCRaptor. Just compiling it on the Mac has been a challenge, and now that it's running I have to report that its accuracy is not noticeably better than 0.3. This isn't a surprise, as the new release has not been trained on much character data, but it is disappointing, as I have spent over a week not improving our product.

So, with bills to pay I've decided to release VelOCRaptor 1.0 with OCRopus 0.3. Its accuracy on good quality scans is sufficient for finding them again with Spotlight, and for copying sections of text. I expect to be able to release a version with demonstrably better performance before the end of the year, as OCRopus 0.5 promises much.

We're now set up to receive payments via PayPal and automatically send out licences by email. I've edited but not yet published the v1.0 website, and am just preparing the copy for the Apple downloads site. If you want a cheap licence by donation, I'd get in there quick!

Friday, 12 June 2009

I'm no graphic artist but...

...I know what I like. And I think that I like our new homepage. I've removed the news to its own page, replacing the rather naff link to the blog. The grabs on the front page now all zoom pleasingly, giving more information on the features.

It's not perfect, but I do believe that it's fit for purpose, which is of course, to let me release Version 1 next week. Watch this space.

Wednesday, 10 June 2009

More web site updates

I've updated the site with a fresh new look. Please let me know what you think.

Sunday, 7 June 2009

New Screencast

I've just posted a screencast showing how to use VelOCRaptor with the Apple Image Capture application to produce readable PDFs directly from your scanner.

Friday, 5 June 2009

Licensing, Bug Fixes and Tidying

While Simon and I are keen for people to use VelOCRaptor, we're also keen to make some money. So I've been adding licensing code to the application. You can't yet buy a licence, so not having one doesn't disable any part of the program, but we're one step closer to release.

To which end I've been busy on holiday, and since I've come back, fixing some pesky bugs, making the app look a bit nicer and stretching my meagre graphic design talents over the website. Actually I think I may have made this blog look worse - I'll revisit Blogger later.

Friday, 22 May 2009

OCRopus 0.4

Our OCR engine, OCRopus, is working its way towards a 0.4 release. To that end I've spent a day trying to compile the new code on Mac. As far as I can see I'm the first to attempt it, and it's taught me a lot about Unix programming!

It now builds, and works for some of our test files. First impressions are that accuracy is better where it works, so I'll be working hard to integrate the engine with VelOCRaptor over the next week.

Wednesday, 20 May 2009

New Screencast

I've just recorded a new screencast showing the normal interface, and updated our screenshots on the home page.

Window sizing

Now that we're into the GUI polishing stage, I've added code to resize windows to match the size of the document that they are reading. Let me know if you like it, or if it's just annoying.

Auto-update

It's been a very long day, but VelOCRaptor now has auto-update, thanks to the excellent Sparkle framework.

Monday, 18 May 2009

Thick and fast

No, not my mountain bike style, but our builds.

The latest build (155) fixes the app on Mac OS X 10.5.7. It also has a revamped GUI, so that the preview while we are reading matches the PDF display when we are done. I expect this to have problems, as I'm learning my way around this stuff, but architecturally it's close to where I want it to be, so look for more GUI tarting in the next few days.

10.5.7

Pity the poor Mac developer. The recent upgrade to 10.5.7 broke our reading of our own sample file! Apple updated Ruby, and in particular its XML parser, which seems to have caused some issue. I'm looking into it, and should have a new version out today.

Friday, 15 May 2009

New release

I've just uploaded a new release. To be honest it's visually a bit worse than the last, but it does rationalise our handling of files dropped on the closed application with those opened with the File menu. The windows and documents we display are now the same in both cases, which means that we can reuse the code and introduce fewer bugs. Please bear with me while I tidy up the visuals and the workflow, and let me know what you think of the new windows.

Thursday, 14 May 2009

What's the latest?

Whilst this blog may have been quiet, it's because things are very busy here in VelOCRaptor towers.

I've been working on creating a lot of tests for the app. They have revealed some bugs that we have fixed, and have let me rework our internals so that the mini-mode (drop on the closed app, it converts then exits) and standard mode (open the app, use file open or the big drop target) are now using the same code.

The tests have also allowed me to experiment with replacing our OCR engine (OCRopus) with another commercial engine. I can't say much more yet, but if negotiations are fruitful we should be able to release a product with world-beating usability, and class-leading accuracy. What's not to like?

Tuesday, 5 May 2009

Bug fix release

I've just uploaded build 150. This fixes crashes when closing windows, and an error with some images.

Wednesday, 29 April 2009

Multi-page PDF support, Maxi-mode interface

I'm delighted to announce that our new build, 142, now supports reading from multi-page PDFs. This means that if you have a sheet-feeder, we can process whole reams of pages at once. Actually I haven't tried a whole ream, and suspect that we might run out of memory, but give it a go anyway.

Our maxi (non-mini) mode has also had a makeover, with a nice drop target, a save button to give you a clue what to do when it's finished reading, and a bunch of little bug fixes.

If you've the inclination, please download VelOCRaptor and try it out.

Friday, 3 April 2009

Improved layout

If you've tried VelOCRaptor, you'll have found that it really didn't do a great job of lining up its text over the right bit of the image. This is because I wrote each line where it should be, but I really don't know what the font size is, so if we have it wrong, the characters get progressively out of sync.

I've improved this quite a bit by printing word by word rather than line by line. It makes the selection look a little wonky at times, but should improve your ability to select text, especially in multi-column layouts.
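To see why anchoring each word helps, here is a toy model of the drift. It is a sketch under simplifying assumptions, not the real layout code: every character is treated as having the same true width, the text is drawn with a single guessed width, and the start position of each line (or each word) is assumed to be known exactly. The function name and the numbers are made up for illustration.

```python
def max_drift(words, true_width, guessed_width, per_word):
    """Largest horizontal error, in points, for any character on one line.

    per_word=False: only the start of the line is anchored.
    per_word=True:  the start of every word is anchored.
    """
    worst = 0.0
    offset = 0  # index of the current word's first character within the line
    for word in words:
        for i in range(len(word)):
            # Number of characters drawn since the last known position.
            steps = i if per_word else offset + i
            worst = max(worst, abs(steps * (guessed_width - true_width)))
        offset += len(word) + 1  # +1 for the space after the word
    return worst


line = "the quick brown fox".split()
print(max_drift(line, true_width=6.0, guessed_width=6.5, per_word=False))
print(max_drift(line, true_width=6.0, guessed_width=6.5, per_word=True))
```

With only the line anchored, the error keeps growing to the end of the line; with each word anchored, it can never exceed what accumulates inside the longest word.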

Wednesday, 1 April 2009

Invisible progress

With a family funeral on Friday it's been a short week, but progress is still being made. I've smartened up the build process, so that I can reliably update the app and its download file, and improved the error reporting when things do go wrong, so that we know what build it was in and what caused the problem.

These are the things that larger software projects can leave until close to that big 1.0 release, but when you're releasing more often they become more important.

Tuesday, 24 March 2009

PPC and Improved Accuracy

It's been almost a week since my last confession, but we've been hard at work. Simon has added a nice little preview to the mini-mode interface, and compiled Tesseract and OCRopus for PPC, so we now have a Universal Binary!

Meanwhile I've been post-processing the text by replacing mis-spelt words. This uses the Mac spellchecker, so it should pick up your custom words. I'm not entirely convinced that it improves overall performance that much, but it does make the output text a lot more plausible.
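The idea of replacing mis-spelt words can be sketched like this. Note the substitution: the app uses the Mac spellchecker, whereas this illustration uses Python's standard `difflib` against a tiny made-up word list, so the function name, the dictionary and the `cutoff` value are all assumptions.

```python
import difflib


def correct_words(words, dictionary, cutoff=0.7):
    """Replace each unknown word with its closest dictionary entry, if any.

    dictionary is a list of lower-case known words (a stand-in for a real
    spellchecker); cutoff is the minimum similarity for a replacement.
    """
    corrected = []
    for word in words:
        if word.lower() in dictionary:
            corrected.append(word)  # already a known word: leave it alone
            continue
        matches = difflib.get_close_matches(
            word.lower(), dictionary, n=1, cutoff=cutoff
        )
        # Keep the original word when nothing is similar enough.
        corrected.append(matches[0] if matches else word)
    return corrected


known = ["return", "income", "total"]
print(correct_words(["retum", "income", "tota1", "xyzzy"], known))
```

This also shows why the output becomes more plausible without necessarily becoming more accurate: a word is swapped for the nearest known word, which is not guaranteed to be the word that was on the page.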

Wednesday, 18 March 2009

PDF writing with Quartz

Up to now I've been writing PDFs using XSL:FO and Apache FOP. This was the path of least resistance, but did mean shipping 11Mb of FOP, and shelling out to Java to do the work.

It's been painful, but I've now replaced that code with native Mac Quartz code to write the PDF. So we should write PDFs a lot quicker (still dwarfed by the OCR time mind), and our download is now only 3.3Mb zipped.

Saturday, 14 March 2009

Reading from PDF files

You asked for it, you got it. We now read images from PDF files as well as JPEG, PNG, and TIFF. We are currently limited to rendering the first page and reading that, but I think that should cover the vital 90%.

As a bonus it has led to the removal of ImageMagick in favour of SIPS, which can read PDF all by itself, and is built into Leopard. So we've just lost 30Mb!

Friday, 13 March 2009

205 Downloads

According to my server logs we've had 205 downloads of VelOCRaptor (that weren't me checking it's OK).

Come on people - you've had a play, where's the feedback?

PDF reading

Due to the magic of SIPS I should be able to remove ImageMagick and support PDF reading in one fell swoop. I think I'll wait until I'm less tired before I commit though.

Thursday, 12 March 2009

PDF reading support

I've had other things to do today, but from the response so far it's clear that we need to add reading from PDF pretty quickly.

In the meantime Simon has at least set the app to reject dropped pdfs, so we won't be popping up nasty error dialogs.

Wednesday, 11 March 2009

MacInTouch

The good folks at MacInTouch gave us a mention - leading to 137 visits so far today.

Welcome MacInTouchers, be sure to let us know what you think.

Google activity

A full week after letting it know we exist, the Google machine has swung into action and found us. So I'm trying to cope with a whole few people trying VelOCRaptor.

Thanks to those who have downloaded the app and tried it out. Don't forget to vote for your itch to be scratched - at the moment I can see that we really do need to support reading PDFs, so I'm going to work on that next.

EDIT - Ah, looking at the logs, it's clear that MacInTouch and not Google are driving the traffic.

New release

I'm just rsyncing a new release. This should run about twice as fast as the last, by dint of not OCRing twice ;-)

Accuracy Results - SA-tax.jpg - revised

Embarrassingly I built (and released) a version that invoked ocroscript twice, throwing away the first results.

So while the accuracy results are unchanged - the times should be quicker.


$ src/script/velocraptor.rb testdata/SA-tax.jpg out.txt NORMALIZE_PROCESSOR; src/test/spell.rb out.txt
I, [2009-03-11T17:55:29.694300 #15844] INFO -- : Converting testdata/SA-tax.jpg to out.txt
I, [2009-03-11T17:55:49.983578 #15844] INFO -- : Times: CPU 19.93 Elapsed 20.2892169952393
57 unknown from 504 words = 11.3095238095238%


$ src/script/velocraptor.rb testdata/SA-tax.jpg out.txt CONVERT_PROCESSOR; src/test/spell.rb out.txt
I, [2009-03-11T17:58:00.720594 #15854] INFO -- : Converting testdata/SA-tax.jpg to out.txt
I, [2009-03-11T17:58:19.683518 #15854] INFO -- : Times: CPU 18.79 Elapsed 18.9628579616547
54 unknown from 506 words = 10.6719367588933%

Accuracy Results - SA-tax.jpg

Plain image
I, [2009-03-11T12:49:55.422169 #12001] INFO -- : Converting testdata/SA-tax.jpg to plain.txt
I, [2009-03-11T12:50:30.199773 #12001] INFO -- : Times: CPU 32.14 Elapsed 34.7775390148163
45 unknown from 500 words = 9.0%


Normalized
I, [2009-03-11T13:11:25.775800 #12092] INFO -- : Converting testdata/SA-tax.jpg to normalized.txt
I, [2009-03-11T13:12:02.039675 #12092] INFO -- : Times: CPU 35.58 Elapsed 36.2637679576874
57 unknown from 504 words = 11.3095238095238%

Recognition Accuracy

I've been working up a way of judging the accuracy of OCR. My simplistic approach is to assume that if a word is spelled correctly, it is correct. So make a set of each unique word, remove those which are actually words, and report the ratio of mis-spells to total words.

I'll report our results here soon.
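As a rough illustration of the measure described above, here is a minimal sketch. It is not the spell.rb script used for the published figures: the function name, the tiny dictionary and the sample text are made up, and it assumes that "total words" means the number of unique words.

```python
import re


def misspelling_ratio(text, dictionary):
    """Return (unknown, total, percent) over the unique words in text."""
    # Reduce the text to its set of unique lower-cased words.
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Anything the dictionary does not know is counted as a mis-read.
    unknown = {word for word in words if word not in dictionary}
    total = len(words)
    percent = 100.0 * len(unknown) / total if total else 0.0
    return len(unknown), total, percent


# Hypothetical stand-in for a real spellchecker's word list.
dictionary = {"the", "tax", "return", "form", "for", "year"}
sample = "The tax retum form for the yeor"
unknown, total, percent = misspelling_ratio(sample, dictionary)
print(f"{unknown} unknown from {total} words = {percent}%")
```

The measure is only a proxy: a mis-read that happens to produce a real word is scored as correct, and a proper noun that is read perfectly is scored as wrong.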

Better Recognition

I've spent this afternoon working out how to distribute ImageMagick with VelOCRaptor so that we can pre-process the images to improve accuracy.

The latest version now uses histogram normalization to improve the image contrast prior to scanning. I'm now looking into the best way of measuring accuracy.
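For anyone curious what this kind of normalization does, here is a simplified sketch of the idea: a plain linear contrast stretch over greyscale values. It is not what ImageMagick actually runs (its normalize operation also clips a small share of the darkest and brightest pixels, as I understand it), and the function name and sample values are made up.

```python
def normalize_contrast(pixels, low=0, high=255):
    """Stretch greyscale values so the darkest maps to low and the brightest to high."""
    darkest, brightest = min(pixels), max(pixels)
    if darkest == brightest:
        return list(pixels)  # flat image: there is no contrast to stretch
    scale = (high - low) / (brightest - darkest)
    return [round(low + (p - darkest) * scale) for p in pixels]


# A faded scan: every value sits in a narrow band of mid-greys.
faded = [100, 110, 120, 150]
print(normalize_contrast(faded))
```

The point of stretching is that faded ink and greyish paper end up further apart, which gives the OCR engine a cleaner separation between text and background.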

Tuesday, 10 March 2009

New Screencast

I've just uploaded a screencast showing the new mini-mode GUI. I'm trying YouTube this time, as it will convert my mov capture on the fly. The movie quality is worse though.

Monday, 9 March 2009

Cocoa GUI

I'm just in the process of uploading our Cocoa GUI for the first time. Up to now the download has been of my AppleScript droplet, but Simon has done some fantastic work this weekend so that we now have a genuine Mac front end.

If you drop a file onto the VelOCRaptor icon it behaves as it used to - writing a PDF in the current directory and then exiting, although now with a progress spinner and cancel button. We're calling this mini-mode.

If you open the app normally it offers a large drop target for your images (or File/Open). Drop one there and it is converted - once it's done you can select and drag the text out of it, or Save the PDF.

We've lots of polishing to do, but we now have the 2 basic workflows:
  • drop, convert, exit
  • open, convert, save
Also planned are AppleScript and Automator support.

Sunday, 8 March 2009

Samples online

I've posted my killer jpeg - a multi-column colour government monster form, with our output, on our samples page.

Saturday, 7 March 2009

Adwords results

As a little experiment I signed up with adwords and placed a listing, just in the UK. Google rejected my first ad for trademark infringement, so the revised ad ran for about 9 hours, from around midday. The search terms I used were free mac ocr pdf.

In that time the ad was shown 3,445 times, and got 3 clicks, all related to the search term 'free'. Looking at analytics it seems that these clicks were actually searching for 'free games' - obviously VelOCRaptor is such an attractive title that the respondents ignored the words in the ad.

What surprised me is that Google doesn't seem to prefer to run the ad when more than one of the terms is matched. In fact, in all the times I've searched for 'free mac ocr pdf' it's never been shown to me. Given this, 'free' and 'mac' are rubbish keywords, as they trigger (infrequently, as they must be popular) when someone is looking for 'free holiday' or 'mac games'.

So I've changed my strategy and am looking at phrases. I now use 'image to text', 'mac ocr' and 'ocr to pdf'. I figure that these will be shown less frequently, but with far better targeting.

Friday, 6 March 2009

Mac Mac Mac Mac Mac (TM)

Our Google adwords advert was rejected because it has the word "Mac".

This leaves a dilemma - I don't want users costing click-through cash only to find we don't support their platform, but I can't say Mac, and I only have 2 x 35 char lines.

Finally plumped for spending precious characters on "computers rhyming with Nac". Thanks for your help Apple.

iWeb woes

Spent the day replacing the huge mess that was iWeb's published site with hand-crafted xhtml/css.
I was already having to process iWeb's output with Ruby to add UserVoice and Google analytics scripts, and I somehow broke iWeb's page navigation bar when I added a link to Blogger.

So after 3 hours on the bike at lunchtime it's been a happy few hours working out how to centre pages and highlight the current page in CSS.

Please let me know if it doesn't work in your browser.