Archive for June, 2010

reCAPTCHA Helping to Digitize Books

Friday, June 25th, 2010

Professor Luis von Ahn at Thinking Digital

The annual Thinking Digital conference took place at the Sage last month. It was two days packed full of innovative talks and great networking opportunities. Seminars covered everything from Creative Commons and citizen journalism to storytelling and the impact of sound.

One talk in particular grabbed my attention because it dealt with two subjects close to my heart - websites and books. Professor Luis von Ahn of Carnegie Mellon University took to the stage to talk about reCAPTCHA.

CAPTCHA codes are those pesky codes that you copy when you’re filling in forms online. Around 200 million of these codes are typed every day. Their function is to ascertain whether you’re a human being or a bot. Bots and automated programmes can’t read distorted or obscured text. Humans can.

reCAPTCHA Code

Professor von Ahn worked out that it takes an average of ten seconds to type each code. Hence more than 150,000 hours are spent typing them every day. It was this colossal perceived ‘waste of time’ that led Professor von Ahn and the team at Carnegie Mellon University to come up with reCAPTCHA.
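As a quick back-of-envelope check on those figures, here is a minimal sketch. It assumes exactly 200 million codes a day and a flat ten seconds per code, both of which are round numbers from the talk rather than measurements.

```python
# Round figures quoted in the talk (assumed exact for this rough check).
codes_per_day = 200_000_000
seconds_per_code = 10

# Total typing time per day, converted from seconds to hours.
hours_per_day = codes_per_day * seconds_per_code / 3600

print(f"{hours_per_day:,.0f} hours per day")  # 555,556 hours per day
```

On these assumptions the total comes out well above 150,000 hours, so that figure is best read as a conservative lower bound.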

Each time you type a reCAPTCHA code you are helping to digitize books. Here’s how it works. Books printed before the digital age are scanned, and the scanned pages are converted into digital text using a technology known as OCR (optical character recognition). Unfortunately, for many books the print has either deteriorated or is too obscure to be read by a computer. Each of these illegible words is embedded into an image and used as a CAPTCHA code. Therefore, when you’re copying the words in a reCAPTCHA you’re deciphering the words that OCR could not.

But if the computer can’t read these words, how does it know that you have typed them correctly? Here’s how, explained rather succinctly on the reCAPTCHA website:

‘Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.’ The error rate is estimated at between 0.1% and 0.2%.
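The scheme described in that quote can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the names UnknownWord, submit and resolved are hypothetical, and the threshold of three matching answers is an assumption, since the quote only says 'a number of other people'.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UnknownWord:
    """A scanned word OCR could not read, plus the guesses collected so far."""
    image_id: str
    guesses: Counter = field(default_factory=Counter)


def normalise(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def submit(control_answer: str, typed_control: str,
           unknown: UnknownWord, typed_unknown: str) -> bool:
    """Check the known word; only if it is right, record the guess for the new word."""
    if normalise(typed_control) != normalise(control_answer):
        return False  # failed the known word, so the other guess is discarded
    unknown.guesses[normalise(typed_unknown)] += 1
    return True


def resolved(unknown: UnknownWord, min_votes: int = 3):
    """Return the agreed reading once min_votes users match (assumed threshold), else None."""
    if not unknown.guesses:
        return None
    word, votes = unknown.guesses.most_common(1)[0]
    return word if votes >= min_votes else None


# Example: four users pass the known word, one fails it.
word = UnknownWord("scan-0001")
for typed in ["morning", "Morning ", "moming", "morning"]:
    submit("overlooks", "overlooks", word, typed)
submit("overlooks", "overlocks", word, "mourning")  # wrong known word, ignored

print(resolved(word))  # morning
```

The known word does double duty here: it tells humans from bots, and it decides whose guess about the unreadable word is worth counting.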

Wow.

Professor Luis von Ahn Talking About reCAPTCHA

Further Reading:

- www.thinkingdigital.co.uk
- www.captcha.net
- www.cs.cmu.edu/~biglou
- www.vonahn.blogspot.com

Top image courtesy of the Thinking Digital Flickr group.

Twitter Spam

Monday, June 7th, 2010

It seems that every week we see a new spam email campaign being launched.

Today some of our clients received and reported the first examples of spam purporting to come from Twitter.

Here’s an example we received earlier:

Sample of Twitter spam

It goes without saying that you should delete these messages immediately and do NOT click on any links in them.

We’ll keep this post updated with any other examples we see as they circulate.

We’re Looking for a PHP Developer

Tuesday, June 1st, 2010

Closing Date: Monday 14 June 2010

We’re recruiting again! We’re looking for another PHP Developer to join our team of front-end and back-end developers. Vanilla Storm is a small web design company based in Heaton Chapel, which is just 10 minutes by train from Manchester Piccadilly and Stockport. The role is varied and interesting, with the opportunity to work on some high-profile sites and learn new technologies.

The successful candidate will work on the following:

  • creating, documenting, maintaining, testing and debugging systems developed in PHP versions 4 and 5
  • building secure e-commerce sites using applications developed by Vanilla Storm
  • developing HTML / XHTML and CSS websites, working to W3C and WCAG standards
  • uploading websites and applications to the server
  • researching and building sites using existing software, and building bespoke applications and modules as required
  • researching and implementing new technologies

Knowledge of Drupal and open source platforms would be an advantage.

Is this you? If so, we would love to hear from you. Please email jobs@vanillastorm.co.uk in the first instance and we will forward a full job description.

Salary: £22,000 to £28,000 depending on skills and experience

Strictly no agencies, consultants or job banks – please do not phone or email.