
Remembering the helpIT Legacy

 

“You’ve come a long way, Baby”: Remembering the world’s first stored-program computer

Last Friday was the 65th anniversary of the first successful execution of the world’s first software program, and it was great to see the occasion marked by a post and specially commissioned video on Google’s official blog, complete with an interview earlier this month with my father, Geoff Tootill. The Manchester Small-Scale Experimental Machine (SSEM), nicknamed Baby, was the world’s first stored-program computer, i.e. the first computer that you could program for different tasks without rewiring or physical reconfiguration. The program was a routine to determine the highest proper factor of any number. Of course, because nobody had written one before, the word “program” wasn’t used to describe it, and “software” was a term that nobody had yet coined. The SSEM was designed by the team of Frederic C. Williams, Tom Kilburn and Geoff Tootill, and ran its first program on 21st June 1948.
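For readers curious what such a routine looks like, here is a minimal modern sketch in Python. It is not a transcription of the 1948 code. It assumes the commonly described approach of trying each candidate from n - 1 downward and, because the Baby had no divide instruction, testing divisibility by repeated subtraction. The function name is purely illustrative.

```python
def highest_proper_factor(n: int) -> int:
    """Return the largest divisor of n that is smaller than n (for n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    candidate = n - 1
    while True:
        # "Divide" by repeated subtraction, as a machine without a
        # divide instruction would have to (an assumption of this sketch).
        remainder = n
        while remainder >= candidate:
            remainder -= candidate
        if remainder == 0:
            return candidate  # always reached, since 1 divides every n
        candidate -= 1
```

For a prime number the loop runs all the way down to 1, which is the only proper factor a prime has.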

Geoff Tootill Notebook

I have heard first-hand my father’s stories about being keen to work winter overtime: it was during post-war coal rationing, and the SSEM generated so much heat that it was much the cosiest place to be! He also tells of his habit of keeping one hand in his pocket when touching any of the equipment, to prevent electric shocks. Before going to work on the Manchester machine, my father worked on wartime development and commissioning of radar, which he says was the most responsible job he ever had (at the age of just 21), despite his work at Manchester and (in the 1960s) as Head of Operations at the European Space Research Organisation. Although he is primarily an engineer, a hardware man, my father graduated in Mathematics from Cambridge University and had all the attributes to make an excellent programmer. I like to think that my interest in and aptitude for software stemmed from him in both nature and nurture – although his aptitude for hardware and electronics didn’t seem to rub off on me. He was extremely interested in the software that I initially wrote for fuzzy matching of names and addresses, as it appealed to him both as a computer scientist and as a linguist. My father then went on to design the uniquely effective phonetic algorithm, soundIT, which powers much of the fuzzy matching in helpIT’s software today, as I have written about in my blog post on the development of our phonetic routine.

The Manchester computing pioneers have not had enough recognition previously, and I’m delighted that Google has paid tribute to my father and his colleagues for their contribution to the modern software era – and to be able to acknowledge my father’s place in the evolution of our company.


Phonetic Matching Matters!

by Steve Tootill (Tootle, Toothill, Tutil, Tootil, Tootal)

In a recent blog entry, Any Advance on Soundex?, I promised to describe our phonetic algorithm, soundIT. To recap, here’s what we think a phonetic algorithm for contact data matching should do:

  • Produce phonetic codes that represent typical pronunciations
  • Focus on “proper names” and not consider other words
  • Be loose enough to allow for regional differences in pronunciation but not so loose as to equate names that sound completely different.

We don’t think it should also try to address keying or reading errors and inconsistencies, as that is best done by other algorithms focused on those types of issues.

To design our algorithm, I decided to keep it in the family: my father Geoff Tootill is a linguist, classics scholar and computer pioneer, who played a leading role in the development of the Manchester Small-Scale Experimental Machine in 1947-48, popularly known now as the “Baby” – the first computer that stored programs in electronic memory.

The first program stored in electronic memory

Geoff was an obvious choice to grapple with the problem of how to design a program that understands pronunciation… We called the resultant algorithm “soundIT”.

So, how does it work?

soundIT derives phonetic codes that represent typical pronunciation of names. It takes account of vowel sounds and determines the stressed syllable in the name. This means that “Batten” and “Batton” sound the same according to soundIT, as the different letters fall in the unstressed syllable, whilst “Batton” and “Button” sound different, as it is the stressed syllable which differs. Clearly, “Batton” and “Button” are a fuzzy match, just not a phonetic match. My name is often misspelled as “Tootle”, “Toothill”, “Tutil”, “Tootil” and “Tootal”, all of which soundIT equates to the correct spelling of “Tootill” – probably why I’m so interested in fuzzy matching of names! Although “Toothill” could be pronounced as “tooth-ill” rather than “toot-hill”, most people treat the “h” as part of “hill” but don’t stress it, hence it sounds like “Tootill”. Another advantage of soundIT is that it can recognize silent consonants – thus it can equate “Shaw” and “Shore”, “Wight” and “White”, “Naughton” and “Norton”, “Porter” and “Porta”, “Moir” and “Moya” (which are all reasonably common last names in the UK and USA).
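To make the stressed-syllable idea concrete, here is a deliberately crude toy in Python. It is not soundIT and makes no claim about how soundIT is implemented: it simply assumes the stress falls on the first syllable, keeps that syllable's vowels, drops all later vowels and collapses doubled consonants. The function name and every rule in it are assumptions made for illustration.

```python
import re

# Onset consonants, then the first vowel group (assumed stressed), then the rest.
_FIRST_VOWEL_GROUP = re.compile(r"^([^aeiou]*)([aeiou]+)(.*)$")


def toy_phonetic_key(name: str) -> str:
    """Toy key: keep the first (assumed stressed) vowel group, drop later vowels."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    match = _FIRST_VOWEL_GROUP.match(letters)
    if match is None:
        return letters  # no vowels at all; nothing to simplify
    onset, stressed_vowels, rest = match.groups()
    rest = re.sub(r"[aeiou]", "", rest)    # ignore unstressed vowels
    rest = re.sub(r"(.)\1+", r"\1", rest)  # collapse doubled consonants
    return onset + stressed_vowels + rest


print(toy_phonetic_key("Batten"))  # batn
print(toy_phonetic_key("Batton"))  # batn
print(toy_phonetic_key("Button"))  # butn
```

Even this toy equates “Batten” with “Batton” while keeping “Button” apart, and it gives “Tootill” and “Tootle” the same key. It does not equate “Tutil” or “Toothill” with “Tootill”, and it knows nothing about silent consonants such as those in “Shaw” and “Shore”. Handling those cases takes real pronunciation rules, which is where a genuine phonetic algorithm earns its keep.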

There are always going to be challenges with representing pronunciation of English names e.g. the city of “Reading” rhymes with “bedding” not “weeding”, to say nothing of the different pronunciations of “ough” represented in “A rough-coated dough-faced ploughboy strode coughing and hiccoughing thoughtfully through the streets of the borough”. Although there are no proper names in this sentence, the challenges of “ough” are represented in place names like “Broughton”, “Poughkeepsie” and “Loughborough”. Fortunately, these challenges only occur in limited numbers and we have found in practice that non-phonetic fuzzy matching techniques, together with matching on other data for a contact or company, allow for the occasional ambiguity in pronunciation of names and places. These exceptions don’t negate the need for a genuine phonetic algorithm in your data matching arsenal.

We implemented soundIT within our dedupe package (matchIT) fairly easily and then proceeded to feed through vast quantities of data to identify any weaknesses and improvements required. soundIT proved very successful in its initial market in the UK and then in the USA. There are algorithms that focus on other languages, such as Beider-Morse Phonetic Matching for Germanic and Slavic languages, but as helpIT systems’ market focus is on English and Pan-European data, we developed a generic form of soundIT for European languages. We also use a looser version of the algorithm for identifying candidate matches than we do for actually allocating similarity scores.
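The idea of a looser key for finding candidates and a tighter one for scoring can be sketched as follows. This is a hypothetical Python illustration, not matchIT code: both key functions are toy stand-ins invented for the example, and the scores of 1.0 and 0.5 are arbitrary placeholders.

```python
import re
from collections import defaultdict
from itertools import combinations


def loose_key(name: str) -> str:
    """Loose toy key: consonant skeleton only, used to group candidates."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    consonants = re.sub(r"[aeiou]", "", letters)
    return re.sub(r"(.)\1+", r"\1", consonants)


def strict_key(name: str) -> str:
    """Stricter toy key: also keeps the first (assumed stressed) vowel group."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    match = re.match(r"^([^aeiou]*)([aeiou]+)(.*)$", letters)
    if match is None:
        return letters
    onset, vowels, rest = match.groups()
    rest = re.sub(r"(.)\1+", r"\1", re.sub(r"[aeiou]", "", rest))
    return onset + vowels + rest


def candidate_pairs(names):
    """Yield pairs of names that share a loose key (the candidate stage)."""
    blocks = defaultdict(list)
    for name in names:
        blocks[loose_key(name)].append(name)
    for block in blocks.values():
        yield from combinations(block, 2)


def score(a: str, b: str) -> float:
    """Placeholder score: 1.0 if the strict keys agree, else 0.5."""
    return 1.0 if strict_key(a) == strict_key(b) else 0.5


for a, b in candidate_pairs(["Batten", "Batton", "Button", "Shaw"]):
    print(a, b, score(a, b))
```

With these toy keys, “Batten”, “Batton” and “Button” all land in the same candidate group, but only “Batten” and “Batton” receive the top score. The loose stage keeps pairs from being missed, and the strict stage decides how good each pair really is.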

Of course, American English pronunciation of names can be subtly different – a point that was brought home to us when an American customer passed on the comment from one of his team “Does Shaw really sound like Shore?” As I was reading this in an email, and as I am a Brit, I was confused! I rang a friend in Texas who laughed and explained that I was reading it wrong – he read it back to me in a Texan accent and I must admit, they did sound different! But then he explained to me that if you are from Boston, Shaw and Shore do sound very similar, so he felt that we were quite right to flag them as a potential match.

No program is ever perfect, so we continue to develop and tweak soundIT to this day, but it has stood the test of time remarkably well – apart from Beider-Morse, I still don’t know of another algorithm that takes this truly phonetic approach, let alone as successfully as soundIT has done.

Steve Tootill (stEv tWtyl)