William Sealy Gosset, one of the first data scientists

The father of the t-distribution

I think William Sealy Gosset, better known as ‘Student’, is the first data scientist. He used math to solve real-world business problems, and he worked on experimental design, small-sample statistics, quality control, and beer. In fact, I think we should start a fan club!

And as the first member of that fan club, I have been to the Guinness brewery to take a picture of Gosset’s only visible legacy there.

Me with a plaque in the Guinness Storehouse commemorating Gosset.

W. S. Gosset had qualities that we should emulate. He was smart, resourceful, self-taught in statistics, humble[1] and, unfortunately like so many of us data scientists, constrained by an NDA from his employer.[2]

Gosset worked in the Guinness brewery in the early 1900s, while the brewery was ramping up its production of beer. They wanted to maintain the same quality and price but produce far more beer.

The Guinness future was in “scientific brewing”—large-scale, industrially controlled brewing—wherein all factors of production, from barley breeding to taste testing, are controlled, improved, and confirmed by experimental science. A degree from either Oxford or Cambridge in a natural science was a minimum requirement for a Guinness brewer in the new era. (Stephen T. Ziliak, Guinnessometrics)

This does sound a lot like applied statistics, or operational statistics, or… data science! (Although I don’t think you need an advanced degree from a renowned institution to start a data science career.)

In his 37-year career, William S. Gosset worked on improving the consistency of the beer and making the brewery more efficient. He made impressive advances in what we would now call A/B testing and in small-sample statistics.

In the next post I will go into William Gosset’s work on small-sample statistics. The result of this work is what we now call the t-distribution. (Not the t-test: Gosset didn’t really seem to care about one threshold to rule them all. He very much recommended practical boundaries, based on your knowledge and economic considerations.)

But why did Gosset publish under the name ‘Student’ and not under his own name?

After a sabbatical year working with Karl Pearson, Gosset created a set of tables that came to be known as the t-distribution. That knowledge was actively used in the brewery, and Gosset wanted to share this important work with the world. However, the managers of Guinness did not want to give away their competitive advantage: if Gosset published a paper that mentioned beer under his own name, other breweries would copy their work. As a compromise, Gosset published the paper under a pseudonym, without any mention of a brewery, beer, or its ingredients.

References


  1. Contemporary statisticians like W. Edwards Deming, Udny Yule, and Florence Nightingale David, respectively called him a “very humble and pleasing personality,” “very pleasant chap” and “A nice man […] without a jealous bone in his body.” Both Karl Pearson and R.A. Fisher, the two most famous statistical thinkers of the early 20th century, who were known to hate each other, found common ground in their fondness for Gosset. (Dan Kopf, The Guinness Brewer Who Revolutionized Statistics)
  2. The Guinness brewery realized that its small-sample tests gave it a competitive advantage over other breweries, and so it prevented all scientists in the company from publishing under their own name or mentioning the name Guinness or the word beer.