High and Low Variance in Data Science Work

Consistency or peaks, pick one

I recently read “High Variance Management” by Sebas Bensu, and it made me think about data science work. First, some examples from the post. Some work needs to be consistent: not extraordinary, but always very nearly the same. Theatre actors performing multiple shows per week need to deliver their performance in the same way every day. Their work is low variance. Other work needs superb results. You don’t know whether you can reach them, but you try many times, and among all of the failures you might find gold. [Read More]

Are you a Fearless Deployer?

Fast experimentation and confident deployments should be your goal

How do you feel when you press the ‘deploy to production’ button? Confident, or slightly afraid? I bet many data scientists find it a bit scary. It’s worth digging a bit deeper into this fear. In my ideal world we are not scared at all. We have a DevOps mindset. We have no anxiety, no fears at all. You should be confident that the deployment pipeline takes care of everything. [Read More]

Do you Need a Feature Store?

From simple to advanced

A feature store is a central place where you get your (transformed) training and prediction data from. But do you need one? Why would you invest engineering effort in a feature store? All engineering is about making trade-offs: a feature store is an abstraction that can lead to more consistency between teams and between projects. A feature store is not useful for a single data scientist on a single project. It becomes useful when you do multiple projects, with multiple teams. [Read More]

Reading in your training data

Data Ingestion Patterns for ML

How do you get your training data into your model? Most tutorials and Kaggle notebooks begin by reading CSV data, but in this post I hope I can convince you to do better. I think you should spend as much time as possible on the feature engineering and modelling part of your ML project, and as little time as possible on getting the data from somewhere to your machine. [Read More]

Data Science Technical Terms: Job Titles and Fields

MLE, AE, DE, DS, WTF?

What do I mean when I talk about MLOps, machine learning engineering, or data science? I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, although they overlap. Here is what the job titles mean to me, right now. The first thing you need to keep in mind is the size of the organization, the size of the data team, and the data maturity of the organization. [Read More]

Not the Job Title but the Activities

What exactly would you say you do here?

I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, but how would a recruiter know what these things mean? My understanding of the roles can also differ from that of someone else in the field. Some people (they are dicks) would like to keep everyone who is not building neural nets out of the title of data scientist. [Read More]

useR! 2021: Integrating R into Production

A view on useR! 2021

This year’s useR! was held completely online, and I watched many of the talks. I believe the videos will be made public in the future, but there were some talks that I wanted to highlight. I think that the biggest problem with machine learning (or even data) projects is the integration with existing systems. Many machine learning products are batch or real-time predictions. For those predictions to create value you will need: [Read More]