Cooking: for some a chore, for some an absolute joy. I’m somewhere in the middle. But over the years I’ve learned that I need to plan my meals. If I plan a week of meals in advance, we can do groceries for the entire week in one go, and by thinking about your meals in advance you can vary them for nutritional value. I used to have a very strict diet to prevent stomach aches, and planning and cooking those meals was annoying, but eating the same things is very boring. [Read More]
Introducing the 'Smoll Data Stack'
The Small, Minimal, Open, Low effort, Low power (SMOLL) data stack is a pun with ambitions to grow into something larger and more educational. I wanted a cheap platform to work on improving my data engineering skills, so I re-purposed some hardware for this project: a Raspberry Pi 3 with Ubuntu and a NAS that I’ve installed a Postgres database on. What people call the ‘modern data stack’ is usually a cloud data warehouse such as Snowflake, BigQuery or Redshift (in my opinion a data warehouse is something that holds all the data in a table-like format, and your transformations are done with SQL). [Read More]
Don't Panic! a Scientific Approach to Debugging Production Failure
Your production system just broke down. What should you do now? Can you imagine your Shiny application, Flask app, or API service breaking down? As a beginning programmer or operations (or devops) person, it can be overwhelming to deal with the logs, messages, metrics and other possibly relevant information coming at you at such a point. And when something fails, you want to get it back to a working state as fast as possible. [Read More]
WTF is Kubernetes and Should I Care as R User?
Fearless to production
I’m going to give you a high-level overview of Kubernetes and how you can make your R work shine in Kubernetes. Are you an R user in a company that uses Kubernetes? Building R applications (models that do predictions, Shiny applications, APIs)? Curious about this whole Kubernetes thing that your coworkers are talking about? Somewhat afraid? Then I have the post for you! Many R users come from an academic background, statistics and social sciences. [Read More]
How I Set Up Dagster in a Company
In the past few months I set up a Dagster deployment within a Kubernetes cluster. This was a lot of fun because I learned a lot, but I’d like to document some of the things we did so I’ll remember them later. Dagster is a scheduler, orchestrator, or workflow manager (I’ve seen all those words but I’m not sure what the differences are). When you need to get data from one place to another, do complex data operations that need to happen in a certain order, or you have many, many tasks to run, you might want to use such a thing. [Read More]
The Whole Game; a Development Workflow
Developing software together
This is a post for people who only work alone or wonder why on earth you would use all those fancy tools like linting, unit tests, and fancy editors. I hear you: why would I use all those extra steps? That sounds like busywork you do instead of actual work! I think you just haven’t experienced development work like I have, and I would like to share how my work has felt and looked in the past few years. [Read More]
Data Science Technical Terms: Job Titles and Fields
MLE, AE, DE, DS, WTF?
What do I mean when I talk about MLOps, Machine Learning Engineering, or data science? I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, although they overlap. Here is what the job titles mean to me, right now. The first thing you need to keep in mind is the size of the organization, the size of the data team, and the data maturity of the organization. [Read More]
Not the Jobtitle but the Activities
What exactly would you say you do here?
I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, but how would a recruiter know what these things mean? My understanding of the roles can also be different from that of someone else in the field. Some people (they are dicks) would like to keep everyone who is not building neural nets out of the title of data scientist. [Read More]
Should I Move to a Database?
Long ago at a real-life meetup (remember those?), I received a t-shirt which said: “biggeR than R”. I think it was by Microsoft, who develop a special version of R with automatic parallel work. Anyways, I was thinking about the bigness (is that a word? it is now!) of your data. Is your data becoming too big? Your dataset becomes so big and unwieldy that operations take a long time. [Read More]
Distributing data science products
Where or what is production? What does it mean when someone says to bring some data science product ‘in production’? What does it mean for data science products to be in production? Is your product already in production? Is it a magical place? I think two questions are of importance: does my ‘thing’ provide value? Is my work repeatable? If the answer to these questions is yes, then your ‘thing’ is in production. [Read More]