Your production system just broke down. What should you do now? Can you imagine your Shiny application, Flask app, or API service breaking down?
As a beginning programmer, or an operations (or DevOps) person, it can be overwhelming to deal with the logs, messages, metrics, and other possibly relevant information coming at you at such a moment.
And when something fails, you want to get it back to a working state as fast as possible.
[Read More]
WTF is Kubernetes and Should I Care as R User?
Fearless to production
I’m going to give you a high-level overview of Kubernetes and how you can make your R work shine in Kubernetes.
Are you an R user in a company that uses Kubernetes? Building R applications (models that do predictions, Shiny applications, APIs)? Curious about this whole Kubernetes thing that your coworkers are talking about? Somewhat afraid? Then I have the post for you!
Many R users come from an academic background, such as statistics or the social sciences.
[Read More]
How I Set Up Dagster in a Company
In the past few months I set up a Dagster deployment within a Kubernetes cluster. This was a lot of fun because I learned a lot, but I’d like to document some of the things we did so I’ll remember them later.
Dagster is a scheduler, orchestrator, or workflow manager (I’ve seen all those words but I’m not sure what the differences are). When you need to get data from one place to another, do complex data operations that need to happen in a certain order, or have many, many tasks to run, you might want to use such a thing.
[Read More]
The Whole Game; a Development Workflow
Developing software together
This is a post for people who only work alone or wonder why on earth you would use all those fancy tools like linting, unit-tests, and fancy editors. I hear you, why would I use all those extra steps? That sounds like busywork you do instead of actual work!
I think you just haven’t experienced development work like I have, and I would like to share how my work has felt and looked over the past few years.
[Read More]
Data Science Technical Terms: Job Titles and Fields
MLE, AE, DE, DS, WTF?

What do I mean when I talk about MLOps, machine learning engineering, or data science? I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, although they overlap.
Here is what the job titles mean to me, right now.
The first thing you need to keep in mind is the size of the organization, the size of the data team, and the data maturity of the organization.
[Read More]
Not the Jobtitle but the Activities
What exactly would you say you do here?
I call myself a data engineer, data scientist, or machine learning engineer. But never an analyst. To me these job titles all have a certain meaning, but how would a recruiter know what these things mean? My understanding of the roles can also differ from that of someone else in the field. Some people (they are dicks) would like to keep everyone who is not building neural nets out of the title of data scientist.
[Read More]
Should I Move to a Database?
Long ago at a real-life meetup (remember those?), I received a t-shirt which said: “biggeR than R”. I think it was from Microsoft, which develops a special version of R with automatic parallel work. Anyways, I was thinking about the bigness (is that a word? it is now!) of your data. Is your data becoming too big?
Your dataset becomes so big and unwieldy that operations take a long time.
[Read More]
Distributing data science products
Where or what is production? What does it mean when someone says to bring some data science product ‘in production’? What does it mean for data science products to be in production? Is your product already in production? Is it a magical place?
I think two questions are of importance:
Does my ‘thing’ provide value? Is my work repeatable? If the answer to both questions is yes, then your ‘thing’ is in production.
[Read More]
UseR2021: Integrating R into Production
A view on UseR 2021
This year’s useR was completely online, and I watched many of the talks. I believe the videos will be public in the future, but there were some talks that I wanted to highlight.
I think that the biggest problem with machine learning (or even data) projects is the integration with existing systems. Many machine learning products are batch or real-time predictions. For those predictions to create value you will need:
[Read More]
Walkthrough UbiOps and Tidymodels
From python cookbook to R {recipes}
In this walkthrough I modified a tutorial from the UbiOps cookbook ‘Python Scikit learn and UbiOps’, but I replaced everything Python with R. So instead of scikit-learn I’m using {tidymodels}, and where Python uses a requirements.txt, I will use {renv}. So in a way I’m going from Python cookbook to {recipes} in R!
Components of the pipeline
The original cookbook (and my rewrite too) has three components:
[Read More]