How I Set Up Dagster in a Company

In the past few months I set up a Dagster deployment within a Kubernetes cluster. This was a lot of fun because I learned a lot, but I’d like to document some of the things we did so I’ll remember them later. Dagster is a scheduler, orchestrator, or workflow manager (I’ve seen all those words but I’m not sure what the differences are). When you need to get data from one place to another, do complex data operations that need to happen in a certain order, or have many, many tasks to run, you might want to use such a tool. [Read More]

Walkthrough UbiOps and Tidymodels

From Python cookbook to R {recipes}

In this walkthrough I modified a tutorial from the UbiOps cookbook, ‘Python Scikit learn and UbiOps’, but I replaced everything Python with R. So instead of scikit-learn I’m using {tidymodels}, and where Python uses a requirements.txt, I will use {renv}. So in a way I’m going from Python cookbook to {recipes} in R!

Components of the pipeline

The original cookbook (and my rewrite too) has three components: [Read More]

Some Thoughts About dbt for Data Engineering

Over the last week I have experimented with dbt (data build tool), a command-line tool created by Fishtown Analytics. I’m hardly the first to write or talk about it (see all the references at the bottom of this piece), but I just want to record my thoughts at this point in time.

What is it?

Imagine the following situation: you have a data warehouse where all your data lives. You, as a data engineer, support tens to hundreds of analysts who build dashboards and reports on top of that source data. [Read More]