I was listening to episode 135 of ‘Not So Standard Deviations’ (‘Moderate Confidence’). The hosts, Hilary and Roger, talked about when to use tidymodels packages and when not to. Here are my 2 cents on when I think it makes sense to use these packages and when it does not:
When not: you are always using GLM models (they are very flexible!). It makes no sense to me to go for the extra {parsnip} layer if you are always using the same models.
[Read More]
Import Ethics as E
import pandas as pd
from sklearn import linear_model
import Ethics as E
Ethics and fairness do not come after you’ve imported scikit-learn, but they are often talked about in that way.
I mean it’s good that we’re thinking about ethics when we start using more advanced models, but I don’t agree with the point in time we start talking about it! We should think about the consequences of our automated decision makers way earlier!
[Read More]
Tidymodels on UbiOps
I’ve been working with UbiOps lately, a service that runs your data science models as a service. They have recently started supporting R next to Python! So let’s see if we can deploy a tidymodels model to UbiOps! I am not going to tell you a lot about UbiOps; that is for another post. I presume you know what it is, you know what tidymodels means for R, and you want to combine these things.
[Read More]
Some Thoughts About dbt for Data Engineering
Over the last week I have experimented with dbt (data build tool), a command-line tool created by Fishtown Analytics. I’m hardly the first to write or talk about it (see all the references at the bottom of this piece). But I just want to record my thoughts at this point in time.
What is it?
Imagine the following situation: you have a data warehouse where all your data lives. You, as a data engineer, support tens to hundreds of analysts who build dashboards and reports on top of that source data.
[Read More]
Deploy to Shinyapps.io from Github Actions
Last week I spent a few hours figuring out how to automatically deploy a Shiny app to two apps on shinyapps.io from GitHub.
You can see the result on this GitHub repository. This GitHub repository is connected to two Shiny apps on shinyapps.io.
Here is what I envisioned:
every new commit to the main branch will be published to the main app. We could then lock down the main branch so that no one can directly commit to main.
[Read More]
Running an R Script on a Schedule: Azure Functions (Serverless)
timer-trigger in Azure Functions
In this post I will show how I run an R script on a schedule by making use of a ‘serverless’ computing service on the Microsoft Cloud called Azure Functions.
In short, I will use a custom Docker container, install the required software, install the required R packages using {renv}, and deploy it in the Azure cloud. I program the process in Azure such that it runs once a day without any supervision.
[Read More]
Testing Azure Functions Locally with Azurite
Supplying secrets and simulating storage
I’ve been developing Azure Functions with R for the past week. There are some nice basic tutorials for running custom code on ‘Functions’, but they all create a simple web app; that is, the Docker container responds to HTTP triggers. However, if you want to use a different trigger, you need a storage account too. There are two ways to do this:
use the actual storage account you created on Azure
simulate storage with the ‘azurite’ container.
[Read More]
TIL: Vectorization in Advent of Code Day 15
Indexing vectors is super fast!
I spent a lot of time yesterday on day 15 of Advent of Code (I’m three days behind, I think). Advent of Code is a nice way to practice your programming skills, and even though I think of myself as an advanced R programmer, I learned something yesterday!
The challenge is this:
While you wait for your flight, you decide to check in with the Elves back at the North Pole.
[Read More]
Stability, Portability and Flexibility Trade-offs
I think a lot about moving single R scripts from someone’s computer to the cloud (another computer). One of the major questions you need to answer is:
Can I give my solution to someone else in a way that it ‘just’ works?
R is a high-level language. This allows you to write out the steps you want to take while the actual implementation is hidden (can you imagine writing all the steps your computer needs to take?)
[Read More]
Rectangling (Social) Network Data, Advanced Options
Link features, for link prediction
This walkthrough is a follow-up to my previous post about rectangling network data. As a recap: we want to predict links between nodes in a graph by using features of the vertices. In the previous post I showed how to load flat files into a graph structure with {tidygraph}, how to select positive and negative examples, and how to extract some node features.
Because we want to predict whether a link between two nodes is probable, we can use the node features, but there is also other information about the edges in the graph that we cannot get out of a node-features-only procedure.
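To make the idea of a link feature concrete, here is a minimal sketch in plain Python rather than {tidygraph}. It is not the approach from the post itself: the toy edge list, the function names, and the two features (common neighbours and Jaccard similarity) are all assumptions chosen for illustration. The point is only that these values describe a pair of nodes and cannot be read off either node alone.

```python
from itertools import combinations

# Hypothetical toy graph: undirected edges between named nodes.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("d", "e")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours (undirected graph)."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def link_features(adjacency, u, v):
    """Features of the pair (u, v), ignoring any direct u-v link itself."""
    neighbours_u = adjacency[u] - {v}
    neighbours_v = adjacency[v] - {u}
    common = neighbours_u & neighbours_v
    union = neighbours_u | neighbours_v
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


adjacency = build_adjacency(edges)
existing = {frozenset(edge) for edge in edges}

# One row per node pair: the label (linked or not) plus the link features.
rows = []
for u, v in combinations(sorted(adjacency), 2):
    row = {"from": u, "to": v, "linked": frozenset((u, v)) in existing}
    row.update(link_features(adjacency, u, v))
    rows.append(row)

for row in rows:
    print(row)
```

Each row is one candidate link in rectangular form, so columns like these could sit next to the node features of both endpoints before fitting a model.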
[Read More]