These levels were defined by the Software Carpentry people, and I have modified them as follows:

  • beginner: You have just started out in this topic. You do not yet know how things are supposed to work. You do not have a mental model of this thing.
  • intermediate: You are a regular user of this software/tool/concept. You have a mental model, but it is not very sophisticated.
  • advanced: You have a sophisticated mental model of how things work, and you even know when the model breaks and no longer matches reality.

High and Low Variance in Data Science Work

Consistency or peaks, pick one

I recently read “High Variance Management” by Sebas Bensu, and it made me think about data science work. First, some examples from the post. Some work needs to be consistent: not extraordinary, but always very nearly the same. Theatre actors performing multiple shows per week need to deliver their acting in the same way every day. Their work is low variance. Other work needs superb results: you don’t know whether you can reach them, but you try many times, and among all of the failures you might find gold. [Read More]

Creating One Unified Calendar of all Data Science Events in the Netherlands

Over-engineering with renv and GitHub Actions

I enjoy learning new things about machine learning, and I enjoy meeting like-minded people too. That is why I go to meetups and conferences. But not everyone I meet becomes a member of every group, so I keep sending my coworkers new events that I hear about here in the Netherlands. And it is easy to overlook a new event that comes in over email. I, as an individual, cannot scale. So in this post I will walk you through an over-engineered solution to make myself unnecessary. [Read More]

The city, neighborhoods and streets: Organizing your ML project

Reduce your mental load by using conventions

Have you received a project that someone else created, and did it make you go 🤯? (Was that someone else you, from a few months back?) Sometimes a project organically grows into a mess of scripts and you don’t know how to make it better. The main problem is often the project organization. I want you to think about, and organize, your project in three levels that I call city-level, neighborhood-level and street-level. [Read More]

Should I Move to a Database?

Long ago at a real-life meetup (remember those?), I received a t-shirt which said: “biggeR than R”. I think it was by Microsoft, who develop a special version of R that does work in parallel automatically. Anyways, I was thinking about bigness (is that a word? it is now!) of your data. Is your data becoming too big? Your dataset becomes so big and unwieldy that operations take a long time. [Read More]

Munging and reordering Polarsteps data

Turning nested lists into a data.frame with purrr

This post is about how to extract data from a JSON file, turn it into a tibble and do some work with the result. I’m working with a download of personal data from Polarsteps. I spent a month in New Zealand, birthplace of R and home to Hobbits. I logged my travel using the Polarsteps application. The app allows you to upload pictures and write stories about your travels. [Read More]

Scraping GDPR Fines

Into the DOM with a flavour of regex

The website Privacy Affairs keeps a list of fines related to GDPR. I heard that this might be an interesting dataset for TidyTuesdays and so I scraped it. The dataset contains at this moment 250 fines given out for GDPR violations and was last updated (according to the website) on 31 March 2020. All data is from official government sources, such as official reports of national Data Protection Authorities. [Read More]

Setting up CSP on your hugo (+netlify) site

Content security policy is being nice to your reader’s browser

I recently got a compliment about having a content security policy (CSP) on my blog. But I’m not special, you can have one too! In this post I will show you how I created this policy and how you can too. I’m using the service report-uri.com, which automates a lot of the work. This is specific to building a hugo site using netlify. I am absolutely no expert, so this is mostly a description of what I did. [Read More]

Graphing My Daily Phone Use

How many times do I look at my phone? I set up a small program on my phone to count the screen activations and log them to a file. In this post I show what went wrong and how to plot the results. The program counts how many times I use my phone every day (to be specific, it counts the times the screen has been activated). [Read More]

Logging my phone use with tasker

In this post I’ll show you how I logged my phone use with tasker; in a follow-up post I’ll show you how I visualized that. I had a great vacation last week, but while relaxing in Spain I thought about my use of technology and became a bit concerned about how often I look at my phone. But how many times a day do I actually look at my phone? [Read More]

Running an R script on heroku

Automate alllll the things!

In this post I will show you how to run an R script on heroku every day. This is a continuation of my previous post on tweeting a death from wikidata. Update 2022: heroku no longer offers free options. Why would I want to run a script on heroku? It is extremely simple: you don’t need to spin up a machine in the cloud on AWS, Google, Azure or Nerdalize. [Read More]