Creating One Unified Calendar of all Data Science Events in the Netherlands
Over-engineering with renv and GitHub Actions

Posted on December 2, 2022
| 3 minutes
| 439 words
| Roel M. Hogervorst
I enjoy learning new things about machine learning, and I enjoy meeting like-minded people too. That is why I go to meetups and conferences. But not everyone I meet becomes a member of every group, so I keep sending my coworkers new events that I hear about here in the Netherlands. And it is easy to overlook a new event that comes in over email. I, as an individual, cannot scale. So in this post I will walk you through an over-engineered solution to make myself unnecessary.
[Read More]
Gosset part 2: small sample statistics
Scientific brewing at scale
Posted on October 11, 2019
(Last modified on November 9, 2022)
| 14 minutes
| 2804 words
| Roel M. Hogervorst
Simulation was the key to achieving world beer dominance.
‘Scientific’ Brewing at scale in the early 1900s
(Image: beer bottles, cheers)
This post is an explainer about the small sample experiments performed by William S. Gosset. It contains some R code that simulates his simulations and the resulting determination of the ideal sample size for inference.
If you brew your own beer, or if you want to know how many samples you need to say something useful about your data, this post is for you.
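As a rough illustration of why sample size matters here, the sketch below draws many small samples from a standard normal distribution and measures how much the sample mean varies at each size. It is not the code from the post and not Gosset's actual procedure; the sample sizes, the number of simulations, and the variable names are assumptions chosen for illustration.

```r
# Hypothetical sketch: how the spread of the sample mean shrinks
# as the sample size grows. All values below are assumptions.
set.seed(42)

sample_sizes <- c(2, 4, 10, 30)  # assumed sample sizes to compare
n_sims <- 2000                   # assumed number of repeated samples

spread_of_means <- sapply(sample_sizes, function(n) {
  # draw n_sims samples of size n and keep the mean of each
  means <- replicate(n_sims, mean(rnorm(n, mean = 0, sd = 1)))
  # the standard deviation of those means is the simulated standard error
  sd(means)
})
names(spread_of_means) <- sample_sizes

print(round(spread_of_means, 3))
```

The simulated standard error should fall roughly with the square root of the sample size, which is the kind of trade-off you weigh when picking a sample size.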
[Read More]
William Sealy Gosset one of the first data scientists
The father of the t-distribution
Posted on August 17, 2019
(Last modified on November 9, 2022)
| 3 minutes
| 600 words
| Roel M. Hogervorst
I think William Sealy Gosset, better known as ‘Student’, is the first data scientist. He used math to solve real-world business problems, and he worked on experimental design, small sample statistics, quality control, and beer. In fact, I think we should start a fan club!
And as the first member of that fan club, I have been to the Guinness brewery to take a picture of Gosset’s only visible legacy there. W. S.
[Read More]
Quick post - detect and fix this ggplot2 antipattern
Posted on March 7, 2019
(Last modified on November 9, 2022)
| 6 minutes
| 1168 words
| Roel M. Hogervorst
Recently one of my coworkers showed me a ggplot and although it is not wrong, it is also not ideal. Here is the TL;DR:
Whenever you find yourself adding multiple geom_* to show different groups, reshape your data
In software engineering there are things called antipatterns, ways of programming that lead you into potential trouble. This is one of them.
I’m not saying it is incorrect, but it might lead you into trouble.
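A minimal sketch of the pattern and its fix, assuming a made-up wide data frame with one column per group (the data, column names, and colours are hypothetical, and `tidyr::pivot_longer()` is my choice of reshaping function, not necessarily the one used in the post):

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: one column per group
wide <- data.frame(
  year   = 2015:2018,
  apples = c(3, 4, 6, 5),
  pears  = c(2, 5, 4, 7)
)

# The antipattern: one geom_* call per group
p_wide <- ggplot(wide, aes(x = year)) +
  geom_line(aes(y = apples), colour = "red") +
  geom_line(aes(y = pears), colour = "blue")

# The fix: reshape to long format so the group becomes a variable
long <- pivot_longer(
  wide,
  cols = c(apples, pears),
  names_to = "fruit",
  values_to = "count"
)

# One geom, with the group mapped to colour (the legend comes for free)
p_long <- ggplot(long, aes(x = year, y = count, colour = fruit)) +
  geom_line()
```

With the long version, adding a third group means adding data, not another `geom_line()` call.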
[Read More]
interactive ggplot with tooltip using plotly
tldr: wrap ggplotly around ggplot and add info in aes()
Posted on September 13, 2018
(Last modified on November 9, 2022)
| 5 minutes
| 1008 words
| Roel M. Hogervorst
A quick random R thing that I recently learned, use a lot, and want you to know too.
In this post I’ll show you how to make a quick interactive plot with ggplot and plotly, so that values are displayed when you hover your mouse over them. Why would you want this? If you are exploring the data, you want some quick insights into which values are where.
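A minimal sketch of the idea, assuming the built-in `mtcars` data (the post may use a different dataset). The extra `text` aesthetic is ignored by ggplot2 itself but picked up by `ggplotly()` for the tooltip:

```r
library(ggplot2)
library(plotly)

# Put the extra hover information in aes(); here the car name (assumed example)
p <- ggplot(mtcars, aes(x = wt, y = mpg, text = rownames(mtcars))) +
  geom_point()

# Wrap the ggplot in ggplotly() and choose which aesthetics show on hover
interactive_plot <- ggplotly(p, tooltip = c("x", "y", "text"))

# Printing interactive_plot in an interactive session opens the widget
```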
[Read More]
Use `purrr` to feed four cats
Replacing a for loop with purrr::map_*
Posted on September 10, 2018
(Last modified on November 9, 2022)
| 8 minutes
| 1688 words
| Reigo Hendrikson & Roel M. Hogervorst
Use purrr to feed four cats
In this example we will show you how to go from a ‘for loop’ to purrr. Use this as a cheatsheet when you want to replace your for loops.
Imagine having 4 cats (like this one).
Four real cats who need food, care and love to live a happy life. They are starting to meow, so it’s time to feed them. Our real-life algorithm would be:
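A minimal sketch of that conversion, with hypothetical cat names and a hypothetical `feed_cat()` helper (neither comes from the post):

```r
library(purrr)

# Hypothetical cats and feeding function, for illustration only
cats <- c("Whiskers", "Tom", "Felix", "Garfield")
feed_cat <- function(name) paste("Fed", name)

# The for loop: pre-allocate the result, then fill it one cat at a time
fed_loop <- character(length(cats))
for (i in seq_along(cats)) {
  fed_loop[i] <- feed_cat(cats[i])
}

# The purrr version: map_chr() applies feed_cat() to every element
# and guarantees a character vector as the result
fed_map <- map_chr(cats, feed_cat)
```

Both versions produce the same vector; the `map_chr()` call states the intent (apply this function to each cat) without the bookkeeping of an index.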
[Read More]
Arthur blinked, Ford shrugs, but Zaphod leapt; text as graph
Text can be interpreted as a graph
Posted on July 24, 2018
(Last modified on November 9, 2022)
| 16 minutes
| 3206 words
| Roel M. Hogervorst
Can we make the computer say something about characters in a book? In this piece I will search for the names of characters and the words around those names in books. What can we learn about a character from text analysis? Of course it’s also just another excuse for me to read the Hitchhiker’s series! I will break down the text into chunks of two words, extract the word pairs that matter and visualize the results.
[Read More]
Cleaning up and combining data, a dataset for practice
Posted on March 12, 2018
(Last modified on November 9, 2022)
| 3 minutes
| 564 words
| Roel M. Hogervorst
tldr: I created an open dataset for the explicit practice of data munging. Feel free to use it in assignments, but do mention where you got it from (CC-by-4.0). Also unicorns are awesome.
Find the dataset at: https://github.com/RMHogervorst/unicorns_on_unicycles
Data munging / cleaning / engineering
At work I was working with two Excel files that were slightly different but could be combined into 1 dataset. This is very typical for the day-to-day cleaning operations that analysts and data scientists do (statisticians too).
[Read More]
add abbreviations to your rmarkdown doc
Posted on January 24, 2018
(Last modified on November 9, 2022)
| 2 minutes
| 238 words
| Roel M. Hogervorst
Today a small tip for when you write rmarkdown documents. Add a chunk on top with abbreviations.
In the first chunks you set the options and load the packages. Next, create the abbreviations. You don’t have to care about the ordering; just put them down as you realize you are creating them.
The first step makes a dataframe (a tibble, rowwise), and the second step orders them.
tribble( ~Abbreviation, ~ Explanation, "CIA", "Central Intelligence Agency", "dplyr", "data.
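The two steps described above can be sketched as follows. Only the "CIA" row comes from the excerpt; the other rows are hypothetical placeholders, and `dplyr::arrange()` is my assumption for the ordering step:

```r
library(tibble)
library(dplyr)

# Step 1: write abbreviations down row by row, in any order
abbreviations <- tribble(
  ~Abbreviation, ~Explanation,
  "CIA",   "Central Intelligence Agency",
  "SD",    "standard deviation",     # hypothetical entry
  "ANOVA", "analysis of variance"    # hypothetical entry
)

# Step 2: order them alphabetically before printing
abbreviations_sorted <- arrange(abbreviations, Abbreviation)
```

In an rmarkdown document the sorted tibble could then be passed to a table function such as `knitr::kable()`.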
[Read More]
If blogging was like academia, we would all be saved, thank you for your edits.
Posted on December 19, 2017
(Last modified on November 9, 2022)
| 2 minutes
| 291 words
| Roel M. Hogervorst
A month ago I posted a short piece inspired by a post by Maëlle Salmon. She actually reached out to me within 10 minutes, telling me I made a weird spelling error (no excuses, I really make those a lot). Then a day or two later Jon Spring walked through the code and realized that I switched two outcomes in the code.
Just about 10 days ago I posted about downloading multiple files and Mara Averick noticed a weird ‘«««’ sign on my website.
[Read More]