Running an R Script on a Schedule: Heroku

Tweeting from Heroku

In this tutorial I have an R script that runs every day on Heroku. It creates a curve in ggplot2 and posts that picture to Twitter. The use case is this: you have a script and it needs to run on a schedule (for instance, every day). In 2018 I wrote a small post on how to run an R script on Heroku. The amazing thing is that the bot I created back then is still running! [Read More]

How to Use CatBoost with Tidymodels

Treesnip standardizes everything

So you want to compete in a Kaggle competition with R and you want to use tidymodels. In this how-to I show how you can use CatBoost with tidymodels. I give very terse descriptions of what the steps do, because I believe you read this post for implementation, not background on how the elements work. This tutorial is extremely similar to my previous post about using LightGBM with tidymodels. Why tidymodels? [Read More]

How to Use LightGBM with Tidymodels

Treesnip standardizes everything

So you want to compete in a Kaggle competition with R and you want to use tidymodels. In this how-to I show how you can use LightGBM (LGBM) with tidymodels. I give very terse descriptions of what the steps do, because I believe you read this post for implementation, not background on how the elements work. Why tidymodels? It is a unified machine learning framework that uses sane defaults, keeps model definitions and implementation separate, and allows you to easily swap models or change parts of the processing. [Read More]

Expressing size in bananas: a dive into {vctrs}

Yes I made a stupid package to express lengths in bananas

Recently I’ve become interested in relative sizes of things. Maybe I’m paying more attention to my surroundings since I’ve been locked at home for so long. Maybe my inner child is finally breaking free. Whatever the reason, I channeled all of that into two packages: everydaysizes, a rather unfinished collection of dimensions of everyday objects, and banana, a package that displays dimensions as … bananas. I’ve collected a bunch of sizes and turned them into ‘units’. [Read More]

New Package, Pinboardr

I’ve created a new package to interact with Pinboard (not to be confused with Pinterest). I noticed there wasn’t a package yet and the API is fairly clear. So come and check out {pinboardr} at https://github.com/RMHogervorst/pinboardr. I did see a new package to interact with Pocket: pocketapi. Since Pocket is also a kind of bookmark manager, I thought there was a need for these kinds of packages. I will leave this package on GitHub for a while to figure out if I need to make changes, and in a month or so I will push it to CRAN. [Read More]

Munging and reordering Polarsteps data

Turning nested lists into a data.frame with purrr

This post is about how to extract data from a JSON file, turn it into a tibble and do some work with the result. I’m working with a download of personal data from Polarsteps. [Image: Tokomaru Wharf, New Zealand] I spent a month in New Zealand, birthplace of R and home to Hobbits. I logged my travel using the Polarsteps application. The app allows you to upload pictures and write stories about your travels. [Read More]

Where does the output of Rscript go?

stdin, stdout, stderr

We often run R interactively, through RStudio or in the terminal. But you can also run R scripts without manual intervention, using Rscript. But where does the output go? Warning: this post is very Linux/Unix (macOS) centred; I don’t know how this works on Windows. Also, I’m using ‘bash’, the standard shell on Linux, and I believe there are some small nuances in the commands in other shells like zsh. Why do I want to know this? [Read More]

Scraping GDPR Fines

Into the DOM with a flavour of regex

The website Privacy Affairs keeps a list of fines related to GDPR. I heard that this might be an interesting dataset for TidyTuesday, and so I scraped it. The dataset currently contains 250 fines given out for GDPR violations and was last updated (according to the website) on 31 March 2020. All data is from official government sources, such as official reports of national Data Protection Authorities. [Read More]

Gosset part 2: small sample statistics

Scientific brewing at scale

Simulation was the key to achieving world beer dominance. [Image: beer bottles, cheers. Caption: ‘Scientific’ brewing at scale in the early 1900s] This post is an explainer about the small sample experiments performed by William S. Gosset. This post contains some R code that simulates his simulations and the resulting determination of the ideal sample size for inference. If you brew your own beer, or if you want to know how many samples you need to say something useful about your data, this post is for you. [Read More]

Quick post - detect and fix this ggplot2 antipattern

Recently one of my coworkers showed me a ggplot and although it is not wrong, it is also not ideal. Here is the TL;DR: whenever you find yourself adding multiple geom_* to show different groups, reshape your data. In software engineering there are things called antipatterns, ways of programming that lead you into potential trouble. This is one of them. I’m not saying it is incorrect, but it might lead you into trouble. [Read More]