Predicting links for network data

NETWORKS, PREDICT EDGES

Can we predict whether two nodes in a graph are connected? Let’s make it very practical: say you work at a social media company and your boss asks you to create a model that predicts who will become friends, so you can feed those recommendations back to the website and serve them to users. You are tasked with creating a model that predicts, once a day for all users, who is likely to connect with whom. [Read More]

Rectangling (Social) Network Data

Preparing data for link prediction

In this tutorial I will show you how we go from network data to a rectangular format that is suited for machine learning. Many things in the world are graphs (networks). For instance: real-life friendships, business interactions, links between websites and (digital) social networks. I find graphs (the formal name for networks) fascinating, and because I am also interested in machine learning and data engineering, the question naturally becomes: How do I get (social) network data into a rectangular structure for ML? [Read More]
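A minimal sketch of what "rectangling" can look like, written in Python with a made-up edge list rather than taken from the post. It assumes an undirected graph and uses a hypothetical helper, `rectangle_edges`, that emits one row per node pair with a few simple features (degrees, number of common neighbors) and a `connected` label a model could learn from.

```python
from itertools import combinations


def rectangle_edges(edges):
    """Turn an undirected edge list into one row per node pair (hypothetical helper)."""
    # Build an adjacency lookup: node -> set of neighbors.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    rows = []
    # Every unordered pair of nodes becomes one row in the rectangular table.
    for a, b in combinations(sorted(neighbors), 2):
        rows.append({
            "from": a,
            "to": b,
            "degree_from": len(neighbors[a]),
            "degree_to": len(neighbors[b]),
            "common_neighbors": len(neighbors[a] & neighbors[b]),
            "connected": int(b in neighbors[a]),  # label: 1 if the edge exists
        })
    return rows


# Illustrative data only: four people and four friendships.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
rows = rectangle_edges(edges)
for row in rows:
    print(row)
```

Enumerating every pair grows quadratically with the number of nodes, so on a real network you would probably sample the unconnected pairs instead of listing them all.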

Running an R Script on a Schedule: Docker Containers on gitlab

In this tutorial/howto I show you how to run a docker container on a schedule on gitlab. Docker containers are awesome because, once made, they run everywhere! It does not matter what type of computer you have (though I believe there is a problem with ARM-based vs. other CPUs). Once I build a container, you can run it on a linux box, windows machine or mac. This is also why people love containers for production: you can finally truly pick up a container from development and hand it over to production. [Read More]

Running an R Script on a Schedule: Gh-Actions

Tweeting from github actions

In this tutorial I have an R script that runs every day on github actions. It creates a curve in ggplot2 and posts that picture to twitter. The use case is this: You have a script and it needs to run on a schedule (for instance every day). Other ways to schedule a script: I will create a new post for many of the other ways in which you can run an R script on a schedule. [Read More]

Running an R Script on a Schedule: Gitlab

Tweeting from gitlab runners

In this tutorial I have an R script that creates a plot and tweets it; it runs every day on gitlab runners. The use case is this: You have a script and it needs to run on a schedule (for instance every day). Other ways to schedule a script: I will create a new post for many of the other ways in which you can run an R script on a schedule. [Read More]

Running an R Script on a Schedule: Heroku

Tweeting from heroku

In this tutorial I have an R script that runs every day on heroku. It creates a curve in ggplot2 and posts that picture to twitter. The use case is this: You have a script and it needs to run on a schedule (for instance every day). In 2018 I wrote a small post on how to run an R script on heroku. The amazing thing is that the bot I created back then is still running! [Read More]

Quick post - detect and fix this ggplot2 antipattern

Recently one of my coworkers showed me a ggplot and although it is not wrong, it is also not ideal. Here is the TL;DR: Whenever you find yourself adding multiple geom_* to show different groups, reshape your data. In software engineering there are things called antipatterns, ways of programming that lead you into potential trouble. This is one of them. I’m not saying it is incorrect, but it might lead you into trouble. [Read More]

Graphing My Daily Phone Use

How many times do I look at my phone? I set up a small program on my phone to count the screen activations and log them to a file. In this post I show what went wrong and how to plot the results. The data: I set up a small program on my phone that counts how many times I use my phone every day (to be specific, it counts the times the screen has been activated). [Read More]

interactive ggplot with tooltip using plotly

tldr: wrap ggplotly around ggplot and add info in aes()

A quick Random R thing I use a lot, recently learned, and I want you to know it too. In this post I’ll show you how to make a quick interactive plot with ggplot and plotly, so that values are displayed when you hover your mouse over it. Why would you want this? If you are exploring the data, you want some quick insights into which values are where. [Read More]

Where to live in the Netherlands based on temperature XKCD style

After seeing a plot of best places to live in Spain and the USA based on the weather, I had to chime in and do the same thing for the Netherlands. The idea is simple: determine where you want to live based on your temperature preferences. First, the end result: This post explains how to make the plot; to see where I got the data and what procedures I took, look at https://github. [Read More]