NETWORKS, PREDICT EDGES
Can we predict if two nodes in the graph are connected or not?
But let’s make it very practical:
Let’s say you work at a social media company and your boss asks you to create a model to predict who will be friends, so you can feed those recommendations back to the website and serve them to users.
You are tasked with creating a model that predicts, once a day for all users, who is likely to connect to whom.
[Read More]
Rectangling (Social) Network Data
Preparing data for link prediction
In this tutorial I will show you how we go from network data to a rectangular format that is suited for machine learning.
Many things in the world are graphs (networks). For instance: real-life friendships, business interactions, links between websites and (digital) social networks. I find graphs (the formal name for networks) fascinating, and because I am also interested in machine learning and data engineering, the question naturally becomes:
How do I get (social) network data into a rectangular structure for ML?
[Read More]
Running an R Script on a Schedule: Overview
There are lots of rstats tutorials about creating beautiful plots, setting up shiny applications and even a few on setting up plumber APIs (but we could use more). However, a lot of work consists of running a script without any interaction.
This is an overview page for the tutorials I’ve created so far. This overview is for you if you want to know how to run your batch script (do one thing without supervision) automatically.
[Read More]
Running an R Script on a Schedule: Docker Containers on gitlab
In this tutorial/howto I show you how to run a docker container on a schedule on gitlab.
Docker containers are awesome because, once made, they run everywhere! It does not matter what type of computer you have (though I believe there is a problem with ARM-based vs. other CPUs). Once I build a container, you can run it on a linux box, windows machine or mac. This is also why people love containers for production: you can finally truly pick up a container from development and hand it over to production.
[Read More]
Running an R Script on a Schedule: Gh-Actions
Tweeting from github actions
In this tutorial I have an R script that runs every day on github actions. It creates a curve in ggplot2 and posts that picture to twitter.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
Other ways to schedule a script: I will create a new post for many of the other ways in which you can run an R script on a schedule.
[Read More]
Running an R Script on a Schedule: Gitlab
Tweeting from gitlab actions
In this tutorial I have an R script that creates a plot and tweets it; it runs every day on gitlab runners.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
Other ways to schedule a script: I will create a new post for many of the other ways in which you can run an R script on a schedule.
[Read More]
Running an R Script on a Schedule: Heroku
Tweeting from heroku
In this tutorial I have an R script that runs every day on heroku. It creates a curve in ggplot2 and posts that picture to twitter.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
In 2018 I wrote a small post on how to run an R script on heroku. The amazing thing is that the bot I created back then is still running!
[Read More]
How to Use Catboost with Tidymodels
Treesnip standardizes everything
So you want to compete in a kaggle competition with R and you want to use tidymodels. In this howto I show how you can use CatBoost with tidymodels. I give very terse descriptions of what the steps do, because I believe you read this post for implementation, not background on how the elements work. This tutorial is extremely similar to my previous post about using lightGBM with Tidymodels.
Why tidymodels?
[Read More]
How to Use Lightgbm with Tidymodels
Treesnip standardizes everything
So you want to compete in a kaggle competition with R and you want to use tidymodels. In this howto I show how you can use lightgbm (LGBM) with tidymodels. I give very terse descriptions of what the steps do, because I believe you read this post for implementation, not background on how the elements work.
Why tidymodels? It is a unified machine learning framework that uses sane defaults, keeps model definitions and implementation separate, and allows you to easily swap models or change parts of the processing.
[Read More]
How Does Catboost Deal with Factors in loading?
What are you doing catboost?
Some people at curso-r are working on an amazing extension of parsnip that allows you to use tidymodels packages like {parsnip} and {recipes} with the modern beasts of machine learning: lightgbm and catboost. The package is called treesnip and is still in development.
Both lightgbm and catboost can work with categorical features, but how do you pass those to the machinery? Both lightgbm and catboost use special data structures. I was reading through the catboost documentation and it just wasn’t very clear to me.
[Read More]