I think a lot about moving single R scripts from someone’s computer to the cloud (another computer). One of the major questions you need to answer is:
Can I give my solution to someone else in a way that it ‘just’ works?
R is a high-level language. This allows you to write out the steps you want to take while the actual implementation stays hidden (can you imagine writing all the steps your computer needs to take?
[Read More]
Rectangling (Social) Network Data, Advanced Options
Link features, for link prediction
This walkthrough is a follow-up to my previous post about rectangling network data. As a recap: we want to predict links between nodes in a graph by using features of the vertices. In the previous post I showed how to load flat files into a graph structure with {tidygraph}, how to select positive and negative examples, and how to extract some node features.
Because we want to predict whether a link between two nodes is probable, we can use the node features, but the graph also holds information about the edges that a node-features-only procedure cannot capture.
[Read More]
Predicting links for network data
NETWORKS, PREDICT EDGES
Can we predict if two nodes in the graph are connected or not?
But let’s make it very practical:
Let’s say you work in a social media company and your boss asks you to create a model to predict who will be friends, so you can feed those recommendations back to the website and serve those to users.
You are tasked with creating a model that predicts, once a day for all users, who is likely to connect to whom.
[Read More]
Running an R Script on a Schedule: Overview
There are lots of rstats tutorials about creating beautiful plots, setting up shiny applications and even a few on setting up plumber APIs (but we could use more). However, a lot of work consists of running a script without any interaction.
This is an overview page for the tutorials I’ve created so far. This overview is for you if you want to know how to run your batch script (do one thing without supervision) automatically.
[Read More]
Running an R Script on a Schedule: Gh-Actions
Tweeting from GitHub Actions
In this tutorial I have an R script that runs every day on GitHub Actions. It creates a curve in ggplot2 and posts that picture to Twitter.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
Other ways to schedule a script
I will create a new post for many of the other ways in which you can run an R script on a schedule.
[Read More]
Running an R Script on a Schedule: Gitlab
Tweeting from gitlab actions
In this tutorial I have an R script that creates a plot and tweets it; it runs every day on GitLab runners.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
Other ways to schedule a script
I will create a new post for many of the other ways in which you can run an R script on a schedule.
[Read More]
Running an R Script on a Schedule: Heroku
Tweeting from Heroku
In this tutorial I have an R script that runs every day on Heroku. It creates a curve in ggplot2 and posts that picture to Twitter.
The use case is this: You have a script and it needs to run on a schedule (for instance every day).
In 2018 I wrote a small post on how to run an R script on Heroku. The amazing thing is that the bot I created back then is still running!
[Read More]
How Does Catboost Deal with Factors in loading?
What are you doing catboost?
Some people at curso-r are working on an amazing extension of parsnip that allows you to use tidymodels packages like {parsnip} and {recipes} with the modern beasts of machine learning: lightgbm and catboost. The package is called treesnip and is still in development.
Both lightgbm and catboost can work with categorical features, but how do you pass those to the machinery? Both lightgbm and catboost use special data structures. I was reading through the catboost documentation and it just wasn’t very clear to me.
[Read More]
Expressing size in bananas: a dive into {vctrs}
Yes, I made a stupid package to express lengths in bananas
Recently I’ve become interested in relative sizes of things. Maybe I’m paying more attention to my surroundings since I’m locked at home for so long. Maybe my inner child is finally breaking free. Whatever the reason, I channeled all of that into two packages:
everydaysizes: A rather unfinished collection of dimensions of everyday objects.
banana: A package that displays dimensions as … bananas.
I’ve collected a bunch of sizes and turned them into ‘units’.
[Read More]