Zettlr to Hugo

I already know Hugo!

This is a technical walkthrough of how I turn Zettlr markdown files into a Hugo website. This is an experiment and not yet finished, but I want to write down my thought process in the hope that it works for you too. I was looking for a way to publish my zettelkasten to a website that only I can see. My solution: any self-published, local website will do, as long as I put it behind a Tailscale network. Then I can view the website from my devices anywhere (through Tailscale). [Read More]

Zettlr to MkDocs

Let's use Python, I already know that

This is a technical walkthrough of how I turn Zettlr markdown files into a MkDocs website. This is an experiment and not yet finished, but I want to write down my thought process in the hope that it works for you too. I was looking for a way to publish my zettelkasten to a website that only I can see. My solution: any self-published, local website will do, as long as I put it behind a Tailscale network. Then I can view the website from my devices anywhere (through Tailscale). [Read More]

Private Personal Knowledge Management

But globally accessible to me alone

Atomic ideas, connected: I write many blog posts based on the notes in my personal knowledge system. I’m using a digital zettelkasten method: you pull knowledge apart into atomic components (zettels) and write those components down, connecting them in ways that make sense to you. This allows me to make creative connections between ideas and concepts. Living apart, together: Sometimes at work I want to look up something that I know is in my zettelkasten. [Read More]

Entity resolution for data scientists

or data matching, data deduplication, or record linkage

I have a problem. Others have it too: it is a problem of duplication. I’m trying to track the books I read in Bookwyrm so I can talk about them online. But there are so many duplicates! How do we know if Soren Kierkegaard, Søren Kierkegaard, and Sören Kierkegaard are the same person? This is an example of entity resolution. It is also called deduplication, record linkage, and data matching. We want to compare entities from different datasets and make a confident claim about whether they match. [Read More]

ChatGPT in (the Core of) your Product is a Bad Idea

Foundational models are inherently risky.

Google's Gemini (formerly Bard), Claude, and ChatGPT seem to be able to do many things. They have easy APIs and many plugins. The price is lower than seems possible. And yet, integrating these things into your product is really risky. Here is why. Problems with foundational models: These “AIs” are built on foundational models. They are trained on massive amounts of text data, and finally fine-tuned for specific tasks. We don’t know what data was used for training. [Read More]

The art (and science) of feature engineering

combining best practices from science and engineering

Data scientists, in general, do not just throw data into a model. They use feature engineering: transforming input data to make it easy for the chosen machine learning algorithm to pick up the subtleties in the data. Data scientists do this so the model can predict outcomes better. In the image below you see a transformation of data into numeric values with meaning. In this article I’ll discuss why we still need feature engineering (FE) in the age of large language models, and what some best practices are. [Read More]

The city, neighborhoods and streets: Organizing your ML project

reduce your mental load by using conventions

Have you ever received a project that someone else created, and did it make you go 🤯? (Was that someone else you, from a few months back?) Sometimes a project grows organically into a mess of scripts and you don’t know how to make it better. The main problem is often the project organization. I want you to think about, and organize, your project in three levels that I call city-level, neighborhood-level and street-level. [Read More]

Should I Move to a Database?

Long ago at a real-life meetup (remember those?), I received a t-shirt which said: “biggeR than R”. I think it was from Microsoft, which develops a special version of R with automatic parallel work. Anyway, I was thinking about the bigness (is that a word? it is now!) of your data. Is your data becoming too big? Your dataset becomes so big and unwieldy that operations take a long time. [Read More]

Quick post - detect and fix this ggplot2 antipattern

Recently one of my coworkers showed me a ggplot and although it is not wrong, it is also not ideal. Here is the TL;DR: whenever you find yourself adding multiple geom_* to show different groups, reshape your data. In software engineering there are things called antipatterns: ways of programming that lead you into potential trouble. This is one of them. I’m not saying it is incorrect, but it might lead you into trouble. [Read More]

Graphing My Daily Phone Use

How many times do I look at my phone? I set up a small program on my phone to count the screen activations and log them to a file. In this post I show what went wrong and how to plot the results. The data: I set up a small program on my phone that counts how many times a day I use my phone (to be specific, it counts the times the screen has been activated). [Read More]