Beginner level posts

A Model not in Production is a Waste of Money and Time

Posted on November 2, 2024 (Last modified on October 31, 2024) | 2 minutes | 401 words | Roel M. Hogervorst

I always push people to get their ML project into production, even if it is not that good yet and even if you could eke out a bit more performance. I’ve been inspired by the DevOps and lean movements and I hope you will be too. ML products have many ways to improve; you can always tweak more. But ML is high risk, with a possible high reward, and relatively expensive compared to ‘normal code’. [Read More]

MLOps production

The Disney+ App Really Sucks

Posted on October 28, 2024 (Last modified on October 29, 2024) | 3 minutes | 429 words | Roel M. Hogervorst

The Disney+ app really sucks. I have an Android device that hosts the app and I only ever play on the Chromecast. To be clear, I use a Chromecast on a TV with CEC enabled. That is, you can send commands from your remote to connected devices. This is really nice: you can pause, play, stop, rewind, and toggle subtitles. And you can skip ahead and back. There is even a button to accept things. [Read More]

webtechnologies chromecast

So you've just lost a million dollars in the genAI hype

what lessons can you learn?

Posted on August 1, 2024 | 4 minutes | 790 words | Roel M. Hogervorst

Hi C-level person! Are you feeling down because AI is not working for you? Let me know if this is you: a smug consultant sold you a genAI solution. By now you’ve realised that it doesn’t work: it cannot work in theory, and now it also doesn’t work in practice. You still have data quality issues, and your promised profits are nonexistent. Are there any lessons you can learn from this fiasco? [Read More]

LLMs genAI pseudoprofoundBS

A rant about tp-link wifi boxes

No internet? no wifi for you!

Posted on July 17, 2024 (Last modified on July 16, 2024) | 2 minutes | 412 words | Roel M. Hogervorst

My internet was down for several days (see previous post), and the only thing that really broke, apart from the obviously internet-connected services on Home Assistant, was the wifi. I have TP-Link Deco boxes and they work fine most of the time. They form a mesh and connect over whatever link is best (through the electrical wiring, point-to-point wifi, or a network cable). In general, they just work. Until your internet connection is down. [Read More]

DNS pi-hole lan

An offline first smart home is really nice

Local first smart home was a great decision

Posted on July 16, 2024 | 1 minutes | 206 words | Roel M. Hogervorst

Recently my internet connection was down for several days; where I live in Europe, that almost never happens. I am so used to having an internet connection that I really had to adjust to this. Of course, having mobile phones with data connections means my online addiction was regularly fed, but I can’t roam my home LAN over my mobile connection (yet?). Home Assistant just kept going and that is awesome! [Read More]

home-assistant webtechnologies smart-home

ChatGPT in (the Core of) your Product is a Bad Idea

Foundational models are inherently risky.

Posted on March 19, 2024 (Last modified on August 1, 2024) | 4 minutes | 826 words | Roel M. Hogervorst

Google Gemini (formerly Bard), Claude, or ChatGPT seem to be able to do many things. They have easy APIs and many plugins. The price is lower than seems possible. And yet, integrating these things into your product is really risky. Here is why. Problems with foundational models: these “AIs” are built on foundational models. They are trained on massive amounts of text data, and finally fine-tuned for specific tasks. We don’t know what data was used for training. [Read More]

ai-risks antipattern pseudoprofoundBS genAI

The art (and science) of feature engineering

combining best practices from science, and engineering

Posted on March 1, 2024 | 5 minutes | 1010 words | Roel M. Hogervorst

Data scientists, in general, do not just throw data into a model. They use feature engineering: transforming input data to make it easy for the chosen machine learning algorithm to pick up the subtleties in the data. Data scientists do this so the model can predict outcomes better. In the image below you see a transformation of data into numeric values with meaning. In this article I’ll discuss why we still need feature engineering (FE) in the age of large language models, and what some best practices are. [Read More]

data_science programming reproducability

Creating One Unified Calendar of all Data Science Events in the Netherlands

Over engineering with renv and github actions

Posted on December 2, 2022 | 3 minutes | 439 words | Roel M. Hogervorst

I enjoy learning new things about machine learning, and I enjoy meeting like-minded people too. That is why I go to meetups and conferences. But not everyone I meet becomes a member of every group. So I keep sending my coworkers new events that I hear about here in the Netherlands. And it is easy to overlook a new event that comes in over email. I, individually, cannot scale. So in this post I will walk you through an over-engineered solution to make myself unnecessary. [Read More]

calendar devops CI/CD git renv scheduling ghactions

Gosset part 2: small sample statistics

Scientific brewing at scale

Posted on October 11, 2019 (Last modified on November 9, 2022) | 14 minutes | 2804 words | Roel M. Hogervorst

Simulation was the key to achieving world beer dominance. ‘Scientific’ brewing at scale in the early 1900s. This post is an explainer about the small sample experiments performed by William S. Gosset. This post contains some R code that simulates his simulations and the resulting determination of the ideal sample size for inference. If you brew your own beer, or if you want to know how many samples you need to say something useful about your data, this post is for you. [Read More]

gosset t-distribution simulation tidyverse tibble dplyr

William Sealy Gosset one of the first data scientists

The father of the t-distribution

Posted on August 17, 2019 (Last modified on November 9, 2022) | 3 minutes | 600 words | Roel M. Hogervorst

I think William Sealy Gosset, better known as ‘Student’, is the first data scientist. He used math to solve real-world business problems; he worked on experimental design, small sample statistics, quality control, and beer. In fact, I think we should start a fan club! And as the first member of that fan club, I have been to the Guinness brewery to take a picture of Gosset’s only visible legacy there. W. S. [Read More]

gosset data_science