Reading in your training data

Data Ingestion Patterns for ML

How do you get your training data into your model? Most tutorials and Kaggle notebooks begin by reading CSV data, but in this post I hope to convince you to do better. I think you should spend as much time as possible on the feature engineering and modelling parts of your ML project, and as little time as possible on getting the data from somewhere to your machine. [Read More]

How Does Catboost Deal with Factors in Loading?

What are you doing, catboost?

Some people at curso-r are working on an amazing extension of parsnip that allows you to use tidymodels packages like {parsnip} and {recipes} with the modern beasts of machine learning: lightgbm and catboost. The package is called treesnip and is still in development. Both lightgbm and catboost can work with categorical features, but how do you pass those to the machinery? Both use special data structures. I was reading through the catboost documentation and it just wasn’t very clear to me. [Read More]