Daniel Costea -How to fight data preparation debt with Jupyter Notebook & DataFrame using MS ML.NET – YouTube

“Until recently ML.NET did not have a good REPL (read–eval–print–loop) instrument to experience interactively with code and data, but now, .NET developers can run on-premise interactive machine learning scenarios with Jupyter Notebooks using C#, F#, or Powershell scripts in a web browser.

Similarly to technical debt, when it comes to data we can talk about data preparation debt. In order to reduce data preparation debt we need to find some answers:
– Is my data large enough?
– Is my data good enough?
– Is my data clean enough?
– Is my data biased?

ML.NET components like data loaders, estimators, trainers, transformers, or predictors are all working with DataView object. DataView is a lazy-loading object (like IEnumerable), therefore we have to call Fit method to bind it to data. In order to visualize and prepare the data, we need another object like DataFrame which is an in-memory collection of columns and rows, providing features similar to Pandas.

This talk will give you an overview of a variety of instruments like categorical and numerical distribution, box plot segmentation, correlation matrix, evaluation metrics, and confusion matrix, and more, which will help you to know if you have enough, good and unbiased data.”


Author: Scottish Summit

Share This Post On