“Until recently, ML.NET had no good REPL (read–eval–print loop) tool for experimenting interactively with code and data. Now .NET developers can run on-premises interactive machine learning scenarios in Jupyter Notebooks, using C#, F#, or PowerShell scripts in a web browser.
Just as code accumulates technical debt, data accumulates data preparation debt. To reduce it, we need to answer a few questions:
– Is my data large enough?
– Is my data good enough?
– Is my data clean enough?
– Is my data biased?
ML.NET components such as data loaders, estimators, trainers, transformers, and predictors all work with the DataView object (IDataView). A DataView is lazily evaluated (like IEnumerable), so no data is actually read until a method such as Fit consumes it. To visualize and prepare the data, we need another object, DataFrame, an in-memory collection of columns and rows that provides features similar to pandas.
This talk gives an overview of tools such as categorical and numerical distributions, box plot segmentation, correlation matrices, evaluation metrics, and confusion matrices, which help you find out whether you have enough data and whether it is good and unbiased.”
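The contrast the abstract draws between a lazy view and an in-memory frame can be sketched in a language-neutral way. The Python snippet below is only an illustration and does not use any ML.NET or DataFrame API: a generator stands in for the lazy view, a list stands in for the in-memory frame, and `RAW_ROWS`, `lazy_view`, and `stats` are hypothetical names made up for this example. It also shows two of the simplest "clean enough" and "biased" checks, a missing-value count and the majority-class share.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[Optional[float], str]

# Hypothetical toy rows: (feature value, or None when missing; label).
RAW_ROWS = [
    (5.1, "yes"),
    (None, "no"),
    (4.7, "yes"),
    (6.0, "yes"),
]

# Counts how many rows have actually been pulled from the source.
stats = {"reads": 0}


def lazy_view(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield rows one at a time; nothing is read until a consumer iterates."""
    for row in rows:
        stats["reads"] += 1
        yield row


view = lazy_view(RAW_ROWS)      # building the view reads nothing
reads_before = stats["reads"]   # still 0 at this point

frame = list(view)              # materializing pulls every row into memory
reads_after = stats["reads"]    # now equals len(RAW_ROWS)

# With the rows in memory, simple data-quality checks are short expressions.
missing = sum(1 for value, _ in frame if value is None)
label_counts = Counter(label for _, label in frame)
majority_share = max(label_counts.values()) / len(frame)

print(reads_before, reads_after, missing, dict(label_counts), majority_share)
```

The lazy view is cheap to build and to chain, but it can only be consumed by iterating over it. The materialized frame costs memory but allows the repeated, random-access inspection that visualization and data preparation rely on.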