- Go Machine Learning Projects
- Xuanyi Chew
- 219 words
- 2021-06-10 18:46:34
Janitorial work
A large part of data science work is cleanup. In a production system, this data would typically be fetched directly from a database and would already be relatively clean (high-quality production data science work requires a database of clean data). However, we're not in production mode yet; we're still in the model-building phase. It helps to imagine that we are writing a program solely for cleaning data.
Let's look at our requirements, starting with our data. Each column is a variable. Most of them are independent variables, except for the last column, which is the dependent variable. Some variables are categorical, and some are continuous. Our task is to write a function that converts the data from its current form, [][]string, into [][]float64.
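Before converting anything, the columns have to be sorted into the dependent variable and the independent ones, and the independent ones into categorical and continuous. The following is a minimal sketch, assuming the file has already been read into a [][]string with the header row removed. splitXY and looksContinuous are hypothetical helpers, not functions from this book, and treating "every non-missing value parses as a number" as "continuous" is only a first-pass heuristic.

```go
package main

import "strconv"

// splitXY separates the independent variables (every column but the
// last) from the dependent variable (the last column).
// Assumption: records holds data rows only and every row is non-empty.
func splitXY(records [][]string) (X [][]string, y []string) {
	for _, row := range records {
		last := len(row) - 1
		X = append(X, row[:last]) // shares the backing array with records
		y = append(y, row[last])
	}
	return X, y
}

// looksContinuous is a hypothetical heuristic: it reports whether every
// non-missing value in column col parses as a float64. Categories that
// happen to be coded as numbers also pass, so check the result by hand.
func looksContinuous(X [][]string, col int, missing map[string]bool) bool {
	for _, row := range X {
		s := row[col]
		if missing[s] {
			continue // missing markers say nothing about the column type
		}
		if _, err := strconv.ParseFloat(s, 64); err != nil {
			return false
		}
	}
	return true
}
```

Numeric codes that stand for categories are exactly the case a heuristic like this gets wrong, which is one reason to keep the list of categorical columns explicit rather than inferred.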
To do that, every value has to become a float64. For the continuous variables this is easy: simply parse the string into a float. There are oddities that need to be handled, which I hope you spotted when you opened the file in a spreadsheet. The real pain, though, is converting categorical data to float64.
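For a continuous column, the parsing step is a call to strconv.ParseFloat per value; the oddities are where the decisions lie. The sketch below assumes the oddities are missing-value markers such as "NA" or an empty string, and fills them with the mean of the values that did parse. Mean imputation is a swapped-in choice for illustration, not necessarily what this book does, and parseContinuous is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseContinuous converts one continuous column into float64s.
// Assumption: values listed in missing (e.g. "NA" or "") mark absent
// data; they are replaced with the mean of the values that did parse.
func parseContinuous(col []string, missing map[string]bool) ([]float64, error) {
	out := make([]float64, len(col))
	var gaps []int // indices of missing values, filled in afterwards
	var sum float64
	var n int
	for i, raw := range col {
		s := strings.TrimSpace(raw)
		if missing[s] {
			gaps = append(gaps, i)
			continue
		}
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %q is not a number: %w", i, raw, err)
		}
		out[i] = f
		sum += f
		n++
	}
	if n == 0 {
		return nil, errors.New("column has no parseable values")
	}
	mean := sum / float64(n)
	for _, i := range gaps {
		out[i] = mean
	}
	return out, nil
}
```

Returning an error for anything that is neither a number nor a known missing marker makes unexpected oddities surface immediately instead of silently becoming zeros.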
Fortunately for us, people much smarter than me figured this out decades ago. There is an encoding scheme that lets categorical data play nicely with linear regression algorithms.
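The scheme isn't named in this passage. A common choice for linear regression is one-hot (dummy) encoding, and the sketch below assumes that is the kind of scheme meant: each category becomes its own 0/1 column. oneHot and its dropFirst parameter are hypothetical names for illustration.

```go
package main

import "sort"

// oneHot expands one categorical column into indicator columns: row i
// gets 1 in the column of its category and 0 elsewhere. Levels are
// sorted so the column order is deterministic. If dropFirst is true,
// the first level becomes the baseline and is encoded as all zeros.
func oneHot(col []string, dropFirst bool) (levels []string, encoded [][]float64) {
	seen := make(map[string]bool)
	for _, v := range col {
		if !seen[v] {
			seen[v] = true
			levels = append(levels, v)
		}
	}
	sort.Strings(levels)

	index := make(map[string]int, len(levels))
	for i, l := range levels {
		index[l] = i
	}

	offset := 0
	if dropFirst {
		offset = 1
	}
	width := len(levels) - offset

	encoded = make([][]float64, len(col))
	for i, v := range col {
		row := make([]float64, width)
		if j := index[v] - offset; j >= 0 {
			row[j] = 1 // the baseline level (j < 0) stays all zeros
		}
		encoded[i] = row
	}
	return levels, encoded
}
```

The dropFirst option matters for linear regression with an intercept: keeping all k indicator columns makes them sum to 1 in every row, which is perfectly collinear with the intercept (the "dummy variable trap"). Dropping one level removes that redundancy.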