- Go Machine Learning Projects
- Xuanyi Chew
- 359 words
- 2021-06-10 18:46:36
Standardization
As a last bit of transformation, we need to standardize our input data. This allows us to compare models to see if one model is better than another. To do so, I wrote two different scaling algorithms:
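// scale centers column j on m and divides by the spread h - l,
// modifying a in place.
func scale(a [][]float64, j int) {
	// l and h are the 0.25 and 0.75 quantiles of column j; m is the center.
	l, m, h := iqr(a, 0.25, 0.75, j)
	s := h - l
	if s == 0 {
		s = 1 // the column has no spread: avoid dividing by zero
	}
	for _, row := range a {
		row[j] = (row[j] - m) / s
	}
}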
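// scaleStd centers column j on its mean and divides by the sample standard
// deviation, modifying a in place. It needs the standard "math" package.
func scaleStd(a [][]float64, j int) {
	var mean, variance, n float64
	for _, row := range a {
		mean += row[j]
		n++
	}
	mean /= n
	for _, row := range a {
		variance += (row[j] - mean) * (row[j] - mean)
	}
	variance /= (n - 1) // sample variance; needs at least two rows
	std := math.Sqrt(variance)
	if std == 0 {
		std = 1 // constant column: avoid dividing by zero
	}
	for _, row := range a {
		row[j] = (row[j] - mean) / std
	}
}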
If you come from the Python world of data science, the first scale function is essentially what scikit-learn's RobustScaler does. The second function is essentially StandardScaler, except that it uses the sample variance (dividing by n-1) rather than the population variance.
Each function takes the values in a given column (j) and rescales them in place, so that columns with very different magnitudes end up on a comparable scale. Also, note that the input to both scaling functions is [][]float64. This is where the tensor package comes in handy. A *tensor.Dense can be converted to [][]float64 without any extra allocations. A useful side effect is that you can mutate a and the tensor values will change as well. Essentially, the [][]float64 acts as an iterator over the underlying tensor data.
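The sharing behaviour can be illustrated with plain Go slices. The sketch below is not the tensor package's implementation; rowViews is a hypothetical helper, and unlike the conversion described above it does allocate the outer slice of row headers. What it shows is only that the rows alias one flat backing array, so a write through a row is visible in the backing data.

```go
package main

import "fmt"

// rowViews slices a flat, row-major backing array into rows of length cols.
// Only the outer slice of slice headers is allocated; no element is copied.
func rowViews(backing []float64, rows, cols int) [][]float64 {
	views := make([][]float64, rows)
	for i := range views {
		views[i] = backing[i*cols : (i+1)*cols]
	}
	return views
}

func main() {
	backing := []float64{1, 2, 3, 4, 5, 6} // a 3x2 matrix, row-major
	m := rowViews(backing, 3, 2)
	m[1][0] = 30         // write through the view
	fmt.Println(backing) // [1 2 30 4 5 6]
}
```

This is why the scaling functions can work on [][]float64 and still leave the scaled values in the original data structure.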
Our transform function now looks like this:
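// transform log-transforms the skewed numerical columns and then scales every
// numerical column. hints[i] is true when column i is categorical. It returns
// the indices of the columns that were log-transformed.
func transform(it [][]float64, hdr []string, hints []bool) []int {
	var transformed []int
	for i, isCat := range hints {
		if isCat {
			continue
		}
		skewness := skew(it, i)
		if skewness > 0.75 {
			transformed = append(transformed, i)
			log1pCol(it, i)
		}
	}
	for i, h := range hints {
		if !h {
			scale(it, i)
		}
	}
	return transformed
}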
Note that we only want to scale the numerical variables. The categorical variables can be scaled too, but doing so makes little practical difference.