- Machine Learning With Go
- Daniel Whitenack
- 2021-07-08 10:37:27
Manipulating CSV data with data frames
As you can see, manually parsing many different fields and performing row-by-row operations can be rather verbose and tedious. That is not an excuse to increase complexity and import a pile of non-standard functionality; you should still default to encoding/csv in most cases.
However, data frames have proven to be a successful and somewhat standardized way (in the data science community) of dealing with tabular data. Thus, in some cases, it is worth employing third-party functionality to manipulate tabular data, such as CSV data. For example, data frames and their associated operations can be very useful when you are trying to filter, subset, and select portions of tabular datasets. In this section, we will introduce github.com/kniren/gota/dataframe, a wonderful dataframe package for Go:
import "github.com/kniren/gota/dataframe"
To create a data frame from a CSV file, we open a file with os.Open() and then supply the returned pointer to the dataframe.ReadCSV() function:
// Open the CSV file.
irisFile, err := os.Open("iris.csv")
if err != nil {
	log.Fatal(err)
}
defer irisFile.Close()

// Create a dataframe from the CSV file.
// The types of the columns will be inferred.
irisDF := dataframe.ReadCSV(irisFile)
if irisDF.Err != nil {
	log.Fatal(irisDF.Err)
}

// As a sanity check, display the records to stdout.
// Gota will format the dataframe for pretty printing.
fmt.Println(irisDF)
If we compile and run this Go program, we will see a nice, pretty-printed version of our data with the types that were inferred during parsing:
$ go build
$ ./myprogram
[150x5] DataFrame
    sepal_length sepal_width petal_length petal_width species
 0: 5.100000     3.500000    1.400000     0.200000    Iris-setosa
 1: 4.900000     3.000000    1.400000     0.200000    Iris-setosa
 2: 4.700000     3.200000    1.300000     0.200000    Iris-setosa
 3: 4.600000     3.100000    1.500000     0.200000    Iris-setosa
 4: 5.000000     3.600000    1.400000     0.200000    Iris-setosa
 5: 5.400000     3.900000    1.700000     0.400000    Iris-setosa
 6: 4.600000     3.400000    1.400000     0.300000    Iris-setosa
 7: 5.000000     3.400000    1.500000     0.200000    Iris-setosa
 8: 4.400000     2.900000    1.400000     0.200000    Iris-setosa
 9: 4.900000     3.100000    1.500000     0.100000    Iris-setosa
    ...          ...         ...          ...         ...
    <float>      <float>     <float>      <float>     <string>
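The last line of that output shows the type inferred for each column. As a rough intuition for how such inference can work, the following sketch tries progressively more general parses on a column's raw strings. It is a simplified illustration under my own assumptions (the function name inferType, the treatment of empty cells, and the fallback rules are hypothetical) and should not be read as gota's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferType guesses a column type from its raw string values.
// Simplified illustration only; not gota's actual inference code.
func inferType(values []string) string {
	var hasInt, hasFloat, hasBool, hasString bool
	for _, v := range values {
		if v == "" {
			continue // assumption: empty cells are missing values
		}
		if _, err := strconv.Atoi(v); err == nil {
			hasInt = true
			continue
		}
		if _, err := strconv.ParseFloat(v, 64); err == nil {
			hasFloat = true
			continue
		}
		if v == "true" || v == "false" {
			hasBool = true
			continue
		}
		hasString = true
	}

	switch {
	case hasString:
		return "string"
	case hasBool:
		// Assumption: booleans mixed with numbers fall back to string.
		if hasInt || hasFloat {
			return "string"
		}
		return "bool"
	case hasFloat:
		return "float" // integers mixed with floats widen to float
	case hasInt:
		return "int"
	default:
		return "string" // nothing but missing values
	}
}

func main() {
	fmt.Println(inferType([]string{"5.1", "4.9", "4.7"}))          // prints float
	fmt.Println(inferType([]string{"Iris-setosa", "Iris-setosa"})) // prints string
}
```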
Once we have the data parsed into a dataframe, we can filter, subset, and select our data easily:
// Create a filter for the dataframe.
filter := dataframe.F{
	Colname:    "species",
	Comparator: "==",
	Comparando: "Iris-versicolor",
}

// Filter the dataframe to see only the rows where
// the iris species is "Iris-versicolor".
versicolorDF := irisDF.Filter(filter)
if versicolorDF.Err != nil {
	log.Fatal(versicolorDF.Err)
}

// Filter the dataframe again, but only select out the
// sepal_width and species columns.
versicolorDF = irisDF.Filter(filter).Select([]string{"sepal_width", "species"})

// Filter and select the dataframe again, but only keep
// the first three results.
versicolorDF = irisDF.Filter(filter).
	Select([]string{"sepal_width", "species"}).
	Subset([]int{0, 1, 2})
if versicolorDF.Err != nil {
	log.Fatal(versicolorDF.Err)
}
The operations above only scratch the surface of the github.com/kniren/gota/dataframe package. You can merge datasets, output to other formats, and even process JSON data. For more information about this package, visit the auto-generated GoDocs at https://godoc.org/github.com/kniren/gota/dataframe; consulting the GoDocs is good practice, in general, for any package we discuss in the book.