- Go Machine Learning Projects
- Xuanyi Chew
- 339 words
- 2021-06-10 18:46:39
Ingesting the data
Now, without further ado, let's write some code to ingest the data. First, we need a data structure for a training example:
// Example is a tuple representing a classification example
type Example struct {
	Document []string
	Class
}
This structure lets us parse our files into a list of Example values. The function is shown here:
// Requires fmt, path/filepath, strings, and github.com/pkg/errors (for Errorf and WithMessage).
func ingest(typ string) (examples []Example, err error) {
	// Only the four corpus variants are accepted.
	switch typ {
	case "bare", "lemm", "lemm_stop", "stop":
	default:
		return nil, errors.Errorf("Expected only \"bare\", \"lemm\", \"lemm_stop\" or \"stop\"")
	}

	var errs errList
	start, end := 0, 11
	for i := start; i < end; i++ { // visits part0 through part10; nothing is held out here
		matches, err := filepath.Glob(fmt.Sprintf("data/lingspam_public/%s/part%d/*.txt", typ, i))
		if err != nil {
			errs = append(errs, err)
			continue
		}
		for _, match := range matches {
			str, err := ingestOneFile(match)
			if err != nil {
				// record the failure with the filename and keep going
				errs = append(errs, errors.WithMessage(err, match))
				continue
			}
			if strings.Contains(match, "spmsg") {
				// is spam
				examples = append(examples, Example{str, Spam})
			} else {
				// is ham
				examples = append(examples, Example{str, Ham})
			}
		}
	}
	if errs != nil {
		err = errs
	}
	return
}
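The ingest function relies on an errList type that is not defined in this section. The following is a minimal sketch of what such a type could look like, assuming it is simply a slice of errors that implements the error interface. The Error method body and the collect helper are illustrative assumptions, and the sketch uses the standard library errors package rather than github.com/pkg/errors to stay self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errList is an assumed definition: a slice of errors that satisfies
// the error interface by joining the individual messages.
type errList []error

func (e errList) Error() string {
	msgs := make([]string, 0, len(e))
	for _, err := range e {
		msgs = append(msgs, err.Error())
	}
	return strings.Join(msgs, "; ")
}

// collect is a hypothetical helper that mimics the error handling in
// ingest: failures are accumulated and reported once, at the end.
func collect(failures ...error) (err error) {
	var errs errList
	for _, f := range failures {
		errs = append(errs, f)
	}
	// Only assign when something failed. Assigning a nil errList to err
	// would produce a non-nil error value wrapping a nil slice.
	if errs != nil {
		err = errs
	}
	return
}

func main() {
	fmt.Println(collect() == nil) // prints "true"
	fmt.Println(collect(errors.New("a.txt: unreadable"), errors.New("b.txt: unreadable")))
	// prints "a.txt: unreadable; b.txt: unreadable"
}
```

The errs != nil guard matters: it keeps the returned error nil when every file was read successfully, so callers can use the usual err != nil check.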
In ingest, I used filepath.Glob to find the files that match the pattern within a specific directory, which is hardcoded. It doesn't have to be hardcoded in your actual code, but hardcoding the path makes for simpler demo programs. For each matching filename, we parse the file using the ingestOneFile function. Then we check whether the path contains spmsg. The check uses strings.Contains, so it matches spmsg anywhere in the path, not only at the start of the filename. If it matches, we create an Example that has Spam as its class. Otherwise, the example is marked as Ham.
Later sections of this chapter walk through the Class type and the rationale for choosing it. For now, here's the ingestOneFile function. Take note of its simplicity:
func ingestOneFile(abspath string) ([]string, error) {
	bs, err := ioutil.ReadFile(abspath)
	if err != nil {
		return nil, err
	}
	return strings.Split(string(bs), " "), nil
}
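Because ingestOneFile splits on a single space only, consecutive spaces produce empty tokens, and words separated by a newline stay glued together. The sketch below illustrates this on a made-up input string and compares it with strings.Fields, which splits on any run of whitespace. The helper names splitOnSpace and splitOnWhitespace and the sample text are assumptions for illustration, not part of the chapter's code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnSpace mirrors the tokenization used in ingestOneFile.
func splitOnSpace(s string) []string {
	return strings.Split(s, " ")
}

// splitOnWhitespace is a hypothetical alternative that treats any run
// of spaces, tabs, or newlines as a single separator.
func splitOnWhitespace(s string) []string {
	return strings.Fields(s)
}

func main() {
	// Made-up sample with a double space and a newline.
	text := "Subject: free  money\nclick here"

	fmt.Printf("%q\n", splitOnSpace(text))
	// prints ["Subject:" "free" "" "money\nclick" "here"]

	fmt.Printf("%q\n", splitOnWhitespace(text))
	// prints ["Subject:" "free" "money" "click" "here"]
}
```

Whether the simpler split is good enough depends on how the corpus files are laid out, so treat the alternative as something to try rather than a required change.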