- Go Machine Learning Projects
- Xuanyi Chew
- 2021-06-10 18:46:39
Ingesting the data
Now, without further ado, let's write some code to ingest the data. First, we need a data structure to represent a training example:
// Example is a tuple representing a classification example
type Example struct {
	Document []string // the tokenized text of one message
	Class             // the label (Spam or Ham); the Class type is covered later
}
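Because Class is only explained later in the chapter, the Example type above does not compile on its own. The following is a minimal sketch that makes it self-contained. It assumes Class is a small integer enumeration with two values, Ham and Spam. Those two names match the ones the ingestion code uses, but the underlying type (byte) and the String method are assumptions for illustration, not the author's definition.

```go
package main

import (
	"fmt"
	"strings"
)

// Class is a hypothetical stand-in for the chapter's Class type:
// a small integer label with one constant per category.
type Class byte

const (
	Ham  Class = iota // legitimate message
	Spam              // unsolicited message
)

// String is an assumed helper that makes labels readable when printed.
func (c Class) String() string {
	switch c {
	case Ham:
		return "ham"
	case Spam:
		return "spam"
	default:
		return fmt.Sprintf("Class(%d)", byte(c))
	}
}

// Example pairs a tokenized document with its label. Class is an
// embedded field, so it is accessed as ex.Class.
type Example struct {
	Document []string
	Class
}

func main() {
	ex := Example{Document: strings.Fields("Subject: free money"), Class: Spam}
	fmt.Println(ex.Class, len(ex.Document)) // spam 3
}
```

One side effect of embedding: because this hypothetical String method is promoted through the embedded field, fmt.Println(ex) would print only the label. If you want the document printed too, drop String or define one on Example itself.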
With this type, we can parse our files into a slice of Example values. The function is shown here:
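import (
	"fmt"
	"path/filepath"
	"strings"

	"github.com/pkg/errors" // provides Errorf and WithMessage
)

// ingest reads every message of one corpus variant ("bare", "lemm",
// "lemm_stop" or "stop") and labels it Spam or Ham by filename.
func ingest(typ string) (examples []Example, err error) {
	switch typ {
	case "bare", "lemm", "lemm_stop", "stop":
	default:
		return nil, errors.Errorf("expected only \"bare\", \"lemm\", \"lemm_stop\" or \"stop\", got %q", typ)
	}

	var errs errList // collects per-file errors so one bad file doesn't abort ingestion
	start, end := 0, 11
	// Covers part0..part10. Glob returns no error for a pattern that matches
	// nothing, so a missing partN is simply skipped. Narrow [start, end) to
	// hold some parts out for cross-validation.
	for i := start; i < end; i++ {
		matches, err := filepath.Glob(fmt.Sprintf("data/lingspam_public/%s/part%d/*.txt", typ, i))
		if err != nil {
			errs = append(errs, err)
			continue
		}
		for _, match := range matches {
			str, err := ingestOneFile(match)
			if err != nil {
				errs = append(errs, errors.WithMessage(err, match))
				continue
			}
			if strings.Contains(match, "spmsg") {
				// is spam
				examples = append(examples, Example{str, Spam})
			} else {
				// is ham
				examples = append(examples, Example{str, Ham})
			}
		}
	}
	if errs != nil {
		err = errs
	}
	return
}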
Here, I used filepath.Glob to find the files matching a pattern within a specific, hardcoded directory. The path doesn't have to be hardcoded in your actual code, but hardcoding it keeps demo programs simple. Each matching file is parsed with the ingestOneFile function. Then we check whether the filename contains spmsg, the prefix that marks spam messages in this dataset. If it does, we create an Example whose class is Spam; otherwise, it is marked as Ham. Later in this chapter, I will walk through the Class type and the rationale for choosing it. For now, here's the ingestOneFile function. Take note of its simplicity:
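import (
	"os"
	"strings"
)

// ingestOneFile reads a file and splits its contents on single spaces.
// os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
func ingestOneFile(abspath string) ([]string, error) {
	bs, err := os.ReadFile(abspath)
	if err != nil {
		return nil, err
	}
	return strings.Split(string(bs), " "), nil
}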
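That simplicity has a cost worth knowing about: strings.Split with a single space keeps newlines inside tokens, so the last word of one line and the first word of the next come back glued together, and repeated spaces produce empty strings. If that matters for your classifier, strings.Fields, which splits on any run of Unicode whitespace, is a drop-in alternative. The sketch below compares the two on a made-up message; the sample text and the two helper names are illustrative only, not part of the author's code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnSpace mirrors ingestOneFile's tokenization: split on " " only.
func splitOnSpace(s string) []string {
	return strings.Split(s, " ")
}

// splitOnWhitespace splits on any run of whitespace, including newlines,
// and never yields empty tokens.
func splitOnWhitespace(s string) []string {
	return strings.Fields(s)
}

func main() {
	// Hypothetical message: a subject line, a blank line, then the body.
	msg := "Subject: cheap  offer\n\nclick here\n"
	fmt.Printf("%q\n", splitOnSpace(msg))      // ["Subject:" "cheap" "" "offer\n\nclick" "here\n"]
	fmt.Printf("%q\n", splitOnWhitespace(msg)) // ["Subject:" "cheap" "offer" "click" "here"]
}
```

Switching to strings.Fields changes the vocabulary the classifier sees, so pick one tokenizer and use it consistently for both training and prediction.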