Introduction

Data is everywhere, logging is cheap, and analysis is inevitable. This chapter is built around one fundamental idea: gathering useful data. After building a large collection of usable text, which we call the corpus, we must learn to represent this content in code. We focus first on obtaining data and then on enumerating ways of representing it.

Gathering data is arguably as important as analyzing it when the goal is to extrapolate results and form valid, generalizable claims. It is a scientific pursuit; therefore, great care must and will be taken to ensure unbiased and representative sampling. We recommend following along closely in this chapter because the remainder of the book depends on having a source of data to work with. Without data, there isn't much to analyze, so we should carefully observe the techniques laid out here to build our own formidable corpus.

The first recipe enumerates various sources to start gathering data online. The next few recipes deal with using local data of different file formats. We then learn how to download data from the Internet using our Haskell code. Finally, we finish this chapter with a couple of recipes on using databases in Haskell.