
Introduction

Every data scientist has to deal with data stored on disk in various formats, such as ASCII text, PDF, XML, and JSON. Data can also be stored in database tables. Before doing any analysis, a data scientist's first and foremost task is to retrieve the data from these sources and formats, and then apply data-cleaning techniques to remove the noise present in it. In this chapter, we will look at recipes that accomplish this important task.
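To make "removing noise" concrete, here is a minimal sketch that uses only the JDK (no external JARs). It reads a plain-text (ASCII) file and applies the simplest cleaning steps. It assumes that "noise" means stray whitespace and blank lines. Real sources such as PDF, XML, JSON, or database tables need format-specific parsing first. The class name `TextCleaner` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: a minimal sketch of basic cleaning for plain-text data.
 * Assumption: "noise" is limited to extra whitespace and blank lines.
 */
public class TextCleaner {

    /** Trims each line, collapses internal runs of whitespace, and drops blank lines. */
    public static List<String> clean(List<String> rawLines) {
        return rawLines.stream()
                .map(String::trim)                          // strip leading/trailing whitespace
                .map(line -> line.replaceAll("\\s+", " "))  // collapse runs of spaces/tabs into one space
                .filter(line -> !line.isEmpty())            // drop lines that are now empty
                .collect(Collectors.toList());
    }

    /** Reads a text file (ASCII is a subset of UTF-8) and returns its cleaned lines. */
    public static List<String> readAndClean(Path file) throws IOException {
        return clean(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java TextCleaner <file.txt>");
            return;
        }
        readAndClean(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

For example, `TextCleaner.clean(List.of("  a \t b ", "", "c"))` yields the two lines `a b` and `c`. Keeping `clean` separate from the file I/O makes the cleaning rules easy to test and extend.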

We will be using external Java libraries (Java archive files, or simply JAR files) not only in this chapter but throughout the book. These libraries are created by developers or organizations to make everybody's life easier. Throughout the book, we will use the Eclipse IDE for code development and execution, preferably on the Windows platform. Here is how you can include an external JAR file; in the many recipes where I instruct you to add external JAR files to your project, this is what you need to do.

You can add a JAR file to a project in Eclipse by right-clicking the project and selecting Build Path | Configure Build Path. Under the Libraries tab, click Add External JARs... and select the external JAR file(s) that you are going to use for that project.
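If a recipe later fails with `ClassNotFoundException` or `NoClassDefFoundError`, the JAR is probably not on the build path. The following minimal sketch checks at run time whether a given class can be loaded. The class `ClasspathCheck` is hypothetical. The default class name `org.apache.commons.io.FileUtils` (from Apache Commons IO) is only an example; substitute a class from whichever JAR you added.

```java
/**
 * Hypothetical helper: a minimal sketch that reports whether a class from an
 * external JAR is visible on the classpath after it was added in Eclipse.
 */
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Example class name only; pass your own fully qualified names as arguments
        String[] names = args.length > 0 ? args : new String[] {"org.apache.commons.io.FileUtils"};
        for (String name : names) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
        }
    }
}
```

Running this from Eclipse after adding the JAR should print `found` for a class that the JAR contains. Passing `false` for initialization keeps the check side-effect free.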

