Data Processing Toolbox

In the previous chapter, we discussed best practices for approaching data science problems. We looked at CRISP-DM, a methodology for running data mining projects, in which data preprocessing is one of the first steps. In this chapter, we will take a closer look at how to do this preprocessing in Java.

Specifically, we will cover the following topics:

  • Standard Java library
  • Extensions to the standard library
  • Reading data from different sources such as text, HTML, JSON, and databases
  • DataFrames for manipulating tabular data

Finally, we will put everything together to prepare the data for the search engine.

By the end of this chapter, you will be able to process data so that it can be used for machine learning and further analysis.
