Python for Data Wrangling

There is an ongoing debate about whether to perform data wrangling with an enterprise tool or with a programming language and its associated frameworks. Many commercial, enterprise-level tools for data formatting and preprocessing require little coding on the part of the user. Examples include the following:

  • General purpose data analysis platforms such as Microsoft Excel (with add-ins)
  • Statistical discovery packages such as JMP (from SAS)
  • Modeling platforms such as RapidMiner
  • Analytics platforms from niche players focusing on data wrangling, such as Trifacta, Paxata, and Alteryx

However, programming languages such as Python provide more flexibility, control, and power compared to these off-the-shelf tools.

As the volume, velocity, and variety of data (the three Vs of big data) change rapidly, it is a good idea to develop and nurture in-house expertise in data wrangling with fundamental programming frameworks, so that an organization is not beholden to the whims of any enterprise platform for a task as basic as data wrangling:

Figure 1.2: Google Trends worldwide over the last five years

Some obvious advantages of using a free, open source programming language such as Python for data wrangling are the following:

  • A general-purpose, open source language that places no restrictions on the methods you can develop for the problem at hand
  • A rich ecosystem of fast, optimized, open source libraries focused on data analytics
  • Growing support for connecting Python to virtually every type of data source
  • Easy access to basic statistical testing and quick visualization libraries for checking data quality
  • Seamless handoff of the data wrangling output to advanced machine learning models
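As a small illustration of the kind of quick data-quality check mentioned in the list above, the following sketch profiles two columns of a tiny CSV sample, counting missing and non-numeric entries and summarizing the valid values. The sample data and the `profile_column` helper are hypothetical, and the sketch deliberately uses only Python's standard library (`csv`, `io`, `statistics`) so it runs without third-party installs; in practice, a library such as pandas would typically be used for this.

```python
import csv
import io
import statistics

# Hypothetical raw data: one missing age and one non-numeric income
RAW = """id,age,income
1,34,52000
2,,61000
3,29,abc
4,41,58000
"""

def profile_column(rows, name):
    """Count missing and invalid entries in a column and summarize the valid numbers."""
    values, missing, invalid = [], 0, 0
    for row in rows:
        cell = row[name].strip()
        if cell == "":
            missing += 1          # empty field counts as missing
            continue
        try:
            values.append(float(cell))
        except ValueError:
            invalid += 1          # present but not parseable as a number
    summary = {"missing": missing, "invalid": invalid, "count": len(values)}
    if values:
        summary["mean"] = statistics.mean(values)
    if len(values) > 1:
        summary["stdev"] = statistics.stdev(values)  # sample standard deviation
    return summary

# Parse the CSV text into a list of dicts keyed by the header row
rows = list(csv.DictReader(io.StringIO(RAW)))
for column in ("age", "income"):
    print(column, profile_column(rows, column))
```

Checks like these, run before any modeling, surface gaps and malformed values early, which is exactly where a programmable wrangling step pays off.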

Python is currently the most popular language for machine learning and artificial intelligence.

主站蜘蛛池模板: 方山县| 无棣县| 石城县| 呼伦贝尔市| 铁力市| 唐山市| 九龙县| 嘉定区| 大兴区| 施秉县| 磐安县| 马龙县| 调兵山市| 大新县| 枣强县| 乐平市| 额敏县| 长子县| 肥东县| 孝感市| 宜兰县| 涡阳县| 河西区| 新平| 肇源县| 宜黄县| 全椒县| 株洲县| 黑水县| 沅江市| 南部县| 红安县| 莲花县| 庄河市| 四子王旗| 扶余县| 墨竹工卡县| 隆回县| 英超| 澜沧| 濮阳市|