Munging and wrangling

The terms munging and wrangling are jargon for the effort of changing the format of data, a recordset, or a file in order to prepare it for further processing or evaluation.

As a data developer, you are most likely familiar with the idea of Extract, Transform, and Load (ETL). In much the same way, a data developer may mung or wrangle data during the transformation steps of an ETL process.

Common munging and wrangling may include removing punctuation or HTML tags, data parsing, filtering, all sorts of transforming, mapping, and tying together systems and interfaces that were not specifically designed to interoperate. Munging can also describe the processing or filtering of raw data into another form, allowing for more convenient consumption of the data elsewhere.
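To make a few of these operations concrete, here is a minimal Python sketch of removing HTML tags and punctuation and then filtering out empty results. The sample records and the names `clean_text` and `munge` are hypothetical, invented for illustration, and the regular expression is a deliberately crude stand-in for a real HTML parser.

```python
import html
import re
import string

# Hypothetical raw records, invented for illustration only.
RAW_RECORDS = [
    "<p>Hello, World!</p>",
    "<div>Data &amp; munging: step 1...</div>",
    "   ",
]

# Crude tag matcher; assumes simple, well-formed markup.
TAG_PATTERN = re.compile(r"<[^>]+>")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(raw: str) -> str:
    """Strip tags, decode entities, drop punctuation, normalize spacing and case."""
    no_tags = TAG_PATTERN.sub(" ", raw)
    decoded = html.unescape(no_tags)
    no_punct = decoded.translate(PUNCT_TABLE)
    return " ".join(no_punct.split()).lower()


def munge(records):
    """Clean every record and filter out those that end up empty."""
    cleaned = (clean_text(record) for record in records)
    return [text for text in cleaned if text]


if __name__ == "__main__":
    for line in munge(RAW_RECORDS):
        print(line)
```

For markup that is nested or malformed, a dedicated parser is likely to be more robust than a regular expression; the sketch only shows the shape of the step.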

Munging and wrangling might be performed multiple times within a data science process, and at different steps as the process evolves. Some data scientists use the term munging more broadly to cover data visualization, data aggregation, training a statistical model, and much other work. In general, munging and wrangling follow a flow that begins with extracting the data in raw form, continues with munging it using various logic, and ends with placing the resulting content into a structure for use.
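The extract, munge, and place-into-a-structure flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV sample, the column names, and the functions `extract`, `munge`, and `load` are all hypothetical, and the final structure (totals per region) is just one possible target.

```python
import csv
import io

# Hypothetical raw extract, invented for illustration only.
RAW_CSV = """name, amount ,region
 alice ,10.5,East
BOB,,west
carol,7,EAST
"""


def extract(text):
    """Extract: read the raw rows as dictionaries keyed by tidied header names."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [column.strip() for column in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


def munge(rows):
    """Munge: trim whitespace, normalize case, convert types, drop incomplete rows."""
    result = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # filter out rows with a missing amount
        result.append({
            "name": row["name"].strip().title(),
            "amount": float(amount),
            "region": row["region"].strip().lower(),
        })
    return result


def load(rows):
    """Load: place the cleaned content into a structure for use (totals per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(load(munge(extract(RAW_CSV))))
```

Keeping the three stages as separate functions makes it easy to rerun only the munging step when the cleaning logic changes, which fits the point that munging may happen several times in one process.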

Although there are many valid options for munging, wrangling, preprocessing, and manipulating data, a tool that is popular with many data scientists today is a product named Trifacta, which claims to be the number one data wrangling solution in many industries.

Trifacta can be downloaded for your personal evaluation from https://www.trifacta.com/. Check it out!