
Munging and wrangling

The terms munging and wrangling are jargon for the work of changing the format of data, a recordset, or a file in order to prepare it for further processing or evaluation.

If you come from data development, you are most likely familiar with the idea of Extract, Transform, and Load (ETL). In much the same way, a data developer may mung or wrangle data during the transformation steps of an ETL process.

Common munging and wrangling tasks include removing punctuation or HTML tags, parsing, filtering, transforming, and mapping data, as well as tying together systems and interfaces that were not specifically designed to interoperate. Munging can also describe the processing or filtering of raw data into another form that is more convenient to consume elsewhere.
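As an illustration of the first two tasks mentioned above, the following is a minimal Python sketch that strips HTML tags and punctuation from a string. The function names (strip_html_tags, strip_punctuation, clean_text) are hypothetical and chosen only for this example, and the regular expression is a deliberately naive approach that suits simple snippets rather than arbitrary HTML.

```python
import re
import string
from html import unescape


def strip_html_tags(text):
    # Naive tag removal: drops anything that looks like <...>.
    # Assumption: input is a simple snippet, not a full HTML document.
    return re.sub(r"<[^>]+>", "", text)


def strip_punctuation(text):
    # Remove every ASCII punctuation character listed in string.punctuation.
    return text.translate(str.maketrans("", "", string.punctuation))


def clean_text(text):
    # Remove tags, decode entities such as &amp;, drop punctuation,
    # then collapse whitespace and lowercase the result.
    no_tags = unescape(strip_html_tags(text))
    no_punct = strip_punctuation(no_tags)
    return " ".join(no_punct.split()).lower()


if __name__ == "__main__":
    print(clean_text("<p>Hello, <b>World</b>!</p>"))
```

For real-world HTML, a proper parser would usually be a safer choice than a regular expression; the sketch only shows the shape of the task.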

Munging and wrangling might be performed multiple times within a data science process, and at different steps as that process evolves. Sometimes, data scientists use the term munging more broadly, to include data visualization, data aggregation, training a statistical model, and much other work. In general, munging and wrangling follow a flow that begins with extracting the data in raw form, continues with performing the munging using various logic, and ends with placing the resulting content into a structure for use.
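The flow just described (extract the raw data, munge it, place the result into a structure for use) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample data in RAW, the STATE_NAMES lookup, and the function names extract, munge, and load are all hypothetical.

```python
import csv
import io

# Hypothetical raw input with stray whitespace, mixed case, and a missing value.
RAW = """name, state ,sales
 Alice ,ny,100
Bob,CA,
 carol,ca,250
"""

# Hypothetical mapping from state codes to full names.
STATE_NAMES = {"NY": "New York", "CA": "California"}


def extract(raw_text):
    # Extract: read the raw text into a list of dicts keyed by header name.
    reader = csv.reader(io.StringIO(raw_text))
    header = [h.strip() for h in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


def munge(rows):
    # Munge: filter out rows with no sales figure, trim and normalize
    # the text fields, map state codes to names, and parse sales as int.
    cleaned = []
    for row in rows:
        sales = row["sales"].strip()
        if not sales:
            continue
        code = row["state"].strip().upper()
        cleaned.append({
            "name": row["name"].strip().title(),
            "state": STATE_NAMES.get(code, code),
            "sales": int(sales),
        })
    return cleaned


def load(rows):
    # Load: place the result into a structure for use,
    # here a dict of total sales per state.
    totals = {}
    for row in rows:
        totals[row["state"]] = totals.get(row["state"], 0) + row["sales"]
    return totals


if __name__ == "__main__":
    print(load(munge(extract(RAW))))
```

Keeping the three steps as separate functions makes it easy to rerun or replace the munging step on its own, which matches the point that munging may happen several times within one process.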

Although there are many valid options for munging, wrangling, preprocessing, and manipulating data, one tool that is popular with many data scientists today is a product named Trifacta, which claims to be the number one data wrangling solution in many industries.

Trifacta can be downloaded for your personal evaluation from https://www.trifacta.com/. Check it out!