Best practices for gathering and organizing data with Go

As you saw in the preceding section, Go itself gives us the opportunity to maintain a high level of integrity in our data gathering, parsing, and organization. We want to make sure we leverage Go's unique properties whenever we prepare data for machine learning workflows.

Generally, Go data scientists and analysts should follow these best practices when gathering and organizing data. They are meant to help you maintain integrity in your applications and enable you to reproduce any analysis:

  1. Check for and enforce expected types: This might seem obvious, but it is too often overlooked when using dynamically typed languages. Although it is slightly verbose, explicitly parsing data into expected types and handling related errors can save you big headaches down the road.
  2. Standardize and simplify your data ingress/egress: There are many third-party packages for handling certain types of data or interacting with certain data sources (some of which we will cover in this book). However, if you standardize the way you interact with data sources, particularly around the standard library (stdlib), you can develop predictable patterns and maintain consistency within your team. A good example is choosing database/sql for database interactions rather than various third-party APIs and DSLs.
  3. Version your data: Machine learning models can produce very different results depending on the training data you use, your choice of parameters, and the input data. Thus, it is impossible to reproduce results without versioning both your code and your data. We will discuss appropriate techniques for data versioning later in this chapter.
If you start to stray from these general principles, stop immediately: you are likely sacrificing integrity for the sake of convenience, which is a dangerous road. We will let these principles guide us throughout the book, starting with the various data formats and sources we consider in the following section.