官术网_书友最值得收藏!

Reading delimited data and altered output format

Every day, we open many files in many different formats. However, when thinking about large amounts of data, it is always a good practice to use standard formats. One of these is called Comma Separated Values, or CSVs, and it uses a comma (,) to separate elements or delimit on each row. This is particularly useful when you have large amounts of data or records, and that data will be used in a scripted fashion. For example, in every school semester, Bob, the system administrator, needs to create a series of new users and set their information. Bob also gets a standardized CSV (like in the following snippet) from the people in charge of attendance:

Rbrash,Ron,Brash,01/31/88,+11234567890,rbrash@acme.com,FakePassword9000
...

If Bob the administrator wishes to only read this information into an array and create users, it is relatively trivial for him to parse a CSV and create each record in one single scripted action. This allows Bob to focus his time and effort on other important issues such as troubleshooting end-user WiFi issues.

While this is a trivial example, these same files may be in different forms with delimiters (the , or $ sign, for example), different data, and different structures. However, each file works on the premise that each line is a record that needs to be read into some structure (whatever it may be) in SQL, Bash arrays, and so on:

Line1Itself: Header (optional and might not be present)
Line2ItselfIsOneREc:RecordDataWithDelimiters:endline (windows \r\n, in Linux \n)
....

In the preceding example of a pseudo CSV, there is a header, which may be optional (not present), and then several lines (each being a record). Now, for Bob to parse the CSV, he has many ways to do this, but he may use specialized functions that apply a strategy such as:

$ Loop through each item until done
for each line in CSV:
# Do something with the data such as create a user
# Loop through Next item if it exists

To read in the data, Bob or yourself may resort to using:

  • For loops and arrays
  • A form of iterator
  • Manually walking through each line (not efficient)

Once any input data has been read in, the next step is to do something with the data itself. Is it to be transformed? Is it to be used immediately? Sanitized? Stored? Or converted to another format? Just like Bob, there are many things that can be performed using the data read in by the script.

In regards to outputting the data, we can also convert it to XML, JSON, or even insert it into a database as SQL. Unfortunately, this process requires being able to know at least two things: the format of the input data and the format of the output data.

Knowing common data formats and how they often have validation applied can be a great asset when building automated scripts and identifying any changes in the future. Enforcement of data validation also has several benefits and can help save the day when all of a sudden the script breaks without warning!

This recipe aims at walking you through reading a trivial CSV and outputting the data into some arbitrary formats.

主站蜘蛛池模板: 镇赉县| 忻州市| 离岛区| 福州市| 法库县| 共和县| 隆安县| 高邮市| 安西县| 安陆市| 林周县| 霍林郭勒市| 广南县| 自贡市| 峡江县| 宜君县| 岱山县| 洛隆县| 台湾省| 平泉县| 海原县| 萝北县| 潜山县| 宣城市| 慈溪市| 绥阳县| 荆州市| 漳州市| 前郭尔| 宣武区| 墨竹工卡县| 朝阳市| 明水县| 乌拉特后旗| 河北省| 普兰县| 鹿邑县| 青龙| 吉隆县| 五河县| 芦山县|