書名： Bash Cookbook
作者名： Ron Brash Ganesh Naik
本章字數： 544字
更新時間： 2021-07-23 19:17:40

Reading delimited data and altered output format

Every day, we open many files in many different formats. However, when thinking about large amounts of data, it is always a good practice to use standard formats. One of these is called Comma Separated Values, or CSVs, and it uses a comma (,) to separate elements or delimit on each row. This is particularly useful when you have large amounts of data or records, and that data will be used in a scripted fashion. For example, in every school semester, Bob, the system administrator, needs to create a series of new users and set their information. Bob also gets a standardized CSV (like in the following snippet) from the people in charge of attendance:

Rbrash,Ron,Brash,01/31/88,+11234567890,rbrash@acme.com,FakePassword9000
...

If Bob the administrator wishes to only read this information into an array and create users, it is relatively trivial for him to parse a CSV and create each record in one single scripted action. This allows Bob to focus his time and effort on other important issues such as troubleshooting end-user WiFi issues.

While this is a trivial example, these same files may be in different forms with delimiters (the , or $ sign, for example), different data, and different structures. However, each file works on the premise that each line is a record that needs to be read into some structure (whatever it may be) in SQL, Bash arrays, and so on:

Line1Itself: Header (optional and might not be present)
Line2ItselfIsOneREc:RecordDataWithDelimiters:endline (windows \r\n, in Linux \n)
....

In the preceding example of a pseudo CSV, there is a header, which may be optional (not present), and then several lines (each being a record). Now, for Bob to parse the CSV, he has many ways to do this, but he may use specialized functions that apply a strategy such as:

$ Loop through each item until done
for each line in CSV:
    # Do something with the data such as create a user
    # Loop through Next item if it exists

To read in the data, Bob or yourself may resort to using:

For loops and arrays
A form of iterator
Manually walking through each line (not efficient)

Once any input data has been read in, the next step is to do something with the data itself. Is it to be transformed? Is it to be used immediately? Sanitized? Stored? Or converted to another format? Just like Bob, there are many things that can be performed using the data read in by the script.

In regards to outputting the data, we can also convert it to XML, JSON, or even insert it into a database as SQL. Unfortunately, this process requires being able to know at least two things: the format of the input data and the format of the output data.

Knowing common data formats and how they often have validation applied can be a great asset when building automated scripts and identifying any changes in the future. Enforcement of data validation also has several benefits and can help save the day when all of a sudden the script breaks without warning!

This recipe aims at walking you through reading a trivial CSV and outputting the data into some arbitrary formats.

官术网_书友最值得收藏!

Bash Cookbook

Reading delimited data and altered output format