We have already covered several important topics, such as the power of the sed and awk commands and even CSV files, but we have not yet discussed the importance of being able to transform the format and structure of data. CSV is a fundamental and very common data format, but unfortunately it isn't the best choice for some applications, so we may use XML or JSON instead. Here are two scripts (or rather, one script and one tool) that can convert our original data into various formats:
When executing data-csv-to-xml.sh, we notice several things: it uses two source template files, which can be altered for flexibility, and then a large piped command that leverages sed and awk. For each input line, we take the CSV values and build a <word lang="x">Y</word> XML element using the format template inside word.tpl, where $1 is field one and $2 is field two (in awk, $0 refers to the whole record). The script will convert words.csv to XML and output the following:
The differences between the three data formats (CSV, XML, and JSON), and the reasons for choosing one over another, are left as an exercise for the reader. Another exercise to explore is data validation, which ensures the integrity of the data and enforces constraints on it. For example, XML documents can be validated against an XSD schema to enforce data limits.
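As a starting point for these exercises, the element-building step described above can be sketched in a few lines of awk. This is not the data-csv-to-xml.sh listing itself, only a minimal illustration of the same idea. The function names csv_to_xml and csv_to_json, the <words> wrapper element, the JSON key names, and the sample rows are all hypothetical. The sketch also assumes that each line holds exactly two comma-separated fields (word, then language code) with no quoted fields or embedded commas, and the JSON variant assumes the values contain no quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not the original data-csv-to-xml.sh.
# Assumed input on stdin: one "word,language" pair per line, no quoted fields.

csv_to_xml() {
  # In awk, $1 is field one (the word) and $2 is field two (the language).
  awk -F',' '
    function esc(s) {              # escape XML special characters
      gsub(/&/, "\\&amp;", s)      # must run first so later entities survive
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      gsub(/"/, "\\&quot;", s)
      return s
    }
    BEGIN   { print "<words>" }
    NF >= 2 { printf "  <word lang=\"%s\">%s</word>\n", esc($2), esc($1) }
    END     { print "</words>" }
  '
}

csv_to_json() {
  # Same records as a JSON array; assumes no quotes or backslashes in values.
  awk -F',' '
    BEGIN   { printf "[" }
    NF >= 2 { printf "%s\n  {\"word\": \"%s\", \"lang\": \"%s\"}", (n++ ? "," : ""), $1, $2 }
    END     { print "\n]" }
  '
}

# Usage with made-up sample rows:
printf 'hello,en\nbonjour,fr\n' | csv_to_xml
printf 'hello,en\nbonjour,fr\n' | csv_to_json
```

Running both functions over the same rows gives a direct way to compare the formats: XML carries the language as an attribute of each element, while the JSON version stores it as an ordinary key next to the word.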