
  • Bash Cookbook
  • Ron Brash Ganesh Naik
  • 958 words
  • 2021-07-23 19:17:37

How it works...

After running the two scripts in this recipe, we can see a few items emerge (especially if we compare the built-in Bash functionality for searching, replacing, and substrings).
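
For comparison, the same substrings can be taken with Bash's built-in parameter expansion instead of sed. The following is a minimal sketch rather than part of the recipe's scripts; the value of STR is an assumption inferred from the sample output shown in this section, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed test string, inferred from the sample output (not taken from the script itself).
STR="1234567890asdfghjkl"

first="${STR:0:1}"            # one character starting at offset 0
first_three="${STR:0:3}"      # three characters starting at offset 0
from_fourth="${STR:3}"        # everything from offset 3 (the fourth character)
fourth_to_sixth="${STR:3:3}"  # three characters starting at offset 3
last="${STR: -1}"             # last character; the space before -1 is required
all_but_last="${STR%?}"       # strip one trailing character

printf '%s\n' "$first" "$first_three" "$from_fourth" \
              "$fourth_to_sixth" "$last" "$all_but_last"
```

Parameter expansion avoids starting a separate process for each substring, which may matter inside loops.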

  1. After executing some-strs.sh, we can see the following output in the console:
$ bash ./some-strs.sh 
First character 1
First three characters 123
Third character onwards 4567890asdfghjkl
Forth to sixth character 456
Last character by itself l
Remove last character only 1234567890asdfghjk

By now we have seen the echo command several times; its -n flag suppresses the trailing newline that echo would otherwise print. The <<< here-string redirect, which feeds a string to a command as standard input, has also been used before, so neither should be new. In the first instance, we use sed like this: sed 's/.//2g' <<< $STR . This script uses sed in very simple ways compared to everything you can do by combining sed with regular expressions. First comes the command (sed), then the sed script ('s/.//2g'), and then the input (<<< $STR). You can also chain several commands in one script by separating them with a semicolon, like this: 's/.//2g;s/,/./g' . To get the first character, we use sed's substitute command (s/) with the pattern . (any single character), an empty replacement, and the flags 2g. In GNU sed, combining a number with g means: leave the earlier matches alone, and replace that match and every match after it. Every character from the second onwards is therefore deleted, and only the first character remains.

The reason it is 2g and not 1g is that the number names the first match to be replaced, not the number of characters to keep: 1g would delete every character, whereas 2g starts deleting at the second. In general, to keep the first n characters you specify n+1. To return the first three characters, we merely change the flags from 2g to 4g.
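
The effect of the numeric flag can be checked in isolation. This is a small sketch that assumes GNU sed (combining a number with g is not specified by POSIX) and an STR value inferred from the sample output; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes GNU sed. STR is inferred from the sample output above.
STR="1234567890asdfghjkl"

# 2g: replace the 2nd match of "." and every later match with nothing.
first=$(sed 's/.//2g' <<< "$STR")

# 4g: replace the 4th match and every later match, keeping three characters.
first_three=$(sed 's/.//4g' <<< "$STR")

echo "First character $first"
echo "First three characters $first_three"
```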

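In the next block of the script, we use sed as follows: sed -r 's/.{3}//' and sed -r '$s/.{3}//;s/.//4g' . In the first invocation, -r enables extended regular expressions, so .{3} matches exactly three characters without backslash-escaped braces. Deleting those first three characters leaves the string from position 4 onwards (again, mind the off-by-one between counts and positions). The second invocation combines two commands: it first deletes the first three characters and then, using 4g, deletes everything from the fourth remaining character onwards, which limits the output to three characters (the fourth to sixth of the original string). The leading $ is an address that restricts the first command to the last line of input; with single-line input, that is the only line.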
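
A short sketch of these two invocations follows, again assuming GNU sed and an STR value inferred from the sample output. The variable names are hypothetical and are not taken from the recipe's script.

```shell
#!/usr/bin/env bash
# Assumes GNU sed (-r enables extended regular expressions).
STR="1234567890asdfghjkl"

# Delete the first three characters; the rest of the string remains.
from_fourth=$(sed -r 's/.{3}//' <<< "$STR")

# Delete the first three characters, then delete from the 4th remaining
# character onwards, leaving the original characters four to six.
fourth_to_sixth=$(sed -r '$s/.{3}//;s/.//4g' <<< "$STR")

echo "$from_fourth"
echo "$fourth_to_sixth"
```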

In the third block of the script, we get the final character of the string using sed 's/.*\(.$\)/\1/' and then the entire string except the last character using sed 's/.$//'. In the first instance, .* matches everything up to the last character, the group \(.$\) captures the single character just before the end of the line, and \1 replaces the whole match with that captured character. In the second instance, the .$ pattern matches only the last character, and the empty replacement deletes it, leaving everything else.
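
Both expressions can be tried on their own, as in the sketch below. As before, the value of STR is an assumption inferred from the sample output, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
STR="1234567890asdfghjkl"   # assumed value, inferred from the sample output

# Capture the final character in a group and replace the whole line with it.
last=$(sed 's/.*\(.$\)/\1/' <<< "$STR")

# Match only the final character and replace it with nothing.
all_but_last=$(sed 's/.$//' <<< "$STR")

echo "Last character by itself $last"
echo "Remove last character only $all_but_last"
```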

It is important to note that search and replace can also be used for deletion by specifying an empty replacement. You can also use the -i flag to edit files in place, and sed has other ways to delete text, such as the d command, which deletes whole lines.
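
As a rough illustration of the two deletion styles, the sketch below removes characters with an empty replacement and removes whole lines with the d command. The input strings are made up for the example, and the d command is only one reading of the "other" deletion options mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical input, for illustration only.
line='a, b, c'

# Deletion by substitution: every comma is replaced with nothing.
no_commas=$(sed 's/,//g' <<< "$line")

# Deletion by command: drop every line that starts with '#'.
kept=$(printf '%s\n' 'keep' '#drop' 'keep too' | sed '/^#/d')

echo "$no_commas"
echo "$kept"
```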

  2. Moving on to the next script, after execution the console should look similar to the following:
$ bash more-strsng.sh 
We have 2 rows in testdata/garbage.csv
Bob,Jane,Naz,Sue,Max,Tom$
Zero,Alpha,Beta,Gama,Delta,Foxtrot#
BOB,JANE,NAZ,SUE,MAX,TOM
ZERO,ALPHA,BETA,GAMA,DELTA,FOXTROT
#1000,Robert,Green,Dec,1,1967
#2000,Ron,Brash,Jan,20,1987
#3000,James,Fairview,Jul,15,1992
#1000,Robert,Green,Dec,1,1967
#2000,Ron,Brash,Jan,20,1987
#3000,James,Fairview,Jul,15,1992
#1000,Robert,Green,Dec,1967
#2000,Ron,Brash,Jan,1987
#3000,James,Fairview,Jul,1992

In the first block of code, we read the CSV file into an array and, for each element, perform a substitution to remove the spaces: sed 's/ //g'.
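
A minimal sketch of that first block might look like the following. The original contents of testdata/garbage.csv are not shown in this section, so the example writes assumed rows to a temporary file; mapfile requires Bash 4 or later, and the script may well read the file differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for testdata/garbage.csv (real contents not shown).
tmp=$(mktemp)
printf '%s\n' 'Bob, Jane, Naz, Sue, Max, Tom$' \
              'Zero, Alpha, Beta, Gama, Delta, Foxtrot#' > "$tmp"

mapfile -t rows < "$tmp"      # one array element per line
echo "We have ${#rows[@]} rows in $tmp"

cleaned=()
for row in "${rows[@]}"; do
  # Replace every space with nothing.
  cleaned+=("$(echo "$row" | sed 's/ //g')")
done

printf '%s\n' "${cleaned[@]}"
rm -f "$tmp"
```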

In the second block, we again iterate through the array, but this time we remove the last character with sed 's/.$//' and then pipe the output through sed -e 's/.*/\U&/' to convert everything to uppercase. In the first part of the pipe, .$ matches the last character and the empty replacement (the //) removes it. In the second part, .* matches the whole line and \U& replaces it with an uppercase copy of the match (note that \U is a GNU sed extension). Lowercase can be achieved using \L& instead.
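
The pipe can be tried on a single row, as sketched below. This assumes GNU sed, since \U and \L are GNU extensions; the row is taken from the sample output, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Assumes GNU sed for \U and \L. Row taken from the sample output above.
row='Bob,Jane,Naz,Sue,Max,Tom$'

# Strip the trailing character, then uppercase the whole line.
upper=$(echo "$row" | sed 's/.$//' | sed -e 's/.*/\U&/')

# The reverse case conversion, using \L instead of \U.
lower=$(sed -e 's/.*/\L&/' <<< "$upper")

echo "$upper"
echo "$lower"
```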

In the third block, we again use a for loop and a subshell, but instead of echoing the input into sed, we pass it in with the <<< here-string. Using sed -e 's/^/#/', we match the beginning of the string (specified by the ^) and insert a # there, which prepends it to the line.
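
The loop described here could be sketched as follows. The rows are sample values based on the output shown earlier, with the leading # removed; the array and variable names are assumptions.

```shell
#!/usr/bin/env bash
# Sample rows based on the output above, without the leading '#'.
rows=('1000,Robert,Green,Dec,1,1967' '2000,Ron,Brash,Jan,20,1987')

marked_rows=()
for row in "${rows[@]}"; do
  # ^ matches the empty position at the start of the line; the
  # replacement inserts '#' there. Input arrives via a here-string.
  marked_rows+=("$(sed -e 's/^/#/' <<< "$row")")
done

printf '%s\n' "${marked_rows[@]}"
```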

Next, for the last three examples, we use sed with the -i flag to work on the files themselves rather than on the arrays loaded into memory. This is an important distinction because the input files are modified on disk; that is often what you want in a script, but keep a copy if you still need the original. Replacing Bob with Robert works just like removing spaces, except that we specify a replacement, and this time the substitution is applied to the entire input CSV file. We can also add the hash sign to the start of each line in the file.
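
To experiment with in-place edits safely, the sketch below works on a temporary file rather than on testdata/employees.csv. The starting rows are assumed (the original file is not shown here), and the bare -i form is the GNU sed one; BSD sed expects a backup suffix argument after -i.

```shell
#!/usr/bin/env bash
# Assumes GNU sed. The rows are a hypothetical stand-in for employees.csv.
tmp=$(mktemp)
printf '%s\n' '1000,Bob,Green,Dec,1,1967' \
              '2000,Ron,Brash,Jan,20,1987' > "$tmp"

sed -i 's/Bob/Robert/' "$tmp"   # rewrite the file, replacing Bob
sed -i 's/^/#/' "$tmp"          # rewrite it again, prefixing each line

result=$(cat "$tmp")
echo "$result"
rm -f "$tmp"
```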

In the final example, we briefly use AWK to show the power of this utility. We specify the input and output field separators (FS and OFS) and then use the fifth column together with the gsub function of the AWK language to remove that column or field. BEGIN is a special pattern whose action runs once, before any input is read, which makes it the natural place to set FS and OFS. If a program contains multiple rules, they are applied to each record in the order in which they appear.
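
One possible way to drop the fifth field with FS, OFS, and gsub is sketched below. It is not necessarily the command used in the recipe's script, and it has a known limitation: collapsing ",," would also remove any field that was already empty.

```shell
#!/usr/bin/env bash
# Sample row taken from the output above.
line='#1000,Robert,Green,Dec,1,1967'

# BEGIN runs before any input is read and sets both separators.
# Emptying $5 rebuilds the record with OFS, leaving ",," behind;
# gsub then collapses that doubled separator on the whole record.
trimmed=$(awk 'BEGIN { FS = ","; OFS = "," }
               { $5 = ""; gsub(/,,/, ","); print }' <<< "$line")

echo "$trimmed"
```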

Alternatively, we can print the first column or field using awk 'BEGIN { FS=","} { print $1}' testdata/employees.csv, or only the first field of the first record by specifying NR==1 like this: awk 'BEGIN { FS=","} NR==1{ print $1}' . Limiting the number of returned records in this way is very useful when a command such as grep returns copious amounts of matches.
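
Both forms can be compared on a small temporary file, as in this sketch. The rows are sample values based on the output above, and the temporary file stands in for testdata/employees.csv.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for testdata/employees.csv.
tmp=$(mktemp)
printf '%s\n' '#1000,Robert,Green,Dec,1,1967' \
              '#2000,Ron,Brash,Jan,20,1987' \
              '#3000,James,Fairview,Jul,15,1992' > "$tmp"

# First field of every record.
all_ids=$(awk 'BEGIN { FS="," } { print $1 }' "$tmp")

# First field of the first record only (NR is the record number).
first_id=$(awk 'BEGIN { FS="," } NR==1 { print $1 }' "$tmp")

echo "$all_ids"
echo "$first_id"
rm -f "$tmp"
```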

Again, there is so much you can do with AWK and sed. Combined with regular expressions (regexes), explanations and examples of all sorts of usage could fill a book dedicated to each command. It is worth checking the documentation for the version of each tool on your platform, because implementations differ (for example, GNU sed versus BSD sed).