- Bash Cookbook
- Ron Brash, Ganesh Naik
- 1276 characters
- 2021-07-23 19:17:35
How it works...
As you may have guessed, snooping through lots of data without regexes and wildcards can be a nightmare for the uninitiated. An even scarier one can occur when your expressions use the wrong terms, or are not valid (and accurate) to begin with. However, wildcards are quite useful on the command line when crafting strings, finding data quickly, and locating files. Sometimes, the usability of the search result is irrelevant if I'm merely looking for the filename and rough location (line) of a specific occurrence. For example, which file contains this CSS class, and where?
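As a minimal sketch of that last case, the following searches a directory tree for a CSS class name and reports the file and line of each hit. The class name site-header, the throwaway directory, and the two .css files are hypothetical, and the --include option assumes GNU grep.

```bash
# Hypothetical demo tree with two stylesheets, created in a temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/css"
printf '.site-header {\n  color: red;\n}\n' > "$demo/css/main.css"
printf 'body { margin: 0; }\n' > "$demo/css/reset.css"

# -r: search recursively, -n: prefix each hit with its line number,
# --include: only search files whose names match *.css (GNU grep)
result=$(grep -rn --include='*.css' 'site-header' "$demo")
echo "$result"

# -l: print only the names of files that contain a match
files=$(grep -rl --include='*.css' 'site-header' "$demo")
echo "$files"

rm -rf "$demo"
```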
Well, you made it through the script and ran several commands to get a real-world idea of how to use regexes and wildcards at a surface level. Let's turn back the clock and walk through the recipe.
In step 1, we opened a console, created a simple script, and executed it. The output was then displayed on the console:
$ bash test.sh
-------------------------------------------------
Linux-Journal-2017-08.pdf
Linux-Journal-2017-09.pdf
Linux-Journal-2017-10.pdf
Test.pdf
-------------------------------------------------
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9,test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9..test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9.test3
Desktop:
total 20428
drwxrwxr-x 2 rbrash rbrash 4096 Nov 15 12:55 book
# Lots of files here too
Documents:
total 0
Downloads:
total 552776
-rw------- 1 root root 1024 Feb 11 2017 ~
... # I have a lot of files for this book
Music:
total 0
Pictures:
total 2056
drwxrwxr-x 2 rbrash rbrash 4096 Sep 6 21:56 backgrounds
Public:
total 0
Templates:
total 0
Videos:
total 4
drwxrwxr-x 13 rbrash rbrash 4096 Aug 11 10:42 movies
-------------------------------------------------
a.test b.test c.test
-------------------------------------------------
a.test b.test c.test test.txt
-------------------------------------------------
, & .
-------------------------------------------------
C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3
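It could be a lot scarier, right? In the first line, we chase down some PDFs whose names start with a capital (uppercase) letter. The line ls * | grep [[:upper:]]*.pdf uses the ls command with the * wildcard (for everything) and then pipes the output into grep with a simple regex. Be aware that * means something different inside a regex: it is not a wildcard but a quantifier meaning "zero or more of the preceding item," so [[:upper:]]* matches zero or more uppercase letters, and the unescaped . in .pdf matches any single character. The pattern is therefore looser than it looks, and because it is unquoted, the shell would glob-expand it before grep ever saw it if matching .pdf files existed in the current directory. Still, it produces our search results, which at a minimum will contain Test.pdf (my results returned PDFs for a popular Linux journal too!).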
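To see how loose that pattern is, the sketch below feeds a few hypothetical file names to grep through printf instead of ls, and compares the recipe's pattern with a quoted, anchored alternative (^ for the start of the line, \. for a literal dot, $ for the end). The stricter pattern is our suggestion, not the recipe's.

```bash
# Hypothetical names, fed through printf rather than ls.
names=$(printf '%s\n' Test.pdf notes.pdf Report.PDF Xpdf-manual.txt)

# Loose: [[:upper:]]* may match zero letters and '.' matches any char,
# so any name containing "pdf" preceded by some character slips through.
loose=$(grep '[[:upper:]]*.pdf' <<< "$names")

# Stricter: ^ anchors the start, \. is a literal dot, $ anchors the end.
strict=$(grep '^[[:upper:]].*\.pdf$' <<< "$names")

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

Note that Xpdf-manual.txt passes the loose pattern (X followed by pdf), while only Test.pdf survives the anchored one; Report.PDF fails both because grep is case-sensitive by default.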
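Then, we perform a similar search using ls -l [[:upper:]]*, but this time [[:upper:]]* is a shell glob rather than a regex: bash expands it to every name in the current working directory that starts with an uppercase letter before ls even runs. Because some of those names are directories (Desktop, Documents, and so on), ls lists their contents as well, which returns a large amount of data if the folders are full. ls prints the matching files in the current directory first, then goes one directory deep and prints the contents of each matching directory. The -l flag produces a long listing that shows the permissions, owner, size in bytes, and modification time of each entry, while the total line above each directory's contents reports the disk space it uses in blocks.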
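The sketch below makes the expansion visible: echo shows the arguments bash would hand to ls, and the -d flag lists matching directories themselves instead of descending into them. The directory and file names are hypothetical, created in a temporary directory.

```bash
# Hypothetical layout: one uppercase file, one uppercase directory,
# plus lowercase names that the glob should skip.
demo=$(mktemp -d)
mkdir "$demo/Desktop" "$demo/music"
touch "$demo/A0.test2" "$demo/b.test"

# The glob is expanded by bash, not by ls; echo prints the resulting words.
expanded=$(cd "$demo" && echo [[:upper:]]*)
echo "$expanded"

# -d lists a matching directory as an entry instead of listing its contents.
(cd "$demo" && ls -ld [[:upper:]]*)

rm -rf "$demo"
```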
Next, we use ls to look for all files that begin with a lowercase character and end with the .test extension. You may not have noticed, but when you set up this recipe, you also saw an expansion at work: touch {a..c}.test. Bash's brace expansion turns {a..c} into a, b, and c before touch runs, so the touch command created three files: a.test, b.test, and c.test. The ls command with this simple glob pattern returns the names of those three files.
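Brace expansion is easy to inspect with echo, because bash performs it before the command runs. The {A..C}{0..2} line is only our guess at one way names such as A0.test2 could be produced, since the recipe's setup commands are not shown here. The last part creates hypothetical files in a temporary directory and uses the glob [[:lower:]]*.test, which is our assumption of the pattern the recipe uses, to pick out the lowercase-initial .test files.

```bash
# Brace expansion produces the words before echo (or touch) ever runs.
seq_words=$(echo {a..c}.test)
echo "$seq_words"

# Two sequences combine, with the left one varying slowest.
# (Assumption: one plausible way to generate names like A0.test2.)
grid=$(echo {A..C}{0..2}.test2)
echo "$grid"

# Hypothetical files; the glob keeps only lowercase-initial .test names.
demo=$(mktemp -d)
touch "$demo"/{a..c}.test "$demo"/A0.test2
lower=$(cd "$demo" && echo [[:lower:]]*.test)
echo "$lower"
rm -rf "$demo"
```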
Again, we use the ls command with the * wildcard and brace expansion to match file extensions: ls *.{test,txt}. Bash performs brace expansion first, turning the argument into *.test *.txt, and then expands each glob: any name (*), followed by a literal period (.), followed by either the test or the txt extension.
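Here is a minimal sketch of that order of operations, using echo in place of ls so the expanded words are visible. It also shows a side effect worth knowing: with bash's default settings (nullglob and failglob off), a glob that matches nothing is passed through literally, so ls would then report that *.txt does not exist. The files are hypothetical.

```bash
# Hypothetical files in a temporary directory.
demo=$(mktemp -d)
touch "$demo"/{a,b,c}.test "$demo"/test.txt "$demo"/notes.md

# *.{test,txt} becomes the two globs *.test *.txt, each expanded separately.
matched=$(cd "$demo" && echo *.{test,txt})
echo "$matched"

# With no .txt file left, the unmatched glob is passed through unchanged.
rm "$demo"/test.txt
literal=$(cd "$demo" && echo *.{test,txt})
echo "$literal"

rm -rf "$demo"
```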
Next, in step 7, we combined a few things we have learned (pipes, grep, xargs, and a regex) in the command echo "${STR1}" | grep -o [[:punct:]] | xargs echo. grep -o prints every match on its own line, which defeats our goal of echoing all of the values to the console on one line. xargs fixes this by reading the newline-separated items from standard input and passing them to echo as arguments, so instead of receiving one item per line, echo is effectively run as echo "item1" "item2" "item3".
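The sketch below reproduces the pipeline with a hypothetical value for STR1, since the recipe's string is not shown in this section. The bracket expression is quoted so the shell cannot glob-expand it against file names in the current directory.

```bash
# Hypothetical sample string standing in for the recipe's STR1.
STR1='Hello, world & goodbye.'

# grep -o alone prints one punctuation character per line.
echo "${STR1}" | grep -o '[[:punct:]]'

# xargs collects those lines and passes them as arguments to one echo call.
joined=$(echo "${STR1}" | grep -o '[[:punct:]]' | xargs echo)
echo "$joined"
```

Incidentally, xargs runs echo by default when no command is given, so a bare xargs at the end of the pipe would print the same line.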
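And in the final command, we arrive at a crazier-looking regex, which in truth is actually quite tame: ls | grep -E "([[:upper:]])([[:digit:]])?.test?." | tail -n 5. It introduces a few concepts: groups (the parentheses), the ? quantifier, how you can combine multiple expression components, and tail -n 5, which keeps only the last five lines of output.
The -E flag tells grep to use extended regular expressions, where parentheses form groups and ? works as an operator without backslashes. The first group, ([[:upper:]]), matches a single uppercase letter, and the second, ([[:digit:]])?, matches an optional digit, because ? means "zero or one of the preceding item." The single-character wildcard is actually the unescaped ., which matches any one character; that is why both Z9,test2 and Z9.test3 qualify. Likewise, test? matches tes followed by an optional t. Here are the results: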
C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3
We can see that the last five results were returned, each starting with a capital letter, followed by a number, one character (either . or ,), and then the word test and a number. We also created a test file called Z9..test2. Notice how it was not included among the list items? That is because it has two characters between the digit and test, while our expression allows only one (a single .). To match it as well, we would need an expression like this:
$ ls | grep -E "([[:upper:]])([[:digit:]])?.?.test?"
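To compare the two expressions side by side, the sketch below runs both against a hypothetical list of names modeled on the recipe's files, using printf in place of ls.

```bash
# Hypothetical names modeled on the recipe's test files.
names=$(printf '%s\n' A0.test2 C2.test2 'Z9,test2' Z9..test2 Z9.test3 a.test)

# Recipe's expression: exactly one character (.) between the digit and "tes".
one=$(grep -E '([[:upper:]])([[:digit:]])?.test?.' <<< "$names")

# Alternative: .?. allows one or two characters in that position.
two=$(grep -E '([[:upper:]])([[:digit:]])?.?.test?' <<< "$names")

echo "one-character version:"; echo "$one"
echo "two-character version:"; echo "$two"
```

Only the second expression picks up Z9..test2, and neither matches a.test, since it contains no uppercase letter.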
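In step 4, we run a particular regex using grep with the -oP flags, grep -oP 'name="\K.*?(?=")' www.packtpub.com/index.html, against our recently crawled copy of www.packtpub.com. The -o flag prints only the matching part of each line, and -P enables Perl-compatible regular expressions (PCRE). PCRE is what makes this pattern work: \K discards everything matched so far (name=") from the reported match, .*? lazily matches as few characters as possible, and the lookahead (?=") requires a closing double quote without including it in the output.
Notice that all of the returned values were contained within double quotes? The command finds every match of the pattern name="anythingGoesHere" and prints only the anythingGoesHere part. It's certainly not extremely useful by itself, but it illustrates how quickly you can extract values; for example, if you were after a more specific attribute, you could replace name= with it and extract its values in exactly the same way.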
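A minimal sketch of the same extraction on a hypothetical one-line HTML snippet follows; it assumes GNU grep built with PCRE support, since -P is not available everywhere. The second pipeline swaps name= for type= to show the pattern adapting to another attribute.

```bash
# Hypothetical HTML standing in for the crawled index.html.
html='<meta name="description" content="Books"><input name="q" type="text">'

# \K drops 'name="' from the reported match, .*? stops as early as possible,
# and (?=") checks for the closing quote without printing it.
names=$(printf '%s\n' "$html" | grep -oP 'name="\K.*?(?=")')
echo "$names"

# The same pattern with a different attribute extracts its values instead.
types=$(printf '%s\n' "$html" | grep -oP 'type="\K.*?(?=")')
echo "$types"
```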
Continuing in the same vein, in step 5, we can also find all occurrences of name= with grep -P 'name=' www.packtpub.com/index.html. Without -o, grep prints each entire matching line, which is useful for understanding the context of a value or merely confirming that it exists; this comes back to the idea of looking for values in CSS, C/C++, and other data or source files.
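The sketch below contrasts whole-line output with two ways of merely checking existence: -c (count matching lines) and -q (quiet, exit status only). The three-line file is a hypothetical stand-in for index.html, and since name= is a literal string, plain grep works here without -P.

```bash
# Hypothetical three-line file standing in for index.html.
demo=$(mktemp)
printf '%s\n' '<head>' '<meta name="viewport" content="width=device-width">' '<input name="q">' > "$demo"

# Without -o, grep prints each whole line that contains a match.
lines=$(grep 'name=' "$demo")
echo "$lines"

# -c reports only how many lines matched.
count=$(grep -c 'name=' "$demo")
echo "$count"

# -q prints nothing; the exit status alone says whether a match exists.
if grep -q 'name=' "$demo"; then echo "found"; fi

rm -f "$demo"
```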
Onward to step 6, where we look for the title HTML tag. Normally, you should use a dedicated HTML parser, but if we want to use grep with regexes in a hurry, we can! The tr '\n' ' ' < www.packtpub.com/index.html | grep -o '<title>.*</title>' command uses tr (translate) to convert each \n (newline) character into a space, turning the whole file into a single line. This is useful when the markup you are after may span multiple lines, because grep matches one line at a time. Keep in mind that .* is greedy: if the page contained more than one <title> element, the match would run from the first <title> to the last </title>.
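The following sketch, built on a hypothetical page whose title spans three lines, shows why the tr step matters: searched line by line, grep finds nothing, but after tr joins the lines, the title is found.

```bash
# Hypothetical page with a <title> element split across lines.
page=$(printf '<html><head>\n<title>\nHello World\n</title>\n</head></html>\n')

# Line by line, no single line holds both tags, so grep matches nothing
# (|| true keeps the non-zero exit status from aborting a strict script).
per_line=$(printf '%s\n' "$page" | grep -o '<title>.*</title>' || true)

# tr replaces every newline with a space, leaving one long line to search.
joined=$(printf '%s\n' "$page" | tr '\n' ' ' | grep -o '<title>.*</title>')
echo "$joined"
```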
In our closing step, we end with a bit of fine-tuning for broad searches. We simply ask grep for the line number (-n) and filename (-H) of each match, and then use cut -c -80 to trim each output line to its first 80 characters, so long lines don't flood the console (this can be really useful):
$ grep -nHP 'name=' www.packtpub.com/index.html | cut -c -80
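Here is a sketch of the effect on a hypothetical file whose matching line is much longer than 80 characters; printf 'x%.0s' {1..200} is just a compact way to generate 200 x characters.

```bash
# Hypothetical file with one very long matching line.
demo=$(mktemp -d)
long=$(printf 'x%.0s' {1..200})
printf '<p>intro</p>\n<input name="q" value="%s">\n' "$long" > "$demo/index.html"

# -n adds the line number and -H the file name (even for a single file);
# cut -c -80 then keeps only the first 80 characters of each output line.
out=$(cd "$demo" && grep -nH 'name=' index.html | cut -c -80)
echo "$out"

rm -rf "$demo"
```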