
How it works...

As you may have guessed, snooping through lots of data without regexes and wildcards can be a nightmare for the uninitiated. An even scarier nightmare occurs when your expressions use the wrong terms, or aren't valid (and accurate) expressions to begin with. However, wildcards are quite useful on the command line when you are crafting strings, finding data quickly, and locating files. Sometimes, the usability of the search result is irrelevant if I'm merely looking for the filename and rough location/line of a specific occurrence. For example, which file contains this CSS class?

Well, you made it through the script and ran several commands to get a real-world idea of how to use regexes and wildcards at a surface level. Let's turn back the clock and walk through the recipe.

In step 1, we opened a console, created a simple script, and executed it. The output results were then displayed on the console:

$ bash test.sh 
-------------------------------------------------
Linux-Journal-2017-08.pdf
Linux-Journal-2017-09.pdf
Linux-Journal-2017-10.pdf
Test.pdf
-------------------------------------------------
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9,test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9..test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9.test3

Desktop:
total 20428
drwxrwxr-x 2 rbrash rbrash 4096 Nov 15 12:55 book
# Lots of files here too

Documents:
total 0

Downloads:
total 552776
-rw------- 1 root root 1024 Feb 11 2017 ~
... # I have a lot of files for this book

Music:
total 0

Pictures:
total 2056
drwxrwxr-x 2 rbrash rbrash 4096 Sep 6 21:56 backgrounds

Public:
total 0

Templates:
total 0

Videos:
total 4
drwxrwxr-x 13 rbrash rbrash 4096 Aug 11 10:42 movies
-------------------------------------------------
a.test b.test c.test
-------------------------------------------------
a.test b.test c.test test.txt
-------------------------------------------------
, & .
-------------------------------------------------
C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3

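It could be a lot scarier, right? In the first line, we begin by chasing down some PDFs that start with a capital (uppercase) letter. The line ls * | grep [[:upper:]]*.pdf uses the ls command with a * wildcard (for everything) and then pipes the output into grep with a simple regex. Inside a regex, though, * is not a wildcard but a quantifier meaning "zero or more of the preceding item": [[:upper:]]*.pdf reads as zero or more uppercase letters, then any single character (the unescaped .), then pdf. This produces our search results, which at a minimum will contain Test.pdf (my results returned PDFs for a popular Linux journal too!). Be aware that, because the uppercase part may match zero times and the pattern isn't anchored, a lowercase name such as notes.pdf would match too, and because the pattern is unquoted, the shell may glob-expand it before grep sees it.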
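To see the difference between the recipe's loose regex and a stricter one, here is a minimal sketch that feeds a few file names to grep through printf instead of ls. The names are hypothetical samples, and the anchored pattern ^[[:upper:]].*\.pdf$ is our own stricter alternative, not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical file names standing in for the output of ls.
names=$'Test.pdf\nnotes.pdf\nLinux-Journal-2017-08.pdf\nReadme.txt'

# The recipe's pattern, quoted so the shell cannot glob-expand it.
# [[:upper:]]* may match zero letters and . matches any character,
# so any line containing "<some character>pdf" matches.
loose() { printf '%s\n' "$names" | grep '[[:upper:]]*.pdf'; }

# Stricter: ^ anchors the start, \. is a literal period, $ anchors the end.
strict() { printf '%s\n' "$names" | grep '^[[:upper:]].*\.pdf$'; }

echo "loose:"
loose
echo "strict:"
strict
```

Running it shows notes.pdf under the loose pattern but not under the strict one.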

Then, we perform a similar search using ls -l [[:upper:]]*, but this time [[:upper:]]* is a shell glob rather than a regex: the shell expands it to every name in the current directory that begins with an uppercase letter, directories included, so ls can return a large amount of data (if those folders have contents). ls first lists the matching files in the current directory where the script is located, and then, for each matching directory, goes one level deep and prints its contents. The -l flag produces a long listing, which shows permissions, ownership, modification time, and the size of each entry in bytes.
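A minimal sketch of what happens behind ls -l [[:upper:]]*, run in a throwaway directory with hypothetical names. printf reveals the arguments the shell produces from the glob, and ls -d (an extra flag not used in the recipe) lists matching directories themselves instead of descending into them.

```shell
#!/usr/bin/env bash
# Work in a temporary directory with hypothetical files and folders.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir Desktop Music
touch Desktop/notes.txt A0.test2 lower.txt

# The shell expands the glob before ls runs; printf shows the arguments.
expanded() { printf '%s\n' [[:upper:]]*; }

# Without -d, ls prints each matching directory's contents (one level deep).
deep_listing() { ls -l [[:upper:]]*; }

# With -d, ls describes the directories themselves.
flat_listing() { ls -ld [[:upper:]]*; }

expanded
deep_listing
flat_listing
```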

Next, we use ls to look for all files that begin with a lowercase character and end with the .test extension. Little did you know that, when you set up this recipe, you also saw an expansion at work: touch {a..c}.test. This is brace expansion: the shell turns {a..c}.test into a.test b.test c.test before touch runs, so touch creates three files: a.test, b.test, and c.test. The ls command with this simple pattern returns the names of those three files.
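Brace expansion and globbing are easy to confuse, so this minimal sketch separates them. echo shows that {a..c}.test is expanded as plain text, whether or not any files exist. The glob [[:lower:]]*.test is our assumption for the recipe's lowercase pattern, since the exact command isn't repeated here, and Z9.test is a hypothetical extra file.

```shell
#!/usr/bin/env bash
# Brace expansion is purely textual: no matching files are needed.
braces() { echo {a..c}.test; }

demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1

# touch receives a.test b.test c.test, plus one uppercase decoy.
touch {a..c}.test Z9.test

# Hypothetical glob: a lowercase first character, ending in .test.
lowercase_tests() { ls [[:lower:]]*.test; }

braces
lowercase_tests
```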

Again, we use the ls command with the * wildcard and brace expansion to match file extensions: ls *.{test,txt}. Brace expansion happens first, turning the command into ls *.test *.txt; each glob then matches files with any name (*), followed by a period (.) and either the test or txt extension.
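To confirm that brace expansion runs before pathname expansion, this sketch compares ls *.{test,txt} with the hand-written ls *.test *.txt in a temporary directory holding the recipe's four file names. Quoting the * (in show_words, a helper of our own) keeps it literal, so you can see the words brace expansion produces.

```shell
#!/usr/bin/env bash
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.test b.test c.test test.txt

# Brace expansion produces *.test *.txt; globbing then matches each pattern.
by_braces() { ls *.{test,txt}; }
by_hand()   { ls *.test *.txt; }

# Quoted * is not globbed, so only the brace-expanded words are printed.
show_words() { printf '%s\n' '*'.{test,txt}; }

by_braces
show_words
```

If no .txt file existed, bash would pass the literal *.txt to ls, which would then report that it cannot access it; shopt -s nullglob changes this by dropping patterns that match nothing.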

Next, still within the step 1 script, we combine a few things we have learned, using pipes, grep, xargs, and a regex in the command echo "${STR1}" | grep -o [[:punct:]] | xargs echo. grep -o prints each match on its own line (newline-delimited output), and echo does not read standard input at all, so we can't simply pipe grep's output into echo to print all of the values on one line. xargs fixes this by reading those lines and passing them to echo as separate arguments. For example, rather than receiving item1, item2, and item3 as lines on standard input, echo is effectively run as echo "item1" "item2" "item3".
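A minimal sketch of the grep/xargs pipeline. STR1 is a hypothetical sample string, since the recipe's actual value isn't shown here. The paste -sd ' ' - variant is an alternative we've added: it joins lines with spaces and, unlike xargs, does not treat quote characters in its input specially (an apostrophe in the matches would make plain xargs complain about an unmatched quote).

```shell
#!/usr/bin/env bash
# Hypothetical sample string; the recipe's real STR1 is not shown.
STR1="Hello, world & goodbye."

# grep -o prints each punctuation character on its own line.
matches() { echo "${STR1}" | grep -o '[[:punct:]]'; }

# echo ignores stdin, so xargs turns each line into an argument for echo.
joined_xargs() { matches | xargs echo; }

# paste -s joins all input lines using the -d delimiter (a space here).
joined_paste() { matches | paste -sd ' ' -; }

matches
joined_xargs
joined_paste
```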

And in the script's final command, we arrive at a crazier regex, which in truth is quite tame: ls | grep -E "([[:upper:]])([[:digit:]])?.test?." | tail -n 5. It introduces a couple of concepts: groups (the parentheses), the ? operator, combining multiple expression components, and tail, which prints only the last lines of its input (here, the last five).

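Using grep with the -E flag (extended regular expressions), we build the expression from two groups (the expressions inside the parentheses) and apply the ? operator to the second one. Note that ? is not a single-character wildcard: it makes the preceding item optional (zero or one occurrence), so both the digit group and the final t of test? may or may not be present. The single-character wildcard is the unescaped period (.), which matches any one character; that is how the pattern matches both . and , here. The command's output is: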

C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3

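We can see that the last five results were returned: each starts with a capital letter, followed by a digit, a single character (either . or ,), the word test, and one more character (here, a digit). We created one test file called Z9..test2. Notice how it was not included in the list? That's because two characters separate Z9 from test in that name, while our pattern allows only one. To match it, we would need an expression like this, where the extra .? permits an optional second character: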

$ ls | grep -E "([[:upper:]])([[:digit:]])?.?.test?"
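To experiment with these patterns without creating files, this sketch pipes some of the recipe's file names through grep -E with printf. The third, anchored pattern with escaped periods is our own addition, showing how to stop . from matching a comma.

```shell
#!/usr/bin/env bash
# Some of the file names created by the recipe, one per line.
names() { printf '%s\n' A0.test2 C2.test2 'Z9,test2' Z9..test2 Z9.test3 a.test; }

# The recipe's pattern: exactly one character between the digit and "test".
one_gap() { names | grep -E '([[:upper:]])([[:digit:]])?.test?.'; }

# The alternative: .? allows an optional extra character, so Z9..test2 matches.
two_gaps() { names | grep -E '([[:upper:]])([[:digit:]])?.?.test?'; }

# Anchored, with \. as a literal period, so the comma variant is rejected.
literal_dot() { names | grep -E '^[[:upper:]][[:digit:]]?\.test[[:digit:]]$'; }

one_gap; echo ---
two_gaps; echo ---
literal_dot
```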

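In step 4, we run a particular regex using grep with the -oP flags, grep -oP 'name="\K.*?(?=")' www.packtpub.com/index.html, against our recently crawled archive of www.packtpub.com. The -o flag means output only the matching parts, and -P enables Perl-compatible regular expressions (PCRE). In this pattern, \K discards the name=" prefix from the reported match, .*? lazily matches as few characters as possible, and the lookahead (?=") stops at the closing quote without including it.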

Notice all of the values contained within double quotes? grep is looking for anything that matches the pattern name="anythingGoesHere" and prints only the part between the quotes. It's certainly not extremely useful by itself, but it illustrates how quickly you can extract values. For example, what if you were after something more specific? You could replace name= with another attribute and extract those values in exactly the same way!
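A minimal sketch of the same extraction on a hypothetical HTML fragment, so you can try it without crawling a site. The second function is a portable alternative of our own for grep builds without -P: it matches name="..." with a negated character class and lets cut keep the text between the quotes.

```shell
#!/usr/bin/env bash
# Hypothetical fragment standing in for www.packtpub.com/index.html.
html='<meta name="description" content="x"><input name="q" type="text">'

# PCRE: \K drops 'name="' from the output, .*? is lazy,
# and (?=") stops before the closing quote without consuming it.
pcre_names() { printf '%s\n' "$html" | grep -oP 'name="\K.*?(?=")'; }

# Without -P: [^"]* cannot run past a quote; cut keeps field 2.
posix_names() { printf '%s\n' "$html" | grep -o 'name="[^"]*"' | cut -d'"' -f2; }

pcre_names
posix_names
```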

Following along the same lines, in step 5, we can also find every line containing name= with grep -P 'name=' www.packtpub.com/index.html. Without -o, grep prints each whole matching line. This type of command is useful for understanding the context of information, or merely confirming that it exists; this comes back to the idea of looking for values in CSS, C/C++, and other data/source files.
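Because grep reports matching lines, a line holding two name= attributes is printed (and counted) only once. This sketch, using a hypothetical three-line HTML fragment, contrasts whole lines, grep -c, and counting occurrences with grep -o. A plain literal such as name= doesn't need -P, so it's omitted here.

```shell
#!/usr/bin/env bash
# Hypothetical fragment: the first line holds two name= attributes.
page() {
  printf '%s\n' '<meta name="a"><meta name="b">' '<p>no attribute here</p>' '<input name="c">'
}

# Whole matching lines (what grep prints without -o).
matching_lines() { page | grep 'name='; }

# -c counts matching lines, not individual occurrences.
line_count() { page | grep -c 'name='; }

# -o prints every occurrence on its own line, so wc -l counts occurrences.
occurrence_count() { page | grep -o 'name=' | wc -l; }

matching_lines
line_count
occurrence_count
```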

Onward to step 6, where we look for the HTML title tag. Normally, you should use a dedicated HTML parser, but if we want to use grep with regexes in a hurry, we can! The command tr '\n' ' ' < www.packtpub.com/index.html | grep -o '<title>.*</title>' uses the tr (translate) command to convert each \n (newline) character into a space. Because grep matches one line at a time, this is useful when markup spans multiple lines.
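One caveat with '<title>.*</title>': .* is greedy, so once the file is flattened to a single line it can run all the way to the last </title> on the page (inline SVG icons, for example, can carry their own title elements). This sketch uses a hypothetical page to show the problem, plus one workaround of our own, [^<]*, which cannot cross into another tag.

```shell
#!/usr/bin/env bash
# Hypothetical page: the real <title> spans lines, and an inline SVG
# contains a second <title>.
page() {
  printf '%s\n' '<html><head>' '<title>' 'Example Page' '</title></head>' \
    '<body><svg><title>icon</title></svg></body></html>'
}

# tr replaces every newline with a space so grep sees a single line.
flattened() { page | tr '\n' ' '; }

# Greedy: matches from the first <title> to the LAST </title>.
greedy_title() { flattened | grep -o '<title>.*</title>'; }

# [^<]* stops at the next tag, so each title matches separately;
# head keeps only the first one.
first_title() { flattened | grep -o '<title>[^<]*</title>' | head -n 1; }

greedy_title
first_title
```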

In our closing step, we end with a bit of fine-tuning for broad searches. We simply ask grep for the line number (-n) and filename (-H) of each match. Using cut, we can then trim each line of output to its first 80 characters (cut -c -80), which keeps long lines from flooding the console (this can be really useful):

$ grep -nHP 'name=' www.packtpub.com/index.html | cut -c -80
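This sketch recreates a tiny, hypothetical stand-in for the crawled page in a temporary directory so the output format can be inspected: each result reads file:line:text, and cut -c -80 keeps only the first 80 characters. -P is left out because a plain literal like name= doesn't need Perl-compatible syntax.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the crawled page, in a throwaway directory.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir www.packtpub.com
printf '%s\n' '<p>intro</p>' \
  '<meta name="description" content="A deliberately long attribute value that runs well past the eighty-character limit">' \
  > www.packtpub.com/index.html

# -n adds line numbers, -H adds the file name; cut keeps characters 1 to 80.
report() { grep -nH 'name=' www.packtpub.com/index.html | cut -c -80; }

report
```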
Regexes can also be tested online using a number of regex simulators! One popular and free tool available online is: https://regexr.com/.
Don't forget that some regex flavors also allow you to nest expressions, including other groups, within groups! We didn't demonstrate this functionality, but it can be handy in some use cases!