
How it works...

As you may have guessed, snooping through lots of data without regexes and wildcards can be a nightmare for the uninitiated. An even scarier one can occur when your expressions don't use the correct terms, or aren't a valid (and accurate) expression to begin with. However, wildcards are quite useful on the command line when crafting strings, finding data quickly, and locating files. Sometimes, the usability of the search result is irrelevant if I'm merely looking for the filename and the rough location/line of a specific occurrence. For example, which file contains this CSS class?

Well, you made it through the script and ran several commands to get a real-world idea of how to use regexes and wildcards at a surface level. Let's turn back the clock and walk through the recipe.

In step 1, we opened a console, created a simple script, and executed it. The output results were then displayed on the console:

$ bash test.sh 
-------------------------------------------------
Linux-Journal-2017-08.pdf
Linux-Journal-2017-09.pdf
Linux-Journal-2017-10.pdf
Test.pdf
-------------------------------------------------
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 A2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 B2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C0.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C1.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 C2.test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9,test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9..test2
-rw-rw-r-- 1 rbrash rbrash 0 Nov 15 22:13 Z9.test3

Desktop:
total 20428
drwxrwxr-x 2 rbrash rbrash 4096 Nov 15 12:55 book
# Lots of files here too

Documents:
total 0

Downloads:
total 552776
-rw------- 1 root root 1024 Feb 11 2017 ~
... # I have a lot of files for this book

Music:
total 0

Pictures:
total 2056
drwxrwxr-x 2 rbrash rbrash 4096 Sep 6 21:56 backgrounds

Public:
total 0

Templates:
total 0

Videos:
total 4
drwxrwxr-x 13 rbrash rbrash 4096 Aug 11 10:42 movies
-------------------------------------------------
a.test b.test c.test
-------------------------------------------------
a.test b.test c.test test.txt
-------------------------------------------------
, & .
-------------------------------------------------
C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3

It could be a lot scarier! Right? In the first line, we begin by chasing down some PDFs whose names start with a capital (uppercase) letter. The line ls * | grep [[:upper:]]*.pdf uses the ls command with a * wildcard (for everything) and then pipes the output into grep with a simple regex. The regex is [[:upper:]] followed by *, which inside a regex is not a wildcard but a quantifier meaning "zero or more of the preceding item", and then the .pdf string (where the unescaped . matches any single character). This produces our search results, which at a minimum will contain Test.pdf (my results returned PDFs for a popular Linux journal too!).
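Two details of that first command are worth seeing in isolation. The unquoted pattern can be glob-expanded by the shell before grep ever sees it, and because * allows zero uppercase letters, the pattern is looser than it looks. The following is a minimal sketch, assuming bash and a hypothetical list of file names fed in with printf instead of ls *; it quotes the pattern and compares it with a stricter, anchored variant.

```shell
#!/usr/bin/env bash
# Hypothetical file names standing in for the output of `ls *`.
names='Linux-Journal-2017-08.pdf
Test.pdf
notes.pdf
report.txt'

# The recipe's pattern, quoted so the shell cannot glob-expand it first.
# Because * allows zero uppercase letters and the unescaped . matches any
# character, lowercase names such as notes.pdf match as well.
loose=$(printf '%s\n' "$names" | grep '[[:upper:]]*.pdf')
echo "$loose"

# Stricter, anchored variant: uppercase first character and a literal
# ".pdf" at the very end of the name.
strict=$(printf '%s\n' "$names" | grep '^[[:upper:]].*\.pdf$')
echo "$strict"
```

Anchoring with ^ and $ and escaping the dot as \. is what actually restricts the match to names that start with a capital letter and end in .pdf.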

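Then, we perform a similar search using ls -l [[:upper:]]*. This time, [[:upper:]]* is a shell glob rather than a grep regex: the shell expands it to every name in the current directory that starts with an uppercase letter before ls even runs. Because several of those names are directories (Desktop, Documents, and so on), this can return a large amount of data if they have contents. ls first prints the matching files in the current directory, and then goes one directory deep and prints the contents of each matching directory. A neat feature is the -l flag, which produces a long listing showing each entry's permissions, owner, size in bytes (for a directory, the size of the directory entry itself, such as 4096, not the total size of its contents), and modification time.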
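To see what ls actually receives, it helps to print the glob expansion on its own. Here's a small sketch, assuming bash, mktemp, and a hypothetical throwaway directory layout that only loosely mirrors the recipe; it also shows ls -d, which lists matching directories themselves instead of descending into them.

```shell
#!/usr/bin/env bash
# Throwaway directory with a hypothetical layout loosely mirroring the recipe.
cd "$(mktemp -d)" || exit 1
mkdir Desktop Music
touch A0.test2 a.test Desktop/notes.txt

# The shell expands [[:upper:]]* before ls runs; printf shows the result.
expanded=$(printf '%s\n' [[:upper:]]*)
echo "$expanded"

# ls -l lists A0.test2, then the contents of Desktop and Music.
ls -l [[:upper:]]*
# Adding -d lists the matching directories themselves instead.
ls -ld [[:upper:]]*
```

The lowercase a.test is never passed to ls at all, which is why it doesn't appear in either listing.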

Next, we use ls to look for all files that begin with a lowercase character and end with the .test extension. Little did you know that when you set up this recipe, you also saw brace expansion at work: touch {a..c}.test. The shell expands {a..c}.test into a.test b.test c.test, so the touch command created those three files. The ls command with this simple pattern returns the names of those three files.
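Brace expansion and globbing look similar but behave differently: braces generate text whether or not any files exist, while a glob only matches names already on disk. A minimal sketch, assuming bash and a temporary directory; the [[:lower:]]*.test pattern is an assumption for this step, since the recipe's exact ls command isn't reproduced here.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Brace expansion is pure text generation: no files need to exist yet.
echo {a..c}.test            # prints: a.test b.test c.test
touch {a..c}.test test.txt

# Globbing only matches names that already exist on disk.
# [[:lower:]]*.test is an assumed pattern for this step.
lower=$(echo [[:lower:]]*.test)
echo "$lower"
```

test.txt starts with a lowercase letter but doesn't end in .test, so the glob skips it.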

Again, we use the ls command with the * wildcard and brace expansion to match file extensions: ls *.{test,txt}. The shell first expands the braces into two patterns, *.test and *.txt, so the search matches files with any name (*), followed by a period (.), and then either the test or txt extension.
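The order of the two expansions can be made visible by temporarily switching off pathname expansion with set -f. This is a sketch assuming bash and a temporary directory containing the same four files the recipe ends up with.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch a.test b.test c.test test.txt

# Step 1: brace expansion runs first and produces two glob patterns.
set -f                       # disable globbing so the patterns stay literal
patterns=$(echo *.{test,txt})
set +f
echo "$patterns"             # *.test *.txt

# Step 2: each pattern is then glob-expanded against the directory.
matches=$(echo *.{test,txt})
echo "$matches"              # a.test b.test c.test test.txt
```

One consequence: if no .txt file existed, bash (without nullglob) would pass the literal *.txt to ls, which would then report that the file doesn't exist.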

Next, in step 7, we combined a few things we have learned, using pipes, grep, xargs, and a regex in the command: echo "${STR1}" | grep -o [[:punct:]] | xargs echo. Because grep -o prints each match on its own line (\n delimited), its output doesn't match our intention of echoing all of the values on a single line, so we need xargs to turn those lines into arguments that echo can use. For example, instead of receiving one newline-separated string such as "item1\nitem2\nitem3", echo is run once with separate arguments: echo "item1" "item2" "item3".
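Here's that pipeline run against a sample string. The value of STR1 is hypothetical, because the recipe's own string isn't shown in this section, and the pattern is quoted so the shell can't treat [[:punct:]] as a glob. A paste-based alternative is included for comparison.

```shell
#!/usr/bin/env bash
# Hypothetical sample string; the recipe defines its own STR1 in test.sh.
STR1='Hello, world & goodbye.'

# grep -o prints every punctuation character on its own line.
echo "${STR1}" | grep -o '[[:punct:]]'

# xargs gathers those lines into the arguments of a single echo.
joined=$(echo "${STR1}" | grep -o '[[:punct:]]' | xargs echo)
echo "$joined"              # , & .

# Alternative: paste -s joins all lines using the -d delimiter (a space).
pasted=$(echo "${STR1}" | grep -o '[[:punct:]]' | paste -sd ' ' -)
echo "$pasted"              # , & .
```

paste is arguably the safer choice here. xargs interprets quote characters, so an input containing a lone ' or " would make it fail with an unmatched-quote error.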

And in the final command, we arrive at a crazier-looking regex, which in truth is actually quite tame: ls | grep -E "([[:upper:]])([[:digit:]])?.test?." | tail -n 5. It introduces a couple of concepts: groups (the parentheses), the ? operator, how you can combine multiple expression components, and the tail command, which here keeps only the last five lines of output.

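Using grep with the -E flag (extended regular expressions) and two groups (the expressions inside the parentheses), we can combine them with the ? regex operator. Strictly speaking, ? is not a single-character wildcard: it makes the preceding item optional (zero or one occurrence), so ([[:digit:]])? means "an optional digit". The single-character wildcard in this pattern is the unescaped ., which matches any one character. Together with tail, the command produces: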

C0.test2
C1.test2
C2.test2
Z9,test2
Z9.test3

We can see that the last five results were returned, each starting with a capital letter, followed by a digit, any single character (here, either . or ,), and then tes, an optional t, and one more character (in these names, test followed by a digit). We created one test file called Z9..test2. Notice how it was not included among the list items? This is because our pattern allows only one character between the digit and test; to include it, we would need an expression like this:

$ ls | grep -E "([[:upper:]])([[:digit:]])?.?.test?"
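Both patterns can be compared directly on the recipe's file names, without ls or tail, by feeding the names into grep with printf. This is a sketch assuming bash and a grep that supports -E. The names come from the listing earlier in this section.

```shell
#!/usr/bin/env bash
# File names from the listing earlier in this section.
names='A0.test2
C2.test2
Z9,test2
Z9..test2
Z9.test3'

# The recipe's pattern: exactly one "any character" between digit and "tes".
orig=$(printf '%s\n' "$names" | grep -E "([[:upper:]])([[:digit:]])?.test?.")
echo "$orig"                 # Z9..test2 is missing

# An extra optional character (.?) lets Z9..test2 match as well.
wide=$(printf '%s\n' "$names" | grep -E "([[:upper:]])([[:digit:]])?.?.test?")
echo "$wide"                 # all five names
```

Neither pattern is anchored, and the unescaped dots match any character. To require a literal period or comma, you'd use a bracket expression such as [.,] instead.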

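In step 4, we run a particular regex using grep with the -o and -P flags, grep -oP 'name="\K.*?(?=")' www.packtpub.com/index.html, on our recently crawled archive of www.packtpub.com. The -o flag outputs only the matching text, and -P enables Perl-compatible regular expressions (PCRE). In this pattern, \K discards the already-matched name=" from the reported match, .*? matches as few characters as possible, and (?=") is a lookahead that stops at the closing double quote without including it.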

Notice all of the values contained within double quotes? The command looks for anything that matches the pattern name="anythingGoesHere". It's certainly not extremely useful by itself, but it illustrates how quickly you can extract values (for example, what if you were after a very specific attribute? You could replace name= with another attribute and extract its values the same way!).
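To try the step 4 pattern without the crawled site, you can run it on a short inline fragment. This is a sketch that assumes GNU grep built with PCRE support, because BSD/macOS grep has no -P. The HTML fragment is hypothetical. For comparison, a portable version without -P is included.

```shell
#!/usr/bin/env bash
# Hypothetical HTML fragment standing in for www.packtpub.com/index.html.
html='<meta name="description" content="x"><input name="q" type="text">'

# PCRE: \K drops name=" from the output, .*? is non-greedy, and the
# lookahead (?=") stops before the closing quote without consuming it.
pcre=$(printf '%s\n' "$html" | grep -oP 'name="\K.*?(?=")')
echo "$pcre"

# Portable alternative: match the whole attribute, then cut out the
# text between the first pair of double quotes.
posix=$(printf '%s\n' "$html" | grep -o 'name="[^"]*"' | cut -d'"' -f2)
echo "$posix"
```

The bracket expression [^"]* does the same job as the non-greedy .*? here, because it can't run past a double quote.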

Following along the same context, in step 5, we can also find all occurrences of name= with grep -P 'name=' www.packtpub.com/index.html. Without -o, grep prints each entire matching line, which is useful for understanding the context of information or merely confirming its existence; this comes back to the idea of looking for values in CSS, C/C++, and other data/source files.
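The difference between printing matches and printing whole lines shows up clearly on a small multi-line fragment. This is a sketch that uses a hypothetical HTML snippet. It drops -P, because name= is a plain fixed string that any grep handles. The -c flag is included for the "merely confirming existence" case.

```shell
#!/usr/bin/env bash
# Hypothetical multi-line HTML fragment.
html='<head>
<meta name="description" content="x">
<title>Example</title>
<input name="q">
</head>'

# Without -o, grep prints each whole matching line, giving context.
printf '%s\n' "$html" | grep 'name='

# -c counts matching lines instead of printing them.
count=$(printf '%s\n' "$html" | grep -c 'name=')
echo "$count"                # 2
```

Note that -c counts lines, not occurrences. A line with two name= attributes still counts once.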

Onward to step 6, we are looking for the title HTML tag. Normally, you should use a dedicated HTML parser, but if we want to use grep with regexes in a hurry, we can! The tr '\n' ' ' < www.packtpub.com/index.html | grep -o '<title>.*</title>' command uses the translate command (tr) to convert each \n (newline) character into a space. Because grep matches one line at a time, this is useful when markup spans multiple lines.
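Here's the effect of that tr step on a page whose title element spans several lines. The page content is hypothetical and the sketch assumes bash. The same grep runs once before joining the lines and once after.

```shell
#!/usr/bin/env bash
# Hypothetical page whose <title> element spans several lines.
page='<head>
<title>
Example Page
</title>
</head>'

# grep works line by line, so the multi-line title is not found as-is.
before=$(printf '%s\n' "$page" | grep -o '<title>.*</title>')
echo "before: '${before}'"   # empty

# After tr turns every newline into a space, the markup is a single line.
after=$(printf '%s\n' "$page" | tr '\n' ' ' | grep -o '<title>.*</title>')
echo "after: '${after}'"
```

Because .* is greedy, a page that contained more than one </title> would match from the first <title> up to the last </title>. This is one more reason to reach for a real HTML parser when accuracy matters.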

In our closing step, we end with a bit of fine-tuning for broad searches. We use grep's -n and -H flags to print the line number and filename for each match, and then pipe the output into cut -c -80, which trims each line to its first 80 characters (this can be really useful when matching lines are very long):

$ grep -nHP 'name=' www.packtpub.com/index.html | cut -c -80
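The sketch below applies the same pipeline to a small hypothetical file that has one very long matching line, so you can see both the file:line: prefix and the truncation. It assumes bash and mktemp, and it omits -P because the pattern is a fixed string.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
# Hypothetical file: line 2 is a long matching line (100 x's in the class).
long=$(printf 'x%.0s' {1..100})
printf '%s\n' '<p>intro</p>' "<input name=\"q\" class=\"${long}\">" > index.html

# -n adds the line number and -H the file name: "file:line:text".
# cut -c -80 keeps only the first 80 characters of each output line.
grep -nH 'name=' index.html | cut -c -80
trimmed=$(grep -nH 'name=' index.html | cut -c -80)
```

The truncated output still begins with index.html:2:, so you keep the location even when the rest of the line is cut off.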
Regexes can also be tested online using a number of regex testers! One popular and free tool is https://regexr.com/.
Don't forget that some regex engines also allow you to nest groups within other groups! We didn't demonstrate this functionality here, but it can be useful in some cases.