
Using regular expressions (regex)

There are times when an engineer needs to extract specific data from a sentence or a large block of text. Regular expressions (regex) are well suited to this task. The concept is available in most programming languages; what differs is mainly the syntax and the library used in each.

The following example shows how to use regex in Python:

import re
sample="From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"

# r'\W+' matches one or more non-word characters (anything other than letters, digits, and underscore)
print(re.split(r'\W+', sample))

# Extract only the month and year from the string and print them
regex = re.compile(r'(?P<month>\w{3})\s+(?P<year>[0-9]{4})')

for m in regex.finditer(sample):
    value = m.groupdict()
    print("Month: " + value['month'] + " , " + "Year: " + value['year'])

# Extract the time together with its AM or PM suffix
regex = re.compile(r'\d+:\d+\s[AP]M')
m = regex.findall(sample)
print(m)

The sample output is as follows:

['From', 'Jan', '2018', 'till', 'Nov', '2018', 'I', 'was', 'learning', 'python', 'daily', 'at', '10', '00', 'PM']
Month: Jan , Year: 2018
Month: Nov , Year: 2018
['10:00 PM']

As the preceding output shows, the first line is the sample sentence split into separate words. The next two lines come from a regex applied in a loop, which extracts each month and year, matched as three word characters (mmm) followed by four digits (yyyy). Note that \w{3} matches any three word characters, not only month abbreviations. The last line is a time extraction: it matches a time in hh:mm format followed by AM or PM.
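
The three steps described above can also be wrapped in small reusable functions. The following is a minimal sketch rather than a definitive implementation: the function names (split_words, extract_month_year, extract_times) and the constant names are hypothetical and chosen only for illustration, while the patterns are the same ones used in the example.

```python
import re

SAMPLE = "From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"

# Compile each pattern once; raw strings keep the backslashes literal.
WORD_SPLIT = re.compile(r"\W+")
MONTH_YEAR = re.compile(r"(?P<month>\w{3})\s+(?P<year>[0-9]{4})")
TIME_12H = re.compile(r"\d+:\d+\s[AP]M")


def split_words(text):
    """Split text on runs of non-word characters, dropping empty strings."""
    return [word for word in WORD_SPLIT.split(text) if word]


def extract_month_year(text):
    """Return a list of (month, year) tuples found in text."""
    return [(m.group("month"), m.group("year")) for m in MONTH_YEAR.finditer(text)]


def extract_times(text):
    """Return every hh:mm time followed by AM or PM."""
    return TIME_12H.findall(text)


if __name__ == "__main__":
    print(split_words(SAMPLE))
    print(extract_month_year(SAMPLE))
    print(extract_times(SAMPLE))
```

Compiling the patterns at module level avoids recompiling them on every call, and returning values instead of printing them makes the functions easier to reuse and test.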

Regex supports many more variations than the ones shown here. It is worth consulting the Python re module documentation or an online tutorial for the available pattern syntax and for guidance on choosing the right pattern to extract a given piece of information.
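
As an illustration of such variations, the following sketch applies a few other functions from the re module to the same sample string. The variable names and the choice of variations are assumptions made for this example only; they are not taken from the original text.

```python
import re

sample = "From Jan 2018 till Nov 2018 I was learning python daily at 10:00 PM"

# Variation 1: restrict the month to real abbreviations
# instead of accepting any three word characters.
months = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
strict = re.compile(rf"\b(?P<month>{months})\s+(?P<year>\d{{4}})\b")
strict_pairs = [m.groupdict() for m in strict.finditer(sample)]

# Variation 2: re.search returns only the first match, or None if nothing matches.
first = re.search(r"\d{4}", sample)
first_year = first.group() if first else None

# Variation 3: re.sub replaces every match with the given text.
masked = re.sub(r"\d{4}", "YYYY", sample)

# Variation 4: the re.IGNORECASE flag makes matching case-insensitive.
case_hits = re.findall(r"python", "Python and python", flags=re.IGNORECASE)

print(strict_pairs)
print(first_year)
print(masked)
print(case_hits)
```

The stricter month pattern trades generality for precision: it no longer matches arbitrary three-letter words that happen to precede a four-digit number.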