官术网_书友最值得收藏!

Supervised learning

Oftentimes, we hear about feature engineering in the specific context of supervised learning, otherwise known as predictive analytics. Supervised learning algorithms specifically deal with the task of predicting a value, usually one of the attributes of the data, using the other attributes of the data. Take, for example, the dataset representing the network intrusion:

This is the same dataset as before, but let's dissect it further in the context of predictive analytics.

Notice that we have four attributes of this dataset: DateTime, Protocol, Urgent, and Malicious. Suppose now that the malicious attribute contains values that represent whether or not the observation was a malicious intrusion attempt. So in our very small dataset of four network connections, the first, second, and fourth connection were malicious attempts to intrude a network.

Suppose further that given this dataset, our task is to be able to take in three of the attributes (datetime, protocol, and urgent) and be able to accurately predict the value of malicious. In laymen’s terms, we want a system that can map the values of datetime, protocol, and urgent to the values in malicious. This is exactly how a supervised learning problem is set up:

Network_features = pd.DataFrame({'datetime': ['6/2/2018', '6/2/2018', '6/2/2018', '6/3/2018'], 'protocol': ['tcp', 'http', 'http', 'http'], 'urgent': [False, True, True, False]})
Network_response = pd.Series([True, True, False, True])
Network_features
>>
datetime protocol urgent 0 6/2/2018 tcp False 1 6/2/2018 http True 2 6/2/2018 http True 3 6/3/2018 http False
Network_response
>>
0 True 1 True 2 False 3 True dtype: bool

When we are working with supervised learning, we generally call the attribute (usually only one of them, but that is not necessary) of the dataset that we are attempting to predict the response of. The remaining attributes of the dataset are then called the features.

Supervised learning can also be considered the class of algorithms attempting to exploit the structure in data. By this, we mean that the machine learning algorithms try to extract patterns in usually very nice and neat data. As discussed earlier, we should not always expect data to come in tidy; this is where feature engineering comes in.

But if we are not predicting something, what good is machine learning you may ask? I’m glad you did. Before machine learning can exploit the structure of data, sometimes we have to alter or even create structure. That’s where unsupervised learning becomes a valuable tool.

主站蜘蛛池模板: 青州市| 苏州市| 宁夏| 陇南市| 自贡市| 霍邱县| 阜城县| 龙里县| 武汉市| 牙克石市| 中山市| 濮阳市| 客服| 韶山市| 太谷县| 荔波县| 东海县| 丹棱县| 汽车| 凤阳县| 禹州市| 夏邑县| 聂荣县| 奇台县| 昌江| 金昌市| 晋州市| 大庆市| 临高县| 青龙| 康平县| 明星| 色达县| 海淀区| 扶绥县| 三门县| 台东市| 平塘县| 太湖县| 大关县| 东光县|