- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:53
An example of unstructured data – server logs
As an example of unstructured data, we have pulled some sample server logs from a public source and saved them in a text document. Let's take a look at this data so that we can recognize unstructured data when we see it in the future:
# Import our data manipulation tool, pandas
import pandas as pd
# Create a pandas DataFrame from some unstructured server logs
# header=None specifies that the first line of the file is the first data point, not a column name
# names=['Info'] sets the column name in our DataFrame for easier access
logs = pd.read_table('../data/server_logs.txt', header=None, names=['Info'])
We created a pandas DataFrame called logs that holds our server logs. To inspect it, let's call the .head() method to look at the first few rows:
# Look at the first 5 rows
logs.head()
This will show us a table of the first 5 rows in our logs DataFrame as follows:

We can see that each row represents a single log entry and that there is only one column: the text of the log itself. That column is not a characteristic or attribute of the log; it is just the raw log, taken directly from the server. This is a great example of unstructured data. Data in the form of text is usually unstructured.
It is important to recognize that most unstructured data can be transformed into structured data through a few manipulations, but this is something that we will tackle in the next chapter.
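To give a rough feel for what such a manipulation can look like, here is a minimal sketch. It does not use the book's log file: the three log lines, their layout (date, time, level, message), and the column names are all hypothetical and chosen only for illustration. The sketch uses a regular expression with named groups to split each raw line into columns.

```python
import pandas as pd

# Hypothetical log lines; the real server_logs.txt may use a different layout
sample = pd.DataFrame({'Info': [
    '2017-01-01 10:15:32 INFO User login succeeded',
    '2017-01-01 10:16:01 ERROR Disk quota exceeded',
    '2017-01-01 10:17:45 WARN Slow response detected',
]})

# Each named group in the pattern becomes a column in the extracted DataFrame
pattern = r'^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$'
structured = sample['Info'].str.extract(pattern)

# Combine the date and time strings into a single datetime column
structured['timestamp'] = pd.to_datetime(structured['date'] + ' ' + structured['time'])
structured = structured[['timestamp', 'level', 'message']]

print(structured)
```

After this step, every row still represents one log entry, but it now has three separate characteristics (timestamp, level, and message) instead of a single block of text.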
Most of the data that we will be working with in this book will be structured, meaning that it is organized into rows and columns. Given this, we can start to look at the types of values in the cells of our tabular data.
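As a small illustration of structured data, the sketch below builds a tiny table and asks pandas for its dimensions and for the type of values stored in each column. The column names and values are invented for this example and do not come from the book's datasets.

```python
import pandas as pd

# Hypothetical structured data: each row is an observation,
# each column is a characteristic of that observation
df = pd.DataFrame({
    'level': ['INFO', 'ERROR', 'WARN'],
    'response_ms': [120, 950, 430],
    'cached': [True, False, False],
})

# (number of rows, number of columns)
print(df.shape)

# The type of values that pandas inferred for each column
print(df.dtypes)
```

Here pandas infers a text column, an integer column, and a Boolean column, which hints at the different kinds of values that tabular cells can hold.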