- Keras 2.x Projects
- Giuseppe Ciaburro
- 2021-07-02 14:36:19
Exploratory analysis
Before applying the classification algorithm, we will conduct an exploratory analysis to understand how the data is distributed and to extract some preliminary knowledge. To display the first 20 rows of the imported DataFrame, we can use the head() function, as follows:
print(Data.head(20))
The following results are returned:

The first 20 rows are displayed. The head() function returns the first n rows of the object, based on position, which is useful for quickly checking whether the object contains the right type of data. The dataset is now available in our Python environment. To extract some information about it, we can invoke the info() function, as follows:
print(Data.info())
This method prints a concise summary of a DataFrame, including the index dtype and column dtypes, the non-null counts, and the memory usage. The following results are returned:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302 entries, 0 to 301
Data columns (total 14 columns):
age 302 non-null int64
sex 302 non-null int64
cp 302 non-null int64
trestbps 302 non-null int64
chol 302 non-null int64
fbs 302 non-null int64
restecg 302 non-null int64
thalach 302 non-null int64
exang 302 non-null int64
oldpeak 302 non-null float64
slope 302 non-null int64
ca 302 non-null object
hal 302 non-null object
HeartDisease 302 non-null int64
dtypes: float64(1), int64(11), object(2)
memory usage: 33.1+ KB
None
Useful information is reported. The number of entries is 302, and the number of data columns is 14. For each feature, the output lists the number of non-null elements and the data type. In this way, we can already get an idea of the type of variables we are about to analyze. In fact, looking at the results, we can note that three types have been identified: float64(1), int64(11), and object(2). For the first two, there are no doubts: these are real and integer numbers, respectively. The anomaly is represented by the two columns labeled as object. To understand what happened, it is useful to check the data types provided by the pandas library, as shown in the following table:

Now, everything is clear: object is the dtype that pandas uses for text, so the two columns have been labeled as containing text. Why did this happen? The cause is the presence of missing values, which are evidently recorded in a form that pandas cannot parse as numbers. Keep this in mind, as we will have to deal with this problem before proceeding with the construction of the model.
To get a preview of the data contained in the DataFrame, we can calculate a series of basic statistics. To do so, we will use the describe() function in the following way:
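The effect described above can be reproduced on a tiny example. The following is only a minimal sketch: the three-row sample and the '?' placeholder are assumptions made for illustration and are not taken from the dataset used in this chapter.

```python
import io
import pandas as pd

# Hypothetical three-row sample; '?' is an assumed placeholder for a missing value.
csv_text = "age,ca\n63,0.0\n67,?\n41,2.0\n"
sample = pd.read_csv(io.StringIO(csv_text))

# 'age' is parsed as int64, while 'ca' is no longer numeric because of the '?' entry
# (it is reported as object, or as a string dtype in recent pandas versions).
print(sample.dtypes)

# Count how many entries of 'ca' hold the placeholder.
n_placeholders = int(sample["ca"].eq("?").sum())
print(n_placeholders)

# Coercing to numbers turns the placeholder into NaN, confirming that the
# remaining entries of the column are numeric.
as_numeric = pd.to_numeric(sample["ca"], errors="coerce")
print(as_numeric)
```

A single non-numeric entry is enough to change the dtype of the whole column, which is why only the columns with missing values stand out in the info() summary.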
summary = Data.describe()
print(summary)
The following results are returned:

The describe() function generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values. It works on both numeric and object Series, as well as on DataFrames with columns of mixed data types, and its output varies depending on the input. Before continuing, we need to address the problem of missing values.
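How the output of describe() depends on the input can be seen on a small DataFrame with mixed types. This is a minimal sketch on a hypothetical toy frame (the values and the '?' placeholder are assumptions), not the chapter's dataset. By default, a DataFrame with mixed columns is summarized on its numeric columns only; passing include="all" also reports the text columns.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column and one text column.
toy = pd.DataFrame({
    "age": [63, 67, 41, 56],
    "ca": ["0.0", "?", "2.0", "0.0"],
})

# Default behavior: only numeric columns are summarized
# (count, mean, std, min, quartiles, max).
numeric_summary = toy.describe()
print(numeric_summary)

# include="all" adds count, unique, top, and freq for the text column.
full_summary = toy.describe(include="all")
print(full_summary)
```

On this toy frame, the text column is absent from the default summary, which suggests that columns stored as object would likewise be left out of the statistics until their missing values are handled.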