Book title: Keras 2.x Projects
Author: Giuseppe Ciaburro
Chapter word count: 826
Last updated: 2021-07-02 14:36:19

Pattern recognition using a Keras neural network

Heart disease is often underestimated, yet it is the leading cause of death worldwide. Coronary artery disease (CAD) alone accounts for about a third of all deaths worldwide in people over 35 years of age. CAD is the result of arteriosclerosis, which consists of the narrowing of the blood vessels and the hardening of their walls. In some cases, CAD can completely block the flow of oxygen-rich blood to the heart muscle, causing a heart attack.

CAD is caused by an accumulation of waxy, fatty deposits on the inner walls of the arteries. These deposits consist of cholesterol, calcium, and other substances that travel in the blood; the product of their accumulation is called atherosclerotic plaque. This plaque can clog the coronary arteries and make them rigid and irregular, causing the so-called hardening of the arteries, or atherosclerosis. The obstructions can be single or multiple, and they vary in severity and location. Gradually, the deposits restrict the lumen of the coronary arteries, reducing the supply of blood and oxygen to the heart muscle. This reduction in blood flow can cause chest pain (angina), difficulty in breathing (dyspnoea), and other symptoms, while complete obstruction can induce a heart attack.

Coronary angiography is used to diagnose CAD. Angiography is the diagnostic imaging of the blood or lymphatic vessels of the human body. A water-soluble contrast agent is infused into the vessels, and medical images are then generated with one of several biomedical imaging techniques.

In this example, we will try to predict a condition of heart disease through a classification algorithm based on neural networks. To do this, we will use the Heart Disease Data Set, which is available in the UCI Machine Learning Repository.

The UCI Machine Learning Repository is available at the following link: http://mlr.cs.umass.edu/ml/datasets.html.

The heart disease databases contain records of individual heart disease cases. They are provided by the following four clinical institutions: Cleveland Clinic Foundation (CCF), Hungarian Institute of Cardiology (HIC), Long Beach Medical Center (LBMC), and University Hospital in Switzerland (SUH).

More specifically, we will refer to the data that was made available by the CCF (edited by Robert Detrano, MD, PhD). This database contains 76 attributes, but all published experiments use a subset of 14 of them. The goal is to predict the presence of heart disease in the patient. The target is an integer value from 0 (absence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0).
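The reduction from five target values to a presence/absence label can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the series `raw_target` and its values are placeholders covering the documented range 0 to 4, not records from the dataset, and the file used later in this section already ships with the reduced target, so the step is only needed when starting from the raw values.

```python
import pandas as pd

# Placeholder target values covering the documented range 0..4
# (illustrative only, NOT real patient records).
raw_target = pd.Series([0, 1, 2, 3, 4, 0], name='HeartDisease')

# Presence (values 1, 2, 3, 4) maps to 1; absence (value 0) stays 0.
binary_target = (raw_target > 0).astype(int)

print(binary_target.tolist())
```

Comparing against zero keeps the rule in one place: any positive diagnosis code counts as presence, whatever its severity level.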

The dataset can be summarized as follows:

Number of instances: 302
Number of attributes: 14 (including the class attribute HeartDisease); all are stored as numbers, although several of them are categorical codes

The attributes are detailed as follows:

age: Age in years
sex: Sex (1 = male; 0 = female)
cp: Chest pain type (Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic)
trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
chol: Serum cholesterol in mg/dl
fbs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg: Resting electrocardiographic results (Value 0: normal; Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
thalach: Maximum heart rate achieved
exang: Exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: The slope of the peak exercise ST segment (Value 1: upsloping; Value 2: flat; Value 3: downsloping)
ca: Number of major vessels (0-3) colored by fluoroscopy
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
HeartDisease: Diagnosis of heart disease (angiographic disease status) in any major vessel (Value 0: < 50% diameter narrowing; Value 1: > 50% diameter narrowing). In the full 76-attribute database, attributes 59 through 68 are vessels
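When preparing the data later, it can help to record which attributes are coded categories and which are measured quantities. The grouping below is a sketch based on one reading of the descriptions above, not a classification given by the dataset authors; the names `CATEGORICAL`, `NUMERIC`, `TARGET`, and `kind_of` are hypothetical.

```python
# Attributes whose numbers are codes for categories (per the descriptions above).
CATEGORICAL = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']

# Attributes that are measured or counted quantities.
NUMERIC = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'ca']

# The class attribute to predict.
TARGET = 'HeartDisease'

# Lookup table from attribute name to its assumed kind.
kind_of = {name: 'categorical' for name in CATEGORICAL}
kind_of.update({name: 'numeric' for name in NUMERIC})

print(len(kind_of) + 1)  # 13 predictors plus the target
```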

The data is available in an .xlsx file named ClevelandData.xlsx, which can be downloaded from the UCI dataset. To make our job easier, the target has been reworked to present only two values (0 and 1). To start, let's look at how we can import the data into Python. To do this, we will use the read_excel function of the pandas library, which reads an Excel table into a pandas DataFrame. The first thing to do is import the library that we will use:

import pandas as pd

The available data does not contain a header row, so the variable names have to be taken from another file, which is also available in the UCI archive. Let's put them in a list:

HDNames = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'HeartDisease']

Now let's import the data contained in the dataset into Python:

Data = pd.read_excel('ClevelandData.xlsx', names=HDNames)

Two arguments are passed: the filename and the list of column names to use.
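After loading, a quick sanity check on the resulting DataFrame can catch mistyped column names or missing values early. The helper below is a minimal sketch, not part of pandas or of the book's code: `check_loaded_frame` is a hypothetical name, and the demo frame contains placeholder rows rather than records from ClevelandData.xlsx.

```python
import pandas as pd


def check_loaded_frame(df, expected_names):
    """Return a list of human-readable problems found in a freshly loaded frame."""
    problems = []
    # The column labels should match the names we intended to assign.
    if list(df.columns) != list(expected_names):
        problems.append("column names differ from the expected list")
    # Leading or trailing spaces in a label make later lookups fail silently.
    padded = [c for c in df.columns if isinstance(c, str) and c != c.strip()]
    if padded:
        problems.append(f"column names with stray whitespace: {padded}")
    # Report every column that contains at least one missing value.
    missing = df.isna().sum()
    for name, count in missing[missing > 0].items():
        problems.append(f"{count} missing value(s) in column {name!r}")
    return problems


# Placeholder rows (NOT real patient records), only to exercise the helper.
demo_names = ['age', 'sex', 'HeartDisease']
demo = pd.DataFrame([[50, 1, 0], [61, 0, None]], columns=demo_names)
report = check_loaded_frame(demo, demo_names)
print(report)
```

With the real data, the same call would be `check_loaded_frame(Data, HDNames)`; an empty list means none of the three checks found a problem.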
