  • Keras 2.x Projects
  • Giuseppe Ciaburro
  • 826 words
  • 2021-07-02 14:36:19

Pattern recognition using a Keras neural network

Heart disease is often underestimated, yet it is the leading cause of death in the world. Within this group, coronary artery disease (CAD) accounts for about a third of all deaths worldwide in people over 35 years of age. CAD is the result of arteriosclerosis, which consists of the narrowing of the blood vessels and the hardening of their walls. In some cases, CAD can completely block the flow of oxygen-rich blood to the heart muscle, causing a heart attack.

CAD is caused by an accumulation of waxy, fatty deposits on the inner walls of the arteries. These deposits consist of cholesterol, calcium, and other substances that travel in the blood; the product of their accumulation is called atherosclerotic plaque. This plaque can clog the coronary arteries and make them rigid and irregular, causing the so-called hardening of the arteries, or atherosclerosis. These obstructions can be single or multiple, and they vary in severity and location. Gradually, the deposits restrict the lumen of the coronary arteries, reducing the supply of blood and oxygen to the heart muscle. This reduction in blood flow can cause chest pain (angina), difficulty in breathing (dyspnoea), and other symptoms, while complete obstruction can induce a heart attack.

Coronary angiography is used to diagnose CAD. Angiography is the diagnostic imaging of the blood or lymphatic vessels of the human body. It involves infusing a water-soluble contrast agent into the vessels and then generating medical images through various biomedical imaging techniques.

In this example, we will try to predict the presence of heart disease through a classification algorithm based on neural networks. To do this, we will use the Heart Disease Data Set, which is available in the UCI Machine Learning Repository.

The UCI Machine Learning Repository is available at the following link: http://mlr.cs.umass.edu/ml/datasets.html.

These databases contain data on heart disease instances. They are provided by the following four clinical institutions: Cleveland Clinic Foundation (CCF), Hungarian Institute of Cardiology (HIC), Long Beach Medical Center (LBMC), and University Hospital in Switzerland (SUH).

More specifically, we will refer to the data that was made available by the CCF (edited by Robert Detrano, MD, PhD). This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. The goal is to predict the presence of heart disease in the patient. The target is an integer value from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0).
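The presence/absence distinction described above can be sketched in a few lines of pandas. This is only an illustration: the sample values are made up rather than taken from the Cleveland data, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up target values on the original 0-4 scale (not real dataset rows).
raw_target = pd.Series([0, 2, 1, 0, 4, 3], name="HeartDisease")

# Collapse the scale: 0 stays 0 (absence), while 1, 2, 3, and 4 become 1 (presence).
binary_target = (raw_target > 0).astype(int)

print(binary_target.tolist())  # [0, 1, 1, 0, 1, 1]
```

The comparison `raw_target > 0` yields a Boolean series, and `astype(int)` turns it into the 0/1 labels that a binary classifier expects.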

The following summary gives the size of the dataset, followed by a brief description of each variable:

  • Number of instances: 302
  • Number of attributes: 14 numeric attributes (including the class attribute HeartDisease)

The attributes are detailed as follows:

  • age: Age in years
  • sex: Sex (1 = male; 0 = female)
  • cp: Chest pain type (Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic)
  • trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
  • chol: Serum cholesterol in mg/dl
  • fbs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
  • restecg: Resting electrocardiographic results (Value 0: normal; Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
  • thalach: Maximum heart rate achieved
  • exang: Exercise induced angina (1 = yes; 0 = no)
  • oldpeak: ST depression induced by exercise relative to rest
  • slope: The slope of the peak exercise ST segment (Value 1: upsloping; Value 2: flat; Value 3: downsloping)
  • ca: Number of major vessels (0-3) colored by fluoroscopy
  • thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
  • HeartDisease: Diagnosis of heart disease, given as angiographic disease status in any major vessel (Value 0: < 50% diameter narrowing; Value 1: > 50% diameter narrowing). In the full 76-attribute database, attributes 59 through 68 are vessels
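Several of these attributes are integer codes rather than measurements, so it can help to translate them into labels when exploring the data. The following is a minimal sketch for the cp attribute, using the codes from the list above. The name CP_LABELS and the sample values are assumptions made for illustration only.

```python
import pandas as pd

# Code-to-label mapping copied from the cp entry in the attribute list.
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

# Made-up sample codes, not rows from the dataset.
cp = pd.Series([4, 1, 3, 2, 4], name="cp")

# Series.map replaces each code with its label; unknown codes would become NaN.
cp_labels = cp.map(CP_LABELS)

print(cp_labels.tolist())
```

The same pattern could be applied to restecg, slope, or thal by writing a dictionary for each of them.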

The data is available in an .xlsx file named ClevelandData.xlsx, which can be downloaded from the UCI repository. To make our job easier, the target has been reworked to present only two values (0 and 1). To start, let's look at how to import the data into Python. To do this, we will use the read_excel function of the pandas library, which reads an Excel table into a pandas DataFrame. The first thing to do is import the library that we will use:

import pandas as pd

The available data does not contain a header row, so the variable names must be taken from another file, which is also available in the UCI archive. Let's put them in a list:

HDNames = ['age','sex','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal','HeartDisease']

Now let's import the dataset into Python:

Data = pd.read_excel('ClevelandData.xlsx', names=HDNames)

Two parameters are passed: filename, and the list of column names to use.
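One detail is worth checking when names is passed for a file without a header row. By default, read_excel uses header=0, so the first row is treated as a header and then replaced by the supplied names. The pandas documentation recommends passing header=None explicitly when the file has no header row. The sketch below illustrates the difference. It is a stand-in that uses read_csv on an in-memory string, so it runs without an Excel file; header=0 is set explicitly to mimic the read_excel default. The three-row sample and the variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt with no header row (illustrative values only).
raw = "63,1,1\n67,1,4\n41,0,2\n"
cols = ["age", "sex", "cp"]

# header=0: the first row is read as a header and then replaced by `names`,
# so it is not kept as data.
first_row_as_header = pd.read_csv(io.StringIO(raw), header=0, names=cols)

# header=None: the file is declared to have no header row, so every row is data.
all_rows_as_data = pd.read_csv(io.StringIO(raw), header=None, names=cols)

print(len(first_row_as_header))  # 2
print(len(all_rows_as_data))     # 3
```

If ClevelandData.xlsx really has no header row, adding header=None to the read_excel call would keep its first record. Comparing the number of rows in the resulting DataFrame with the expected number of instances is a quick way to confirm which case applies.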
