- Neural Networks with Keras Cookbook
- V Kishore Ayyadevara
- 2021-07-02 12:46:33
Getting ready
To build intuition for text analysis, let's consider the Reuters dataset, in which each news article is classified into one of 46 possible topics.
We will adopt the following strategy to perform our analysis:
- Given that a dataset could contain thousands of unique words, we will shortlist the words that we consider.
  - For this specific exercise, we shall consider the 10,000 most frequent words.
  - An alternative approach would be to keep the most frequent words that together account for 80% of all word occurrences in the dataset, which excludes the rare words.
- Once the words are shortlisted, we shall one-hot-encode each article over them: the article becomes a vector with a 1 at the index of every shortlisted word it contains and a 0 elsewhere.
- Similarly, we shall one-hot-encode the output label.
- Each input is now a 10,000-dimensional vector, and each output is a 46-dimensional vector.
- We will divide the dataset into train and test datasets. In the code, however, you will notice that we use the Reuters dataset built into Keras, which comes with a built-in function that identifies the top n most frequent words and splits the dataset into train and test datasets.
- We will map the input to the output through a hidden layer in between.
- We will apply softmax at the output layer to obtain the probability of the input belonging to each of the 46 classes.
- Given that we have multiple possible output classes, we shall employ a categorical cross-entropy loss function.
- We shall compile and fit the model, and then measure its accuracy on the test dataset.
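The preprocessing and loss steps in this strategy can be sketched in plain Python on a toy corpus. This is only an illustrative sketch, not the recipe's code: the helper names (`top_n_vocab`, `coverage_vocab`, `encode_document`, `one_hot`, `softmax`, `categorical_cross_entropy`) and the three tiny documents are assumptions made up for the example, and the vocabulary size and class count are shrunk from 10,000 and 46 to 3 and 2 so that the vectors stay readable.

```python
import math
from collections import Counter


def top_n_vocab(docs, n):
    """Map the n most frequent words to the indices 0..n-1."""
    counts = Counter(word for doc in docs for word in doc)
    return {word: i for i, (word, _) in enumerate(counts.most_common(n))}


def coverage_vocab(docs, coverage=0.8):
    """Alternative shortlist: add words in order of frequency until they
    account for `coverage` of all word occurrences."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    vocab, running = {}, 0
    for word, count in counts.most_common():
        if running >= coverage * total:
            break
        vocab[word] = len(vocab)
        running += count
    return vocab


def encode_document(doc, vocab):
    """Vector with a 1 at the index of every shortlisted word in the document."""
    vec = [0.0] * len(vocab)
    for word in doc:
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec


def one_hot(label, num_classes):
    """Vector with a single 1 at the index of the class label."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(target, probs, eps=1e-12):
    """Loss for a one-hot target against predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


# Toy corpus (hypothetical): three tokenized articles and two topics.
docs = [
    ["oil", "prices", "rise", "oil"],
    ["wheat", "harvest", "prices"],
    ["oil", "exports", "fall"],
]
labels = [0, 1, 0]

vocab = top_n_vocab(docs, n=3)            # stands in for the top 10,000 words
cov_vocab = coverage_vocab(docs, 0.8)     # the 80% coverage alternative
x = [encode_document(d, vocab) for d in docs]
y = [one_hot(label, num_classes=2) for label in labels]

probs = softmax([2.0, 0.5])               # made-up output-layer scores
loss = categorical_cross_entropy(y[0], probs)
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the correct class, so it shrinks toward zero as the model becomes more confident in the right topic.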