- The Natural Language Processing Workshop
- Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning
- 2021-06-11 18:39:24
Introduction
In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics. We also took a quick look at various preprocessing steps. In this chapter, we will learn how to make text understandable to machine learning algorithms.
To apply a machine learning algorithm to textual data, we need a numerical (vector) representation of the text, since most of these algorithms cannot work directly with plain text or strings. Before converting text into numerical form, however, we need to pass it through preprocessing steps such as tokenization, stemming, lemmatization, and stop-word removal.
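As a rough illustration of two of these steps, the sketch below tokenizes a sentence and removes stop words using only the Python standard library. The helper names (`tokenize`, `remove_stop_words`) and the tiny `STOP_WORDS` set are assumptions made for this example, not code from the book. Stemming and lemmatization are left out because they normally rely on a dedicated NLP library.

```python
import re

# A tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


tokens = tokenize("The cats are sitting on the mat.")
filtered = remove_stop_words(tokens)
print(tokens)    # ['the', 'cats', 'are', 'sitting', 'on', 'the', 'mat']
print(filtered)  # ['cats', 'sitting', 'mat']
```

The regular expression here keeps only runs of ASCII letters, which is a deliberate simplification: it discards digits, punctuation, and accented characters.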
In this chapter, we will look more closely at these preprocessing steps and at how to extract features from the preprocessed text and convert them into vectors. We will also explore two popular methods for feature extraction, Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), as well as various methods for measuring the similarity between texts. By the end of this chapter, you will also have gained an in-depth understanding of how text data can be visualized.
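To make the two feature extraction methods concrete, here is a minimal sketch in plain Python, assuming documents that have already been tokenized. The function names are hypothetical, and the TF-IDF weighting used (term frequency times `log(N / df)`) is only one common variant; libraries often apply smoothed or normalized versions. Cosine similarity stands in here as one example of a text similarity measure.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight term frequencies by log(N / df), an unsmoothed IDF variant."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[i] > 0) for i in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for doc, vec in zip(docs, counts):
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([(c / total) * w for c, w in zip(vec, idf)])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["cat", "sat", "mat"], ["cat", "sat", "sofa"], ["dog", "barked"]]
vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)

print(vocab)
print(cosine_similarity(bow[0], bow[1]))            # shared words: cat, sat
print(cosine_similarity(weighted[0], weighted[1]))  # lower: shared words are down-weighted
print(cosine_similarity(bow[0], bow[2]))            # no shared words
```

In this toy corpus, the first two documents share two of their three words, so their Bag of Words vectors are fairly similar. Under TF-IDF the shared words carry less weight than the words unique to each document, so the similarity drops.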