
Natural Language Processing

Processing natural language text is complex: it is poorly structured and requires a lot of cleaning and normalization. Yet the amount of textual information around us is tremendous: large volumes of text are generated every minute, and it is hard to retrieve useful information from them. Data science and machine learning help with text problems as well; they allow us to find the right text, process it, and extract the valuable bits of information.

There are multiple ways we can use text data. One example is information retrieval, or simply text search: given a user query and a collection of documents, we want to find the documents in the corpus that are most relevant to the query and present them to the user. Other applications include sentiment analysis, that is, predicting whether a product review is positive, neutral, or negative, and grouping reviews according to how they talk about the products.
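
To make the text search task concrete, here is a minimal sketch in Python. It is an illustration only, not the approach developed later in the book: it assumes a simple bag-of-words representation with cosine similarity as the relevance score, and the names `tokenize`, `cosine_similarity`, `search`, and the tiny `corpus` are hypothetical, made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Return (index, score) pairs for the best-matching documents."""
    query_vec = Counter(tokenize(query))
    scored = [
        (i, cosine_similarity(query_vec, Counter(tokenize(doc))))
        for i, doc in enumerate(documents)
    ]
    # Drop documents that share no terms with the query.
    scored = [(i, score) for i, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical corpus of product reviews, used only for illustration.
corpus = [
    "The battery life of this phone is excellent",
    "Shipping was slow and the box arrived damaged",
    "Great phone, but the battery drains quickly",
]
results = search("phone battery", corpus)
for index, score in results:
    print(f"{score:.3f}  {corpus[index]}")
```

Raw term counts treat every word as equally informative, so a real search system would usually add term weighting and text normalization on top of a sketch like this.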

We will talk more about information retrieval, Natural Language Processing (NLP), and working with texts in Chapter 6, Working with Text - Natural Language Processing and Information Retrieval. Additionally, we will see how to process large amounts of text data in Chapter 9, Scaling Data Science.

The methods we can use for machine learning and data science are very important. Equally important is the way we develop them and then put them to use in production systems. Data science process models help us make this work more organized and systematic, which is why we will talk about them next.
