
Natural Language Processing

Processing natural language text is complex: texts are not well structured and require a lot of cleaning and normalizing. Yet the amount of textual information around us is tremendous: a lot of text data is generated every minute, and it is very hard to retrieve useful information from it. Data science and machine learning are very helpful for text problems as well; they allow us to find the right text, process it, and extract the valuable bits of information.

There are multiple ways we can use text information. One example is information retrieval, or simply text search: given a user query and a collection of documents, we want to find the documents in the corpus that are most relevant to the query and present them to the user. Other applications include sentiment analysis (predicting whether a product review is positive, neutral, or negative) and grouping reviews according to how they talk about the products.
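To make the text search idea concrete, here is a minimal sketch in Python that ranks documents against a query by the cosine similarity of their word counts. The function names (`tokenize`, `cosine`, `search`), the tiny review corpus, and the choice of plain word counts as the scoring scheme are all assumptions made for illustration; a real search system would typically add term weighting, stemming, and an index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Return (score, document) pairs for the top_k best matches, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    # Drop documents that share no terms with the query.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical corpus of product reviews, used only for illustration.
corpus = [
    "The battery life of this phone is great",
    "Terrible battery, it died after one day",
    "The camera takes sharp photos",
]
results = search("battery life", corpus)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

With this toy corpus, the review that mentions both query words ranks first, the one that mentions only "battery" ranks second, and the camera review is dropped because it shares no terms with the query.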

We will talk more about information retrieval, Natural Language Processing (NLP), and working with texts in Chapter 6, Working with Text - Natural Language Processing and Information Retrieval. Additionally, we will see how to process large amounts of text data in Chapter 9, Scaling Data Science.

The methods we can use for machine learning and data science are very important. Equally important is the way we create them and then put them to use in production systems. Data science process models help us make this work more organized and systematic, which is why we will talk about them next.
