
Picking up NLP basics while touring popular NLP libraries

Having looked at a short list of real-world applications of NLP, we'll now tour the essential stack of Python NLP libraries. These packages handle the NLP tasks mentioned previously, as well as others such as sentiment analysis, text classification, and named entity recognition.

The most famous NLP libraries in Python include the Natural Language Toolkit (NLTK), spaCy, Gensim, and TextBlob. The scikit-learn library also has impressive NLP-related features. Let's take a look at the following popular NLP libraries in Python:

  • NLTK: This library (http://www.nltk.org/) was originally developed for educational purposes and is now widely used in industry as well. It is often said that you can't talk about NLP without mentioning NLTK, and it remains one of the best-known platforms for building Python-based NLP applications. You can install it by running the following command in the terminal:
pip install -U nltk

If you're using conda, then execute the following command line:

conda install nltk
  • spaCy: This library (https://spacy.io/) is generally regarded as a more powerful toolkit for industry use than NLTK, mainly for two reasons. First, spaCy is written in Cython, which makes it much more memory-efficient (now you see where the Cy in spaCy comes from) and well suited to NLP tasks. Second, spaCy keeps adopting state-of-the-art algorithms for core NLP problems, such as convolutional neural network (CNN) models for tagging and named entity recognition. It may seem advanced for beginners, but in case you're interested, here are the installation instructions.

Run the following command in the terminal:

pip install -U spacy

For conda, execute the following command line:

conda install -c conda-forge spacy
  • Gensim: This library (https://radimrehurek.com/gensim/), developed by Radim Řehůřek, has been gaining popularity in recent years. It was initially designed in 2008 to generate a list of articles similar to a given article, hence the name of the library (generate similar -> Gensim). Řehůřek later drastically improved its efficiency and scalability. Again, we can easily install it via pip by running the following command:
pip install --upgrade gensim

With conda, you can run the following command in the terminal:

conda install -c conda-forge gensim

Make sure that its dependencies, NumPy and SciPy, are installed before you install Gensim.
  • TextBlob: This library (https://textblob.readthedocs.io/en/dev/) is a relatively new one built on top of NLTK. It simplifies NLP and text analysis with easy-to-use built-in functions and methods, as well as wrappers around common tasks. We can install TextBlob by running the following command line in the terminal:
pip install -U textblob

TextBlob has some useful features that are not available in NLTK (currently), such as spell checking and correction, language detection, and translation.
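After running the installation commands above, it can be handy to confirm which of the four libraries are actually available in the current environment. The following is a minimal sketch using only the Python standard library (Python 3.8 or later is assumed). The helper name installed_versions and the NLP_PACKAGES mapping are hypothetical names introduced for this illustration; they are not part of any of the libraries.

```python
from importlib import metadata, util

# Import names of the libraries covered above, mapped to their
# distribution names on PyPI (identical here, but not always the case).
NLP_PACKAGES = {
    "nltk": "nltk",
    "spacy": "spacy",
    "gensim": "gensim",
    "textblob": "textblob",
}


def installed_versions(packages=NLP_PACKAGES):
    """Return {import_name: version string, or None if not installed}."""
    versions = {}
    for import_name, dist_name in packages.items():
        # find_spec returns None when the top-level module cannot be found.
        if util.find_spec(import_name) is None:
            versions[import_name] = None
            continue
        try:
            versions[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata is available.
            versions[import_name] = "unknown"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Any library reported as not installed can be added with the corresponding pip or conda command shown earlier.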
