
Picking up NLP basics while touring popular NLP libraries

After a short list of real-world applications of NLP, in this chapter we'll tour the essential stack of Python NLP libraries. These packages handle a wide range of NLP tasks, including those mentioned previously as well as others such as sentiment analysis, text classification, and named entity recognition.

The most famous NLP libraries in Python include the Natural Language Toolkit (NLTK), spaCy, Gensim, and TextBlob. The scikit-learn library also has impressive NLP-related features. Let's take a look at the following popular NLP libraries in Python:

  • NLTK: This library (http://www.nltk.org/) was originally developed for educational purposes and is now widely used in industry as well. It is said that you can't talk about NLP without mentioning NLTK. It is one of the leading platforms for building Python-based NLP applications. You can install it simply by running the following command in the terminal:
pip install -U nltk

If you're using conda, run the following command instead:

conda install nltk
  • spaCy: This library (https://spacy.io/) is a more powerful, industry-oriented toolkit than NLTK. This is mainly for two reasons: first, spaCy is written in Cython, which compiles to C and makes the library fast and memory-efficient (now you see where the Cy in spaCy comes from); second, spaCy keeps adopting state-of-the-art algorithms for core NLP problems, such as convolutional neural network (CNN) models for tagging and named entity recognition. However, it may seem advanced for beginners. In case you're interested, here are the installation instructions.

   Run the following command in the terminal:

pip install -U spacy

For conda, run the following command:

conda install -c conda-forge spacy
  • Gensim: This library (https://radimrehurek.com/gensim/), developed by Radim Rehurek, has been gaining popularity in recent years. It was initially designed in 2008 to generate a list of similar articles for a given article, hence the name of the library (generate similar → Gensim). Rehurek later drastically improved its efficiency and scalability. Again, we can easily install it via pip by running the following command:
pip install --upgrade gensim

In the case of conda, you can run the following command in the terminal:

conda install -c conda-forge gensim
Make sure its dependencies, NumPy and SciPy, are already installed before installing Gensim.
  • TextBlob: This library (https://textblob.readthedocs.io/en/dev/) is a relatively new one built on top of NLTK. It simplifies NLP and text analysis with easy-to-use built-in functions and methods, as well as wrappers around common tasks. We can install TextBlob by running the following command in the terminal:
pip install -U textblob

At the time of writing, TextBlob has some useful features that are not available in NLTK, such as spell checking and correction, language detection, and translation.
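Once the packages are installed, it can help to confirm which of them Python can actually see, because pip and conda may install into different environments. The following is a minimal sketch that uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`, Python 3.8+). The `LIBRARIES` mapping and the `installed_versions` helper are hypothetical names introduced here for illustration. The mapping assumes that each library's import name matches its distribution name. That holds for these four packages but not in general (scikit-learn, for example, is imported as `sklearn`).

```python
import importlib.metadata
import importlib.util

# Hypothetical mapping: import name -> distribution (pip/conda) name.
# Assumption: for these four libraries the two names are identical.
LIBRARIES = {
    "nltk": "nltk",
    "spacy": "spacy",
    "gensim": "gensim",
    "textblob": "textblob",
}


def installed_versions(libraries):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    report = {}
    for import_name, dist_name in libraries.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata was found
            report[import_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in installed_versions(LIBRARIES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:10s} {status}")
```

Running the script prints one line per library with its installed version, or `NOT INSTALLED` if the current interpreter cannot find it. Using `find_spec` rather than `import` keeps the check cheap, since none of the libraries is actually loaded.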
