Over the years, natural language processing (NLP) in Python has evolved drastically. In this article I’m going to give you an overview of the technologies and libraries that exist and can make your life easier when doing textual analysis on your documents.
Whether you are trying to run sentiment analysis on a Twitter account or to extract names from a big block of text for compliance, this article has you covered.
My goal is to give you the knowledge you need to apply these tools to your own projects.
Python Natural Language Processing Libraries
Python has two main libraries that can help with your NLP needs. These are:
- spaCy
- NLTK
Both frameworks offer a wide range of features, which we will talk about in the next section.
What you need to know is that both are open source and both are extensible, allowing you to plug in your own datasets and machine learning models and fine-tune them to do what you want to accomplish.
How To Use spaCy/NLTK In Python
In this section we will go over how to use the Python spaCy and NLTK frameworks and look at some use cases in more detail. One of the most common use cases is mining large documents of text to extract human names.
A good step-by-step guide with examples for NLTK can be found in this article:
How To Extract Human Names Using Python NLTK
A similar article on doing this using spaCy can be found here:
How To Extract Human Names Using Python spaCy
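The linked guides rely on pretrained models. As a minimal sketch of the same idea that only assumes spaCy v3 is installed (no model download), a blank English pipeline with spaCy’s rule-based `entity_ruler` can pull out names you list yourself. The names, the sample sentence and the `extract_names` helper are illustrative; to find names you have not listed, you would load a pretrained pipeline such as `en_core_web_sm` instead.

```python
import spacy

# Blank English pipeline: just a tokenizer, so no pretrained model is needed.
nlp = spacy.blank("en")

# Rule-based entity recognizer fed with our own (hypothetical) patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matches the exact tokens "Ada Lovelace".
    {"label": "PERSON", "pattern": "Ada Lovelace"},
    # Token pattern: matches "alan turing" in any letter case.
    {"label": "PERSON", "pattern": [{"LOWER": "alan"}, {"LOWER": "turing"}]},
])


def extract_names(text):
    """Return the PERSON entities found in text, in document order."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


print(extract_names("Ada Lovelace wrote about the engine; Alan Turing came a century later."))
# → ['Ada Lovelace', 'Alan Turing']
```

Because the patterns are plain data, this is also a simple way to feed your own datasets into spaCy, as mentioned earlier.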
Besides extracting names, these two libraries can also support other tasks, such as sentiment analysis: trying to understand the writer’s mood at the time of writing.
Using these methods you could, for example, check whether a celebrity is having a good or a bad day. More practically, you can monitor influencers whose posts affect the stock market and see how the market reacts to them.
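NLTK ships a ready-made sentiment analyzer, VADER, but it needs the `vader_lexicon` data download. To show the underlying idea without any downloads, the sketch below swaps in NLTK’s `NaiveBayesClassifier` trained on a tiny, made-up set of labelled posts. The training sentences and the `features` helper are hypothetical, and a real model would need far more (and real) data.

```python
import re

from nltk.classify import NaiveBayesClassifier


def features(text):
    """Bag-of-words features: each lowercased word present maps to True."""
    return {word: True for word in re.findall(r"[a-z']+", text.lower())}


# Tiny hand-labelled training set (hypothetical examples).
train = [
    ("great earnings, love this stock", "pos"),
    ("what a great day", "pos"),
    ("love the new product", "pos"),
    ("terrible quarter, awful guidance", "neg"),
    ("what an awful day", "neg"),
    ("terrible product, selling everything", "neg"),
]

classifier = NaiveBayesClassifier.train([(features(text), label) for text, label in train])

print(classifier.classify(features("Love this great product!")))  # → pos
print(classifier.classify(features("Awful and terrible news")))   # → neg
```

Words the classifier never saw during training (such as "news" above) are simply ignored, so the quality of the result depends heavily on how representative the training posts are.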
Up to now, this kind of monitoring was done manually, with large teams of people watching companies’ Twitter accounts or public news outlets and acting on what they found.
What if this process was automated?
The solution is Python, and more specifically natural language processing; the libraries above will help you accomplish this. spaCy excels at this kind of automation and keeps everything in sync using pipelines, as seen below.
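A minimal sketch of that pipeline idea, assuming spaCy v3: a rule-based `entity_ruler` tags a watched person, and a small custom component (the hypothetical `flag_watched_people`) stores the names it found on each `Doc`, so `nlp.pipe` can push a stream of posts through every step in one pass. The person and the posts are made up; a real setup would typically start from a pretrained pipeline such as `en_core_web_sm` rather than `spacy.blank`.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Custom Doc attribute that holds the watched people a post mentions.
Doc.set_extension("watched_people", default=None, force=True)


@Language.component("flag_watched_people")
def flag_watched_people(doc):
    """Copy PERSON entities set by earlier components into doc._.watched_people."""
    doc._.watched_people = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    return doc


nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Jane Doe"}])  # hypothetical influencer
nlp.add_pipe("flag_watched_people")  # appended last, so it runs after the ruler

posts = [
    "Jane Doe says the market is overheated.",
    "Quiet day on the exchange.",
]

# nlp.pipe streams the texts through every component in order.
for doc in nlp.pipe(posts):
    if doc._.watched_people:
        print(doc._.watched_people, "->", doc.text)
```

Adding another step, such as a sentiment scorer, is a matter of registering one more component; the loop over incoming posts stays the same.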
With NLTK you need to go a step further and write a bit more code to accomplish the same thing, but that takes nothing away from it. NLTK, on the other hand, offers plenty of visualization, such as trees of your entities, which makes it well worth keeping in your arsenal of tools when working with NLP.
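`nltk.ne_chunk()` returns its result as an `nltk.Tree`, where each named entity is a subtree labelled `PERSON`, `ORGANIZATION` and so on, and each leaf is a `(word, POS tag)` pair. The sketch below builds such a tree by hand (so it runs without downloading NLTK’s tagger and chunker models) and walks it to pull out entities; the sentence and the `entities` helper are illustrative. Calling `tree.draw()` on the same object opens the graphical tree view.

```python
from nltk import Tree

# Hand-built tree in the same shape nltk.ne_chunk() produces:
# entity subtrees carry a label, leaves are (word, tag) tuples.
tree = Tree("S", [
    Tree("PERSON", [("Jane", "NNP"), ("Doe", "NNP")]),
    ("praised", "VBD"),
    Tree("ORGANIZATION", [("Acme", "NNP")]),
    ("today", "NN"),
])


def entities(tree, label):
    """Return the text of every subtree with the given entity label."""
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


print(tree)                      # bracketed, text-only view of the tree
print(entities(tree, "PERSON"))  # → ['Jane Doe']
```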
As you can see, the two frameworks can work together to get you the results you need. I personally have used both for different purposes with great success.
Future Of Python Natural Language Processing
Machine learning technologies keep getting better, and NLP is no exception. In the past few years we have seen a rise in unsupervised learning and in machine learning models whose training is automated and streamlined using outside data sources.
For natural language processing, this was a natural next step. The most frequent use case we are seeing these days is using search engine results as seed data and parsing them.
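Fetching the pages is out of scope here, but the parsing step can be sketched with Python’s standard-library `html.parser`: the hypothetical `TextExtractor` below keeps a page’s visible text and skips `<script>` and `<style>` content, producing plain text that spaCy or NLTK can then process. The sample HTML stands in for a crawled search result and is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page standing in for one crawled from a search result.
html = (
    "<html><head><style>p {}</style></head><body>"
    "<h1>Markets</h1><p>Jane Doe is bullish.</p><script>track()</script>"
    "</body></html>"
)
print(page_text(html))  # → Markets Jane Doe is bullish.
```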
For example, you might crawl sources that come from Google results and from there build what you need by setting up training functions. A training function lets you judge whether a decision made by your trained model was good or bad, and accept or reject the result accordingly. Using this process you can substantially improve your models and increase the accuracy of your natural language processing entity recognition.
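The article does not spell out what a training function looks like, so here is a minimal, stdlib-only sketch of the accept/reject loop under one simple assumption: a PERSON prediction is accepted only if it matches a list of names you already trust. `Prediction`, `accept`, `review` and `to_training_example` are hypothetical names; in practice the check could instead be human review or a cross-check against other sources.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str    # sentence the model looked at
    entity: str  # span the model tagged
    label: str   # label the model assigned, e.g. "PERSON"


def accept(pred, trusted_names):
    """Hypothetical training function: approve a PERSON prediction only
    if the tagged span is a name we already trust."""
    return pred.label == "PERSON" and pred.entity in trusted_names


def review(predictions, trusted_names):
    """Split model predictions into accepted and rejected lists."""
    accepted, rejected = [], []
    for pred in predictions:
        (accepted if accept(pred, trusted_names) else rejected).append(pred)
    return accepted, rejected


def to_training_example(pred):
    """Turn an accepted prediction into (text, {"entities": [(start, end, label)]})."""
    start = pred.text.index(pred.entity)
    return pred.text, {"entities": [(start, start + len(pred.entity), pred.label)]}


predictions = [
    Prediction("Jane Doe sold her shares.", "Jane Doe", "PERSON"),
    Prediction("Acme stock fell.", "Acme", "PERSON"),  # wrong label: Acme is a company
]
accepted, rejected = review(predictions, trusted_names={"Jane Doe"})
training_data = [to_training_example(p) for p in accepted]
print(training_data)  # → [('Jane Doe sold her shares.', {'entities': [(0, 8, 'PERSON')]})]
```

The accepted examples can then be fed back as extra training data; in spaCy v3, for instance, each tuple can be turned into a training `Example` with `Example.from_dict(nlp.make_doc(text), annotations)`.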
Before I close, I want to show a current graph from Google Trends.
It gives an idea of why this technology isn’t going anywhere and is here to stay. More specifically, it shows that interest wasn’t just an upward trend that happened five years ago (date range below) and is now over. So place your bets and invest in learning the technology. The resources I linked earlier include good guides with examples that can get you started very quickly.