Breakthroughs in Natural Language Applications with Deep Learning based Sentiment Analysis
Authors: Tony Tong and Xin Heng
Language is arguably the most fundamental tool that defines humanity. As Yuval Noah Harari observes in his book Sapiens: A Brief History of Humankind [1]: with the development of language, we became able to think sharply about abstract matters, cooperate in ever larger numbers, and, perhaps most crucially, gossip. (Think about it for a second.)
No one would doubt the importance of communication in modern business, whether within an organization, from a business to its customers, or vice versa. As a concrete case, many of Punchh’s (https://punchh.com/) restaurant business customers receive numerous customer reviews in their apps or from online forums. These reviews contain valuable feedback directly from customers commenting on many aspects of the business. While it would be great to have human professionals read through every single review, that hardly scales as the business grows and the number of reviews explodes. However, algorithmic analysis of these reviews that “understands” their accurate meaning has been elusive ever since the invention of modern computers. Most algorithms in service today still rely on matching a set of predefined keywords to make suggestions, and therefore perform poorly when these keywords appear in more complex contexts.
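To make that limitation concrete, here is a minimal sketch of a keyword-lexicon approach; the lexicon and scoring rule are illustrative assumptions, not any particular production system. It penalizes the word “burnt” even when a reviewer is praising a barbecue menu item called “burnt ends”, exactly the kind of context such a rule cannot see.

```python
# Illustrative keyword-based sentiment scoring; the lexicon is a made-up example.
NEGATIVE_KEYWORDS = {"burnt", "slow", "rude", "angry", "poor"}
POSITIVE_KEYWORDS = {"tasty", "best", "great", "delicious", "friendly"}

def keyword_sentiment(review: str) -> str:
    words = set(review.lower().replace(".", " ").replace(",", " ").split())
    score = len(words & POSITIVE_KEYWORDS) - len(words & NEGATIVE_KEYWORDS)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# "Burnt ends" is a menu item, but the keyword rule only sees the word "burnt",
# which cancels out "best" and yields a neutral verdict for a glowing review.
print(keyword_sentiment("Best burnt ends ever."))  # -> "neutral"
```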
Deep learning making strides in natural language understanding
“Understanding” human language is hard, extremely hard. In fact, the original Turing test, proposed in Alan Turing’s landmark 1950 paper, could be construed as a question-answering test conducted in human (natural) language. For decades, computational linguistics explored every path available to analyze natural language: part-of-speech (POS) tagging, dependency parsing, named-entity recognition (NER), knowledge graphs, logical deduction, just to name a few. But none of these techniques could achieve a generalized performance level comparable to that of humans, not even close. Then, deep learning came into the limelight…
Deep learning grabbed the public’s attention in 2012 with a landmark breakthrough that almost halved the error rate in image classification, reported by none other than Professor Geoffrey Hinton’s group at the University of Toronto [2]. However, major strides in natural language understanding with deep learning have only been made in the past couple of years.
An open benchmarking platform called the Stanford Question Answering Dataset (SQuAD) provides some helpful perspective. SQuAD was originally published in 2016 with the idea of testing machine reading comprehension by asking natural questions and expecting natural answers from machines, much the same way human students are tested in GRE or GMAT exams. In late 2016, machine models achieved F1 scores (a generalized performance metric; higher is better) in the high 60s, while average humans achieved 91. In late 2018, Google’s AI team submitted their BERT (Bidirectional Encoder Representations from Transformers) model, achieving an F1 score of 93 and exceeding average human performance! SQuAD later tweaked the test to make it a bit harder for machines by adding trick questions that are unanswerable from the information provided; the tweaked version is called SQuAD 2.0. On SQuAD 2.0, at the end of 2018, the best algorithms achieved F1 scores around 86, below the average human performance of 89.45. Six months later, today, the best algorithm has just beaten the human benchmark with an F1 score of 89.47. This is an astounding milestone!
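For readers curious what the F1 score measures in this setting, the sketch below shows a simplified version of SQuAD-style per-question scoring: the predicted answer and the reference answer are compared as bags of tokens, and F1 is the harmonic mean of precision and recall on that overlap. (The tokenization here is deliberately naive; the official evaluation script also strips punctuation and articles and takes the maximum over multiple reference answers.)

```python
from collections import Counter

def qa_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(qa_f1("the University of Toronto", "University of Toronto"))  # ~0.857
```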
A quick glance at the leaderboard shows that it is all deep learning based algorithms from the past year or so. In fact, the top 10 are all variants of Google’s open-sourced BERT model or similar Transformer-based models submitted within the last few months! That’s quite remarkable in itself!
[SQuAD 2.0 Leaderboard https://rajpurkar.github.io/SQuAD-explorer/]
Deep sentiment
With technology going through a step-function advancement, we can expect some fundamental shifts in business. Punchh is working to bring these advances in deep learning into daily business applications. Let’s take a look at a ‘deep sentiment’ product prototype we built to help our customers analyze restaurant customers’ feedback reviews.
Let’s start with some examples:
“The taco of this restaurant is very tasty, but service can definitely improve.”
“Best burnt ends ever.”
“Customer service was lacking today, service was slow, employees looked angry, no smiles, food was poor, lack of quantity in burritos, lacking beans and rice, cadhier could figure out or ordrr after repeating it to her multiple of times. burnt my first burrito shell, had to throw it away due to service being slow and blah while not paying attention. ”
“First time trying the pulled pork sandwich and it did not disappoint!”
“We have an allergy order and they came out to talk to us to ensure they had everything safe and correct for us!”
“Cashier was rude over charged was talking to other co-workers would not pay attention to my order”
In this model, we produce sentiment scores in five categories: customer service, food quality, ambience, wait time, and app program. For each category, there are both a negative and a positive sentiment reading ranging from 0 to 1, with a reading close to 1 indicating strong sentiment in that category.
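As a rough illustration of this output format, a review like the first taco example above could map to a structure along these lines; the field names and numbers are hand-picked placeholders for illustration, not actual model output.

```python
# Hypothetical output layout for one review; the values are hand-picked
# placeholders for illustration, not readings produced by the model.
deep_sentiment_scores = {
    "customer_service": {"positive": 0.05, "negative": 0.90},
    "food_quality":     {"positive": 0.95, "negative": 0.05},
    "ambience":         {"positive": 0.10, "negative": 0.10},
    "wait_time":        {"positive": 0.05, "negative": 0.15},
    "app_program":      {"positive": 0.05, "negative": 0.05},
}
```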
After glancing at a few examples above, we can make a few interesting observations:
- It is reasonably accurate, matching our human hunches: it correctly identifies sentiment in the different categories.
- It is context sensitive. In the ‘burnt ends’ example, it knows the phrase refers to a menu item, while in a later example ‘burnt’ literally means ‘burnt’.
- It is tolerant of typos and “non-standard” English.
- It is possible to have both positive and negative sentiments in the same category.
Many of the features above come from the underlying deep learning model. That is not to say you could not code up a set of explicit rules to approach these performance levels. However, doing so would be a challenging task, and the result would be difficult to adapt to a different set of situations (think about a different language).
Deep learning models are relatively straightforward: no specific semantic rules are required. In our case, the model first learned the “language” itself by reading Wikipedia (yes, all pages!), then we had the model learn our domain-specific language (just to get used to the jargon, the typos, and the ways people express their feelings). After that, we attached a classification head and trained it with a sentiment-labeled dataset. That’s it: no rules, everything is learned from the data through this process!
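A minimal sketch of that last fine-tuning step is shown below, assuming the open-source Hugging Face transformers and PyTorch libraries; the model name, category layout, toy labels, and training details are illustrative assumptions, not Punchh’s actual implementation.

```python
# Sketch only: a pretrained BERT encoder with a small multi-label
# classification head (two outputs per category: negative and positive).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

CATEGORIES = ["customer_service", "food_quality", "ambience", "wait_time", "app_program"]

class DeepSentimentClassifier(nn.Module):
    def __init__(self, pretrained_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained_name)
        # One negative and one positive reading per category, squashed to [0, 1].
        self.head = nn.Linear(self.encoder.config.hidden_size, 2 * len(CATEGORIES))

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return torch.sigmoid(self.head(pooled))

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = DeepSentimentClassifier()

# Fine-tune on a sentiment-labeled batch; the labels here are toy values,
# one 0/1 flag per (negative, positive) reading in each of the 5 categories.
texts = ["Best burnt ends ever.", "Customer service was lacking today, service was slow."]
labels = torch.tensor([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
                       [1., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = nn.BCELoss()(model(batch["input_ids"], batch["attention_mask"]), labels)
loss.backward()
optimizer.step()
```

The earlier Wikipedia pretraining and domain-adaptation steps would train the same encoder with a language-modeling objective before this classification head is attached.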
Simplicity has its merits. It provides a more general platform that can readily adapt to changing business needs or different natural language applications altogether.
Quantified sentiment indices also enable some useful higher-level abstractions, such as sentiment distributions and trend tracking.
For example, for a given period or location, we can look at the distribution of sentiment indices, and then interactively slice and dice the data down to a single customer or review.
We may also monitor trends. The following trendline chart shows the positive-to-negative review ratio for each of the five categories in real time for a fictitious business. In this particular case, we can see a seemingly downward-sloping pos-to-neg ratio in ‘customer service’. Decision makers may be motivated to dig further to understand what has caused such a downward drift.
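As a rough sketch of how such a trendline could be computed (the column names and the 0.5 threshold are assumptions for illustration, not the product’s actual pipeline), one could aggregate scored reviews by week and category:

```python
# Sketch: weekly positive-to-negative review ratio per category, assuming a
# table of scored reviews with hypothetical columns `date`, `category`,
# `positive`, and `negative` (the 0-1 sentiment readings described above).
import pandas as pd

def weekly_pos_to_neg_ratio(scored_reviews: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    df = scored_reviews.copy()
    df["week"] = pd.to_datetime(df["date"]).dt.to_period("W")
    df["is_positive"] = df["positive"] >= threshold
    df["is_negative"] = df["negative"] >= threshold
    weekly = df.groupby(["week", "category"])[["is_positive", "is_negative"]].sum()
    # Ratio of positive to negative reviews; one row per week, one column per category.
    return (weekly["is_positive"] / weekly["is_negative"].clip(lower=1)).unstack("category")
```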
Conclusion
Deep learning is bringing about a revolution in natural language understanding and its related applications right now. This will have a profound impact on the business world. What could be better than helping our customers turn this potential technical challenge into a competitive edge? Punchh is committed to this mission.
About Punchh
Headquartered in San Mateo, CA, Punchh is the world leader in innovative digital marketing products for brick-and-mortar retailers, combining AI and machine learning technologies, mobile-first expertise, and omnichannel communications designed to dramatically increase lifetime customer value. Leading global chains in the restaurant, health, and beauty sectors rely on Punchh to grow revenue by building customer relationships at every stage, from anonymous, to known, to brand loyalists, including more than 100 different chains representing more than $12 billion in annual spend.
About the author
Dr. Tony Tong is a Staff Machine Learning Scientist on Punchh’s Big Data and AI team. He is an enthusiastic machine learning and AI practitioner who loves turning working models into full-fledged machine learning services for business applications.
Dr. Xin Heng is the Senior Director and Head of Data Science at Punchh, where his primary responsibility is to build world-class data solutions that drive the growth of our business partners. His team works on AI & BI products and big data engineering tools.
References
[1] Yuval Noah Harari, Sapiens: A Brief History of Humankind.
[2] Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, ImageNet Classification with Deep Convolutional Neural Networks, presented at NIPS 2012.
[3] Stanford Question Answering Dataset (SQuAD) 2.0 Leaderboard, https://rajpurkar.github.io/SQuAD-explorer/.