Trending menu items in Vietnamese restaurants — Applying text analytics to Yelp reviews (Python)

Katie Zhang
4 min read · Jan 7, 2021

Introduction

This analysis came out of a broader project in which I analysed the Yelp Challenge Dataset to build a business case for a restaurant (where, when and what kind of restaurant to open).

Our group had decided to open a Vietnamese restaurant, and one of the questions we wanted to answer was what kind of restaurant to open. The question I was most interested in, which tied in well with my other analysis involving food text, was: which foods have been trending up in Vietnamese restaurants over the years?

Data cleaning and text analytics

The code I used to extract food trends can be found in the Jupyter notebook here. Below is an overview of the approach.

1. Create database of food words

To extract mentions of food, I wanted to create a list of words to be recognised as food.

I downloaded a list of common food names from the USDA here. However, some food names such as ‘pho’ do not exist in this database, so I added Vietnamese-specific food names from a Wikipedia page here to the list. The code for extracting the data from these sources is included in my notebook.
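As a rough illustration of this step (not the code from my notebook), merging the two sources could look like the sketch below. The two input lists are hypothetical stand-ins for the downloaded USDA and Wikipedia names, and the lower-casing and whitespace clean-up are assumptions about what normalisation is needed.

```python
def normalise(name):
    """Lower-case a food name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def build_food_words(*sources):
    """Merge several iterables of food names into one de-duplicated set."""
    food_words = set()
    for source in sources:
        for name in source:
            cleaned = normalise(name)
            if cleaned:  # skip empty entries
                food_words.add(cleaned)
    return food_words


# Hypothetical stand-ins for the two downloaded lists.
usda_names = ["Rice", "Lobster", "Spring  Roll", ""]
vietnamese_names = ["pho", "banh mi", "rice"]

food_words = build_food_words(usda_names, vietnamese_names)
print(sorted(food_words))
```

Using a set keeps lookups fast when every review has to be checked against the list.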

2. Add trends and attribute to the reviews dataset

The reviews can be downloaded from the Yelp website (called reviews.json). To keep only the Vietnamese restaurants, you have to join this dataset to the business.json dataset on business ID, because business.json holds the ‘categories’ (the cuisines) information for each business. The data cleaning code, including the code to convert the JSON into CSV, is not included in my notebook.
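Since the cleaning code is not in the notebook, here is a minimal sketch of the join using only the standard library. It assumes each file holds one JSON object per line and that ‘categories’ is a comma-separated string that may be missing; check both against the version of the dataset you download. The function names and sample records are made up for illustration.

```python
import json


def load_json_lines(path):
    """Read a file containing one JSON object per line into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def vietnamese_business_ids(businesses):
    """Collect the IDs of businesses whose categories include 'Vietnamese'."""
    ids = set()
    for business in businesses:
        categories = business.get("categories") or ""  # may be null/missing
        labels = {label.strip().lower() for label in categories.split(",")}
        if "vietnamese" in labels:
            ids.add(business["business_id"])
    return ids


def filter_reviews(reviews, business_ids):
    """Keep only the reviews written about the given businesses."""
    return [review for review in reviews if review["business_id"] in business_ids]


# Tiny in-memory example; with real files you would call load_json_lines(...).
businesses = [
    {"business_id": "b1", "categories": "Vietnamese, Restaurants"},
    {"business_id": "b2", "categories": "Pizza, Restaurants"},
    {"business_id": "b3", "categories": None},
]
reviews = [
    {"business_id": "b1", "stars": 5, "text": "Great pho"},
    {"business_id": "b2", "stars": 2, "text": "Cold pizza"},
]

vietnamese_reviews = filter_reviews(reviews, vietnamese_business_ids(businesses))
print(vietnamese_reviews)
```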

3. Extract food mentions from the reviews

The code goes through each review and extracts any mentions of menu item names. As food names are typically more than one word, it uses TextBlob to apply Part-Of-Speech (POS) tagging (tags words as nouns, adjectives, verbs etc) to the reviews to get more useful (and relevant) dish names.

The code first goes through the review text to identify any mention of ‘food’ words from the list created in Step 1, then extracts a two-word description of the food/menu item based on POS tagging rules:

a) [food word] + NOUN e.g. ‘lobster pho’

b) NOUN/ADJECTIVE + [food word] e.g. ‘fried rice’

c) if the two words don’t fit into the (a) or (b) category, then only the single food word is extracted
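The three rules above can be sketched roughly as follows. This is a simplified illustration rather than the notebook code: it takes (word, tag) pairs as input, which is the shape TextBlob returns from `TextBlob(text).tags`, and it assumes Penn Treebank style tags (nouns start with NN, adjectives with JJ). Checking rule (a) before rule (b), and skipping the second word once a pair has been emitted, are my assumptions for the sketch.

```python
def is_noun(tag):
    return tag.startswith("NN")


def is_adjective(tag):
    return tag.startswith("JJ")


def extract_food_mentions(tagged_words, food_words):
    """Apply rules (a), (b) and (c) to a list of (word, POS tag) pairs."""
    mentions = []
    i = 0
    while i < len(tagged_words):
        word = tagged_words[i][0].lower()
        if word not in food_words:
            i += 1
            continue
        following = tagged_words[i + 1] if i + 1 < len(tagged_words) else None
        previous = tagged_words[i - 1] if i > 0 else None
        if following is not None and is_noun(following[1]):
            # Rule (a): [food word] + NOUN
            mentions.append(f"{word} {following[0].lower()}")
            i += 2  # skip the noun so it is not counted again
        elif previous is not None and (is_noun(previous[1]) or is_adjective(previous[1])):
            # Rule (b): NOUN/ADJECTIVE + [food word]
            mentions.append(f"{previous[0].lower()} {word}")
            i += 1
        else:
            # Rule (c): fall back to the single food word
            mentions.append(word)
            i += 1
    return mentions


# Hand-tagged example sentence: "The lobster pho and fried rice were great"
tagged = [
    ("The", "DT"), ("lobster", "NN"), ("pho", "NN"), ("and", "CC"),
    ("fried", "JJ"), ("rice", "NN"), ("were", "VBD"), ("great", "JJ"),
]
print(extract_food_mentions(tagged, {"lobster", "pho", "rice"}))
```

Note that this sketch only matches single-word entries from the food list; multi-word entries such as ‘banh mi’ would need an extra matching step.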

For each food mention, I extracted the year it was mentioned as well as whether it came from a good (3+ stars) or bad (< 3 stars) review.
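A small sketch of this labelling step is below. It assumes the review date is an ISO-style string such as ‘2017-03-14’ or ‘2017-03-14 19:05:12’ and that the star rating is a number; the function names are hypothetical.

```python
from datetime import datetime


def review_year(date_text):
    """Pull the year out of an ISO-style date string."""
    return datetime.fromisoformat(date_text).year


def review_type(stars):
    """Good reviews have 3 or more stars, bad reviews fewer than 3."""
    return "good" if stars >= 3 else "bad"


def label_mentions(mentions, date_text, stars):
    """Turn one review's mentions into (food mention, year, review type) rows."""
    year = review_year(date_text)
    kind = review_type(stars)
    return [(mention, year, kind) for mention in mentions]


rows = label_mentions(["lobster pho", "fried rice"], "2017-03-14 19:05:12", 4)
print(rows)
```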

The final dataset (food mention, year, review type) created is ready for visualisation.

Visualisations and Insights

Word Cloud outputs

Below is a side by side comparison of food mentions in good reviews vs bad reviews in word cloud form.

There isn’t much difference between the two — this isn’t surprising as you would expect the types of food mentioned in good or bad reviews to be the same.

The most popular mentions are things you would expect in any Vietnamese restaurant, like spring rolls (also called egg rolls in America; as an Aussie, I did not know this…), pho, banh mi and fish sauce. There are also some foods that surprised me, like bubble tea and Thai tea, which are not Vietnamese (but maybe finding these items in a Vietnamese restaurant is not so surprising to Americans?).

Time trend

What was more interesting to me was adding time information to see how mentions of food/menu items changed over time. The visual below shows how the ranking of the top food mentions in Yelp reviews has changed over the years.

Top food mentions ranking over time
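The ranking behind a chart like this can be computed by counting mentions per year and ordering them. The sketch below is one way to do it under my own assumptions (ties are broken alphabetically and only the top `top_n` items per year are kept); the records are invented sample data, not results from the Yelp dataset.

```python
from collections import Counter, defaultdict


def rank_by_year(records, top_n=3):
    """Map each year to {food mention: rank}, where rank 1 is the most mentioned."""
    counts = defaultdict(Counter)
    for mention, year, _review_type in records:
        counts[year][mention] += 1

    rankings = {}
    for year, counter in counts.items():
        # Sort by count (descending), then alphabetically to break ties.
        ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        rankings[year] = {
            mention: rank
            for rank, (mention, _count) in enumerate(ordered[:top_n], start=1)
        }
    return rankings


# Invented sample rows in the (food mention, year, review type) shape.
records = [
    ("pho", 2016, "good"), ("pho", 2016, "bad"), ("banh mi", 2016, "good"),
    ("pho", 2017, "good"), ("bubble tea", 2017, "good"), ("bubble tea", 2017, "good"),
]
print(rank_by_year(records))
```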

Some foods remain consistently among the top mentions over time (spring rolls, pho, banh mi, bun bo hue). These are very important for a Vietnamese restaurant to offer and to get right.

Other foods appear to be mentioned more often over time, which suggests a shift in what Americans like and expect from their Vietnamese restaurants (and I can only hazard a guess as to why):

  • chicken pho climbed from rank 12 to rank 7 over the years
  • bubble tea/milk tea increased rank from 2014 to 2017 and remained consistently at rank 11 — this probably coincides with the explosion of popularity for bubble tea at this time and Vietnamese restaurants making sure they jump on the bandwagon
  • Vietnamese coffee increased in rank over the years

Application and further work

I think identifying food trends:

  • provides an interesting insight into the shifting culture/fads/taste of the consumer, and
  • is potentially informative for menu design (disclaimer: I have no experience in the restaurant industry)

The main areas of improvement are:

  • capturing food/drink item names better: the current rules produce some captures that are not very useful (‘decent pho’) and some incomplete dish names (‘grilled pork’)
  • automating the consolidation of similar food/drink item names for visualisation — for example ‘bubble tea’ and ‘milk tea’
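As a starting point for the consolidation, a hand-maintained alias table already goes a long way before anything is automated. The sketch below is only an illustration: the table contains the two pairs mentioned in this post, and the function names are hypothetical.

```python
# Hand-maintained aliases: variant name -> canonical name.
ALIASES = {
    "milk tea": "bubble tea",
    "egg rolls": "spring rolls",
}


def consolidate(mention, aliases=ALIASES):
    """Return the canonical name for a mention, or the cleaned mention itself."""
    cleaned = " ".join(mention.lower().split())
    return aliases.get(cleaned, cleaned)


def consolidate_rows(rows, aliases=ALIASES):
    """Apply consolidate() to the food mention in each (mention, year, type) row."""
    return [(consolidate(mention, aliases), year, kind) for mention, year, kind in rows]


sample = [("Milk Tea", 2017, "good"), ("pho", 2017, "bad")]
print(consolidate_rows(sample))
```

Automating this would mean building the alias table from the data instead of by hand, which is the part that remains further work.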
