This analysis came out of a broader project in which I analysed the Yelp Challenge Dataset to build a business case for a restaurant (where, when and what kind of restaurant to open).

Our group had decided to open a Vietnamese restaurant, and one of the questions we wanted to answer was what kind of restaurant to open. One thing I was interested in, which tied in well with my other analysis involving food text, was which foods have been trending up in Vietnamese restaurants over the years.

Data cleaning and text analytics

The code I used to extract food trends…

As high school students across NSW begin one of the most stressful periods of their high school lives, the Higher School Certificate exams, what can we find by looking at historical Top Achiever results?


With so much education data made publicly available by the NSW Government, I asked some teachers (my brother-in-law and sister) what insights they would be interested in extracting from the data.

One topic was understanding the trends in subjects, gender and schools of Top Achievers.

Each year, after the HSC, the NSW Education Standards Authority publishes a list of Top Achievers — these are ‘student(s)…

The linear programming technique (Data Envelopment Analysis) objectively determines the weights of evaluation criteria after suppliers are scored, as opposed to the traditional process, where the buyer pre-determines the evaluation criteria weightings before suppliers are scored.


In this article, I illustrate by example a linear programming technique called Data Envelopment Analysis (“DEA”) for scoring and ranking suppliers. The approach rests on a mathematical model that objectively assigns evaluation criteria weights after suppliers have been scored against those criteria.
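
For readers who want to see the model behind this description, one common way to write DEA as a linear program is the input-oriented CCR multiplier form, sketched below. This is the standard textbook formulation and not necessarily the exact variant used in this article. All symbols are assumptions introduced here: supplier k is the one being scored, x_{ij} is input i of supplier j (for example, price), y_{rj} is output r of supplier j (for example, a quality score), and u_r and v_i are the criteria weights the model chooses.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_k = \sum_{r=1}^{s} u_r\, y_{rk} \\
\text{subject to}\quad & \sum_{i=1}^{m} v_i\, x_{ik} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0 .
\end{aligned}
```

The program is solved once per supplier, so each supplier is scored with the weights most favourable to it, and a score of 1 marks a supplier that no weighting can show to be dominated.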

It has been argued that DEA provides fairer and more robust supplier selection results because of its objectivity and its ability…

The problem and motivation

I am not a natural or creative cook, so when I want to do some cooking I always need a recipe to follow. My typical process before any grocery trip is:

  1. Look through the internet to find interesting recipes
  2. Aggregate the required ingredients
  3. Manually group ingredients by food type to optimise shopping approach
My manually created list
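
Steps 2 and 3 above are mechanical enough to sketch in a few lines of Python. This is only an illustration of the idea, not the approach the article goes on to build: the recipes, quantities, the `food_type` lookup and the function name `build_shopping_list` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical recipes: ingredient -> (quantity, unit).
recipes = {
    "pho": {
        "beef brisket": (500, "g"),
        "rice noodles": (400, "g"),
        "onion": (1, "each"),
    },
    "banh mi": {
        "baguette": (2, "each"),
        "carrot": (1, "each"),
        "onion": (1, "each"),
    },
}

# Hypothetical lookup from ingredient to the section of the shop it sits in.
food_type = {
    "beef brisket": "meat",
    "rice noodles": "pantry",
    "baguette": "bakery",
    "carrot": "vegetables",
    "onion": "vegetables",
}


def build_shopping_list(recipes, food_type):
    """Sum quantities across recipes, then group ingredients by food type."""
    totals = defaultdict(int)
    for ingredients in recipes.values():
        for name, (quantity, unit) in ingredients.items():
            # Key on (name, unit) so grams and "each" are never added together.
            totals[(name, unit)] += quantity

    grouped = defaultdict(list)
    for (name, unit), quantity in sorted(totals.items()):
        # Ingredients missing from the lookup fall into an "other" group.
        grouped[food_type.get(name, "other")].append((name, quantity, unit))
    return dict(grouped)


shopping_list = build_shopping_list(recipes, food_type)
for section, items in sorted(shopping_list.items()):
    print(section, items)
```

Keying the totals on both name and unit is a deliberate simplification: it avoids unit conversion entirely, at the cost of listing an ingredient twice if two recipes measure it differently.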

This process becomes tedious with increasing numbers of recipes (and more ingredients). Personally, I can’t handle more than 3 recipes at a time (and admire the super dedicated people who can)!

This results in frequent shopping trips, which is undesirable because of the extra time involved and…


When I learnt from the book Mining the Social Web that I could download my connections data from LinkedIn (which was surprisingly easy), my interest was piqued. I wanted to explore this data to understand the following:

  • How has my networking activity changed over my career?
  • What kind of people are my connections and how has this changed?


1. Downloading the data

In LinkedIn, navigate to ‘Settings & Privacy’ (under your profile) → ‘Get a copy of your data’, tick ‘Connections’ and export the data as a CSV file.

Where to download your data on LinkedIn
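
Once the file is downloaded, a first pass at the question of how networking activity has changed could be a simple count of connections per year. The sketch below uses only the Python standard library and an inline stand-in for the export. The column name `Connected On`, the date format `%d %b %Y` and the sample rows are assumptions made for illustration, so check them against the header of your own file.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up placeholder rows standing in for the exported CSV.
# The column names and date format are assumptions, not confirmed by the export.
SAMPLE_EXPORT = """First Name,Last Name,Company,Position,Connected On
A,Example,Acme,Analyst,03 Feb 2015
B,Example,Acme,Manager,17 Aug 2015
C,Example,Initech,Engineer,09 Jan 2018
"""


def connections_per_year(csv_file, date_column="Connected On",
                         date_format="%d %b %Y"):
    """Count rows per calendar year of the connection date."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        connected = datetime.strptime(row[date_column], date_format)
        counts[connected.year] += 1
    # Return a plain dict ordered by year for easy printing or plotting.
    return dict(sorted(counts.items()))


per_year = connections_per_year(io.StringIO(SAMPLE_EXPORT))
print(per_year)
```

To run it on a real export, replace `io.StringIO(SAMPLE_EXPORT)` with an opened file, for example `open("Connections.csv", newline="", encoding="utf-8")`, where the file name is again an assumption.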

The downloaded file has the following information on your…

Katie Zhang

Using Python, R and Dataviz tools to do interesting stuff with data in my spare time
