What I learnt from mining my own LinkedIn connection data

Katie Zhang
Aug 5, 2020

Introduction

When I learnt from the book Mining the Social Web that I could download my connections data from LinkedIn (which turned out to be surprisingly easy), my interest was piqued. I wanted to explore this data to understand the following:

  • How has my networking activity changed over my career?
  • What kind of people are my connections and how has this changed?

Method

1. Downloading the data

In LinkedIn, navigate to ‘Settings & Privacy’ (under your profile) → ‘Get a copy of your data’, tick ‘Connections’ and export the data as a CSV file.

Where to download your data on LinkedIn

The downloaded file has the following information on your connections: ‘First Name’, ‘Last Name’, ‘Company’, ‘Position’ and ‘Connected On’.
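
As a rough illustration of working with this file, here is a minimal Python sketch that reads the export and counts connections per year. It assumes the ‘Connected On’ values look like "05 Aug 2020" (adjust DATE_FORMAT if your export differs) and that the header row is the first line of the file. The sample rows and the load_connections helper are made up for the example.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed format of the 'Connected On' column, e.g. "05 Aug 2020".
DATE_FORMAT = "%d %b %Y"


def load_connections(handle):
    """Read connection rows from an open CSV file and add a parsed 'Year' field."""
    rows = []
    for row in csv.DictReader(handle):
        connected = datetime.strptime(row["Connected On"].strip(), DATE_FORMAT)
        row["Year"] = connected.year
        rows.append(row)
    return rows


# Made-up sample rows standing in for the real export.
SAMPLE_CSV = """First Name,Last Name,Company,Position,Connected On
Ann,Lee,Acme,Procurement Manager,12 Mar 2016
Bob,Ng,Globex,Senior Manager,03 Nov 2016
Cy,Wu,Initech,Procurement Analyst,21 Jan 2019
"""

connections = load_connections(io.StringIO(SAMPLE_CSV))
per_year = Counter(row["Year"] for row in connections)
print(sorted(per_year.items()))
```

With a real export, you would pass open("Connections.csv", newline="", encoding="utf-8") instead of the in-memory sample (the file name here is an assumption).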

2. Add additional features (location of contacts and contact clusters)

I adapted and extended the code described in Chapter 4 of Mining the Social Web, which is also available on the book’s GitHub page, to create additional features.

Location of the contact

I used the Google Maps geocoding API to obtain location coordinates based on each connection’s company name. This method certainly had problems with companies whose addresses weren’t listed in Google Maps and with companies that had multiple offices. This code is available on the GitHub page mentioned above.
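
To give a feel for what such a lookup involves, here is a minimal sketch that talks to the Geocoding API's JSON endpoint directly with the standard library. It is not the code from the book's GitHub page (which may well use a client library instead). The helper names are hypothetical, and you would need your own API key. Note that it simply takes the first result, which is exactly where the multiple-offices problem creeps in.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(company, api_key):
    """Build the request URL, using the company name as the address query."""
    query = urllib.parse.urlencode({"address": company, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def first_location(payload):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_company(company, api_key):
    """Call the API (needs network access and a valid key)."""
    with urllib.request.urlopen(build_geocode_url(company, api_key)) as response:
        return first_location(json.load(response))
```

Keeping the parsing in first_location separate from the network call makes it easy to check the logic against a saved response before spending any API quota.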

Contact clusters based on position title

To measure similarity between position titles, I first weighted the terms using TF-IDF and then used cosine similarity as the measure compared against the clustering threshold, using code I learnt from my course. I swapped this measure into the GitHub code mentioned above, which originally used Jaccard similarity.

This was because I wanted to up-weight titles that were relatively rare to make the clusters more meaningful. For example, in the position titles of (A) Procurement Manager, (B) Procurement Analyst, (C) Senior Manager — I would expect the algorithm to assign a higher similarity score to (A) and (B) relative to (A) and (C) because the word ‘procurement’ is less generic (compared to ‘manager’), and would therefore be up-weighted.
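
The example above can be sketched in a few lines of plain Python. This is a simplified illustration, not the course code or the book's code: it uses a basic TF-IDF variant (term frequency times log(N / document frequency)), and the two extra ‘manager’ titles are made up so that ‘manager’ is more common than ‘procurement’ in the sample, as it would be in a real list of connections.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Return one {term: weight} dict per title, using tf * log(N / df)."""
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)
    # Document frequency: in how many titles does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


titles = [
    "Procurement Manager",   # (A)
    "Procurement Analyst",   # (B)
    "Senior Manager",        # (C)
    "Project Manager",       # made-up extra title
    "Account Manager",       # made-up extra title
]
vec_a, vec_b, vec_c = tfidf_vectors(titles)[:3]
sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(f"A vs B: {sim_ab:.2f}, A vs C: {sim_ac:.2f}")
```

Because ‘manager’ appears in four of the five titles, its weight is small, so (A) and (B) come out noticeably more similar than (A) and (C). With only the three original titles, ‘procurement’ and ‘manager’ would be equally common and the two scores would tie.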

The clusters were then named after the common terms in the position titles of connections within that cluster.
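
One simple way to do this naming, sketched below under the assumption that each cluster is just a list of position titles, is to count in how many titles each term appears and join the most frequent ones. The name_cluster helper and the sample cluster are hypothetical, not taken from the GitHub code.

```python
from collections import Counter


def name_cluster(titles, n_terms=2):
    """Name a cluster after the terms that appear in the most titles."""
    counts = Counter()
    for title in titles:
        # Count each term once per title so long titles don't dominate.
        counts.update(set(title.lower().split()))
    return " / ".join(term for term, _ in counts.most_common(n_terms))


# Made-up cluster for illustration.
cluster = ["Procurement Manager", "Procurement Analyst", "Senior Procurement Manager"]
print(name_cluster(cluster))
```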

3. Overlay with my own career history and visualise

I manually created a rough representation of my own career history (a table consisting of ‘Year’, ‘Function/Industry’, ‘Location’) and joined the ‘Year’ field to the year of the ‘Connected On’ field of my connections data in Tableau.
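
The join itself was done in Tableau, but the same idea can be sketched in Python as a left join on year. Everything below is illustrative: the career rows and connections are made up, join_on_year is a hypothetical helper, and each connection is assumed to already carry a ‘Year’ extracted from ‘Connected On’.

```python
# Made-up career history table: one row per year.
career_history = [
    {"Year": 2016, "Function/Industry": "Procurement", "Location": "Australia"},
    {"Year": 2019, "Function/Industry": "Consulting", "Location": "Australia"},
]

# Made-up connections, each with the year taken from 'Connected On'.
sample_connections = [
    {"Position": "Procurement Manager", "Year": 2016},
    {"Position": "Director", "Year": 2019},
    {"Position": "Analyst", "Year": 2021},
]


def join_on_year(connections, career_history):
    """Left join: keep every connection, adding career fields where the year matches."""
    by_year = {row["Year"]: row for row in career_history}
    joined = []
    for conn in connections:
        phase = by_year.get(conn["Year"], {})
        joined.append({
            **conn,
            "Function/Industry": phase.get("Function/Industry"),
            "Career Location": phase.get("Location"),
        })
    return joined


joined = join_on_year(sample_connections, career_history)
for row in joined:
    print(row)
```

Connections made in a year missing from the career table keep their row but get None for the career fields, which makes gaps in the hand-built table easy to spot.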

I then created a simple dashboard that allowed me to explore the interactions between the:

  • Location of my connections
  • Year of connection
  • Cluster which my connection belonged to
  • Function (or career phase) I was in when I connected with them

Below is a screenshot of the simple dashboard I created in Tableau.

Dashboard of my LinkedIn connections

Observations and Reflections

Networking activity

The peak and decline of my network connections coincided with my switch from a procurement professional (peak) to a consultant (decline).

The large number of connections in 2016 probably reflects all the networking events I attended then. As an in-house procurement professional, I also received a lot of requests to connect from suppliers. Of course, another reason may be that I went on an adding spree of my high school / university peers around this time, when I started to take LinkedIn seriously.

I was a bit surprised by my lack of proactive networking during my consultant years. I really should have been more diligent in connecting and keeping in touch with the diverse clients I met during this time!

Position title clusters

During my time in procurement, my top clusters surprisingly had no ‘procurement’ in their titles and appeared to be more senior (with common title terms such as ‘Director’ and ‘Senior’), so it seems I was not actively networking with my functional peers at this stage.

On the other hand, when I became a consultant, my top clusters were those with ‘procurement’ in their position titles. This probably reflected the business development mindset drilled into me by management, i.e. to treat anyone in procurement as a potential client and to connect with them.

Top: Connection clusters (procurement career); Bottom: Connection clusters (consulting career)

Location of network

I am not reading too much into this data: although I do have some connections in the USA, far too many contacts are displayed with a USA location for it to make sense! This is probably due to the geocoding API, which tended to return the US address if my connection’s company was a multinational (even though the connection is based in Australia).

So now what?

This analysis, for me, has been more useful as a reflective activity on how I have made connections and approached networking throughout my career.

Of course, the analysis has also highlighted that I could have networked better in my career (both in terms of peers in the same function when I was in procurement as well as with my clients when I was a consultant) but obviously, I wouldn’t have needed to do this kind of data analysis to come up with this insight :)

Regardless, I still found some unexpected insights from this data exploration exercise (my dashboard also got my husband curious and he has requested one for himself).

Side note: Improvements

The key improvements would be to find a better way to geo-locate connections and to extract their industry information. LinkedIn obviously has this information but, after much Googling, I have not been able to find an easy way to extract it from LinkedIn itself.

There also does not seem to be an easy way to look up a company’s industry from its name (using open source data).

If anyone has an idea on how to make these improvements — I would love to know!


Katie Zhang

Using Python, R and Dataviz tools to do interesting stuff with data in my spare time