How I automated the creation of my grocery list from a bunch of recipe websites with Python

The problem and motivation

I am not a natural or creative cook, so whenever I want to cook I need a recipe to follow. My typical process before any grocery trip:

  1. Aggregate the required ingredients
  2. Manually group ingredients by food type to optimise shopping approach

[Image: my manually created list]

  • Produce as output: a list of ingredients, grouped by type, with the required quantity and associated recipe

Approach

The problem consists of two components:

  1. Food Group Extraction: From the ingredients list, find and extract the associated food groups

    SELECT group_name, COUNT(*) AS count
    FROM FD_DES
    LEFT JOIN FD_GROUP ON FD_DES.group_id = FD_GROUP.group_id
    WHERE shortdes LIKE ?
      AND FD_GROUP.group_id NOT IN ('0300', '0800', '1800', '1900', '2100',
                                    '2200', '2500', '3500', '3600')
    GROUP BY group_name
    ORDER BY count DESC
    LIMIT 1

  • exclude any food groups with ingredients I almost never use (e.g. ‘0300 — Baby Foods’, ‘0800 — Breakfast Cereals’)
  • output the food group in which the ingredient appears most often
  • for two-word ingredients, query both words together first; if nothing is returned, query the second word by itself. This assumes the second word is usually more descriptive of the food category (e.g. ‘cheese’ in ‘blue cheese’), although the logic clearly fails for something like ‘egg whites’
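
The lookup and the two-word fallback described above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the original script: the function names are hypothetical, and wrapping the search term in % wildcards is my assumption about how the LIKE parameter is filled in.

```python
import sqlite3

# Food group ids to skip, copied from the query above.
EXCLUDED_GROUPS = ('0300', '0800', '1800', '1900', '2100',
                   '2200', '2500', '3500', '3600')

# Same query as above, with the excluded ids passed as parameters.
QUERY = f"""
SELECT group_name, COUNT(*) AS count
FROM FD_DES
LEFT JOIN FD_GROUP ON FD_DES.group_id = FD_GROUP.group_id
WHERE shortdes LIKE ?
  AND FD_GROUP.group_id NOT IN ({','.join('?' * len(EXCLUDED_GROUPS))})
GROUP BY group_name
ORDER BY count DESC
LIMIT 1
"""


def query_food_group(conn, term):
    """Return the most frequent food group matching `term`, or None."""
    # Assumption: match the term anywhere in the short description.
    row = conn.execute(QUERY, (f"%{term}%", *EXCLUDED_GROUPS)).fetchone()
    return row[0] if row else None


def find_food_group(conn, ingredient):
    """Try the full ingredient first, then fall back to its second word."""
    group = query_food_group(conn, ingredient)
    words = ingredient.split()
    if group is None and len(words) == 2:
        group = query_food_group(conn, words[1])
    return group
```

With a connection to the food database, find_food_group(conn, "blue cheese") would first search for "blue cheese" and, if that returns nothing, search for "cheese" on its own.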

Using the code

To use the code, I just need to create a text file with a list of recipe URLs (an example below) in the same directory as the code.

[Image: my .txt input list (I have been quite obsessed with thewoksoflife.com lately)]

    C:\Users\plcpi\Google Drive\Personal Projects\Ingredients extractor> python grocery.py 'recipes0908.txt'

[Image: results of running the code]
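
For readers who want to reproduce the input step, here is a minimal sketch of how a script could read the URL list passed on the command line. The function names and the one-URL-per-line file layout are assumptions for illustration; the actual grocery.py may do this differently.

```python
import sys
from pathlib import Path


def read_recipe_urls(path):
    """Return the non-empty lines of the input file, one URL per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def main(argv):
    """Print each URL from the file named in argv[1]; return an exit code."""
    if len(argv) != 2:
        print("usage: python grocery.py <url_list.txt>")
        return 1
    for url in read_recipe_urls(argv[1]):
        # Placeholder: the real script would scrape ingredients here.
        print(url)
    return 0
```

In a script, main(sys.argv) would be called from the entry point, so that python grocery.py 'recipes0908.txt' hands the file name through as argv[1].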

Performance Assessment

I think the code gets 80–90% of the job done. The initial limitations I found are described below.

Final Thoughts

In the above example, I put in five recipe URLs, which resulted in a CSV file of 67 lines. It then took me only a few minutes to delete the ingredients already in my pantry and re-categorise the couple of ingredients that were in the wrong groups.
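
As a rough idea of how such a CSV could be produced, the sketch below writes one row per ingredient, sorted so that each food group stays together. The column names and the row layout are hypothetical and chosen only to match the output described earlier (food group, ingredient, quantity, recipe).

```python
import csv
import io


def write_grocery_csv(rows, out):
    """Write ingredient rows to `out`, sorted so each food group is contiguous."""
    fields = ["food_group", "ingredient", "quantity", "recipe"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: (r["food_group"], r["ingredient"])):
        writer.writerow(row)


# Example with made-up rows, written to an in-memory buffer.
buffer = io.StringIO()
write_grocery_csv(
    [
        {"food_group": "Vegetables", "ingredient": "carrot", "quantity": "2", "recipe": "A"},
        {"food_group": "Dairy", "ingredient": "milk", "quantity": "1 cup", "recipe": "B"},
    ],
    buffer,
)
```

When writing to a real file instead of a buffer, the csv module documentation recommends opening it with newline="" to avoid blank lines on Windows.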

Side note: Improvements

  • An easy extension is to consolidate ingredients across recipes, but personally I like to keep them separate in case I decide not to shop for a certain recipe while I am out (this sometimes happens when I can’t find a certain ingredient, which occurs more often now with COVID-19 shortages).
  • The code requires a Python environment, so it is not great for sharing or for use without a desktop. Making it a web app (and letting the user customise which food groups to exclude and which units of measurement to output) would be a great next step.
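
For anyone who does want the consolidation mentioned in the first bullet, here is a minimal sketch. It assumes each ingredient has already been parsed into a (name, quantity, unit) tuple with a numeric quantity, which is a hypothetical representation and not necessarily what my code produces.

```python
from collections import defaultdict


def consolidate(items):
    """Sum quantities of items that share the same ingredient name and unit."""
    totals = defaultdict(float)
    for name, quantity, unit in items:
        # Names are lower-cased so 'Garlic' and 'garlic' merge;
        # use an empty string as the unit for unitless items.
        totals[(name.lower(), unit)] += quantity
    return [(name, qty, unit) for (name, unit), qty in sorted(totals.items())]
```

Items with the same name but different units (say, cups and tablespoons) stay on separate lines here, since merging them would need a unit conversion step.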

Using Python, R and Dataviz tools to do interesting stuff with data in my spare time
