
schollz / extract_recipe

License: Apache-2.0
Extracts recipes from websites, calculates cooking times, and collects nutrition info from the USDA database.

Programming Languages

python
go

The internet is full of recipes. A recipe is really only the directions and the ingredients, but on any given recipe website you always get a lot more information than you want (comments, links, advertisements). For instance, the page for Alton Brown's waffle recipe looks like this:

[Image: website example]

How can you just pull out the directions and ingredients, in a tag-independent way that can be used across any website?

Recipe extractor

How it works

First, it grabs the Markdown-formatted text of the HTML page using html2text. It then evaluates the number of cooking-related words in each line and pulls out the peak, that is, the lines where this count is highest, as the directions.

[Image: website example]
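
The peak-finding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the word list COOKING_WORDS, the window size, and the function names score_line and find_peak are all made up for the example, and the input is assumed to be text already converted by html2text.

```python
import re

# Hypothetical vocabulary; the real project may use a different, larger list.
COOKING_WORDS = {"bake", "boil", "cook", "melt", "mix", "pour", "simmer",
                 "stir", "whisk", "oven", "pan", "pot"}


def score_line(line):
    """Count how many words on the line are cooking-related."""
    words = re.findall(r"[a-z]+", line.lower())
    return sum(1 for w in words if w in COOKING_WORDS)


def find_peak(lines, window=3):
    """Return the index of the line at the centre of the highest-scoring window.

    Summing over a small window smooths the per-line counts, so a single
    stray word in a menu or comment does not win over a block of directions.
    """
    scores = [score_line(line) for line in lines]
    half = window // 2
    best_index, best_total = 0, -1
    for i in range(len(scores)):
        total = sum(scores[max(0, i - half):i + half + 1])
        if total > best_total:
            best_index, best_total = i, total
    return best_index


page = [
    "Home | Shows | Chefs | Sign in",
    "Advertisement",
    "Melt the butter in a pot and whisk in the flour.",
    "Stir in the milk and simmer for ten minutes.",
    "Pour into a dish and bake in the oven.",
    "Comments (214)",
    "Share this page",
]
peak = find_peak(page)
print(page[peak])  # the line in the middle of the densest block
```

Because the scoring looks only at words and never at HTML tags, the same function can be pointed at any site. Swapping in a vocabulary of food and unit words would give the analogous peak for the ingredient list.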

It does the same to extract the ingredient list. Then it scans the directions and the ingredient list to estimate the actual cooking time. Finally, it parses the measurements and foods in the ingredient list and cross-references the USDA SR27 food database to estimate the nutrition content.
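
One simple way to approximate the time estimate is to add up every explicit duration mentioned in the directions. The sketch below does only that; it is an assumption about how such an estimate could work, not a description of the project's own logic. The NUMBER_WORDS table, the DURATION pattern, and the function name estimate_minutes are hypothetical.

```python
import re

# Small number-word table; an assumption for this sketch, not the project's parser.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "ten": 10, "fifteen": 15, "twenty": 20, "thirty": 30}

# A digit string or a single word, followed by a minute/hour unit.
DURATION = re.compile(r"\b(\d+|[a-z]+)\s+(minutes?|hours?)\b", re.IGNORECASE)


def estimate_minutes(directions):
    """Sum every explicit duration ("30 minutes", "ten minutes") in the text."""
    total = 0
    for amount, unit in DURATION.findall(directions):
        amount = amount.lower()
        if amount.isdigit():
            value = int(amount)
        elif amount in NUMBER_WORDS:
            value = NUMBER_WORDS[amount]
        else:
            continue  # e.g. "few minutes": no usable number, so skip it
        total += value * 60 if unit.lower().startswith("hour") else value
    return total


directions = (
    "Whisk in the flour and mustard and keep it moving for about five minutes. "
    "Simmer for ten minutes and remove the bay leaf. "
    "Bake for 30 minutes. "
    "Remove from oven and rest for five minutes before serving."
)
print(estimate_minutes(directions))
```

For these sentences, taken from the macaroni and cheese example in this README, the explicit durations add up to 50 minutes. The example output reports 51 because it also lists "+1 minute for tossing", an allowance for a step that has no stated duration; this sketch does not model such allowances.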

To Do

  • Extract recipes.
  • Extract ingredients.
  • Cross-reference ingredients and measurements to the USDA database.
  • Make AJAX interface.
  • Better identifying common foods in USDA database
  • Simpler conversion between food dimensions/weights
  • Add in pricing
  • Save data as JSON
  • Clean up code by splitting into files
  • Better error handling
  • Add nice CSS into web interface.
  • Add in photos of foods

Installation

Requirements

sudo apt-get install python-nltk
sudo pip install pint

Then open a Python console and download the required NLTK data:

>>> import nltk
>>> nltk.download('brown')
>>> nltk.download('maxent_treebank_pos_tagger')
>>> nltk.download('wordnet')

Clone the SR27 database downloader and run its makefile to generate the database. Copy the database into the same folder as the extractor. Then open the database with sqlite3 and create the FTS4 virtual tables:

drop table if exists data;
drop table if exists nutrition_data;
drop table if exists nutrition_def;

create virtual table data using fts4(ndb_no,shrt_desc,long_desc,com_desc);
insert into data(ndb_no,shrt_desc,long_desc,com_desc) select ndb_no,shrt_desc,long_desc,com_desc from food_des;

create virtual table nutrition_data using fts4(ndb_no,nutr_no,nutr_val);
insert into nutrition_data(ndb_no,nutr_no,nutr_val) select ndb_no,nutr_no,nutr_val from nut_data;

create virtual table nutrition_def using fts4(nutr_no,units,tagname);
insert into nutrition_def(nutr_no,units,tagname) select nutr_no,units,tagname from nutr_def;
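
Once the virtual tables exist, a food can be looked up by full-text search and its nutrients fetched by ndb_no. The sketch below shows one way to do this with Python's built-in sqlite3 module. It is a sketch under assumptions: it builds an in-memory database with a few toy rows in the same layout as the tables above so that it runs without the SR27 file, it assumes the local SQLite build includes FTS4, and the helper names lookup and nutrients are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same table layout as the statements above, filled with toy rows
# (illustrative stand-ins for SR27 data, not authoritative values).
cur.execute("create virtual table data using fts4(ndb_no,shrt_desc,long_desc,com_desc)")
cur.execute("create virtual table nutrition_data using fts4(ndb_no,nutr_no,nutr_val)")
cur.execute("create virtual table nutrition_def using fts4(nutr_no,units,tagname)")
cur.executemany("insert into data values (?,?,?,?)", [
    ("00001", "BUTTER,WITH SALT", "Butter, salted", ""),
    ("00002", "MILK,WHL", "Milk, whole", ""),
])
cur.execute("insert into nutrition_data values ('00001','208','717')")
cur.execute("insert into nutrition_def values ('208','kcal','ENERC_KCAL')")


def lookup(cur, phrase):
    """Return (ndb_no, long_desc) rows whose long_desc matches all words in phrase."""
    cur.execute("select ndb_no, long_desc from data where long_desc match ?", (phrase,))
    return cur.fetchall()


def nutrients(cur, ndb_no):
    """Return (tagname, value, units) for every nutrient stored for one food."""
    cur.execute(
        "select d.tagname, n.nutr_val, d.units "
        "from nutrition_data n join nutrition_def d on n.nutr_no = d.nutr_no "
        "where n.ndb_no = ?", (ndb_no,))
    return [(tag, float(val), units) for tag, val, units in cur.fetchall()]


for ndb_no, desc in lookup(cur, "butter salted"):
    print(desc, nutrients(cur, ndb_no))
```

To run the same queries against the real database, replace ":memory:" with the path to the generated SR27 file and skip the create and insert statements.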

Example Output

Visit the current incarnation at http://ips.colab.duke.edu:8081/extractor.html and run it to see the extraction of the following recipe: http://www.foodnetwork.com/recipes/alton-brown/baked-macaroni-and-cheese-recipe.html.


Ingredients

Each line from the recipe is followed by the USDA entry it was matched to, with the converted amount, weight, and estimated price.

  • 1/2 pound elbow macaroni
      • 2 1/8 cup elbow shaped Macaroni, dry, enriched (226.8 grams) - $3.76
  • 3 tablespoons butter
      • 3.0 tbsp Butter, salted (42.6 grams) - $1.08
  • 3 tablespoons flour
      • 1/8 cup Wheat flour, white, all-purpose, enriched, unbleached (23.4 grams) - $0.12
  • 1 tablespoon powdered mustard
      • 3 1/8 tsp or 1 packet Mustard, prepared, yellow (15.6 grams) - $0.41
  • 3 cups milk
      • 3/4 quart Milk, whole, 3.25% milkfat, with added vitamin D (732.0 grams) - $0.58
  • 1/2 cup yellow onion, finely diced
      • 1.0 cup, sliced Onions, raw (118.3 grams) - $3.93
  • 1 bay leaf
      • 1.0 tsp, crumbled Spices, bay leaf (0.6 grams) - $0.09
  • 1/2 teaspoon paprika
      • 1/3 tsp Spices, paprika (1.1 grams) - $0.07
  • 1 large egg
      • 1.0 small Egg, whole, raw, fresh (38.0 grams) - $0.19
  • 12 ounces sharp cheddar, shredded
      • 12 1/8 slice 1 oz. slice CHEESE,CHEDDAR,SHARP,SLICED (340.2 grams) - $10.3
  • 1 teaspoon kosher salt
      • 1.0 tsp Salt, table (6.0 grams) - $0.19
  • Fresh black pepper
      • 1.0 dash Spices, pepper, black (0.1 grams) - $0.0
  • Topping:
      • 1.0 serving 5 servings per 22.85 oz package PIZZA,MEAT & VEG TOPPING,REG CRUST,FRZ,CKD (129.0 grams) - $0.0
  • 3 tablespoons butter
      • 3.0 tbsp Butter, salted (42.6 grams) - $1.08
  • 1 cup panko bread crumbs
      • 5 1/4 cup, crumbs Bread, white, commercially prepared (includes soft bread crumbs) (236.6 grams) - $2.83

Directions

In a large pot of boiling, salted water cook the pasta to al dente. While the pasta is cooking, in a separate pot, melt the butter. Whisk in the flour and mustard and keep it moving for about five minutes. Make sure it's free of lumps. Stir in the milk, onion, bay leaf, and paprika. Simmer for ten minutes and remove the bay leaf. Temper in the egg. Stir in 3/4 of the cheese. Season with salt and pepper. Fold the macaroni into the mix and pour into a 2-quart casserole dish. Top with remaining cheese. Melt the butter in a saute pan and toss the bread crumbs to coat. Top the macaroni with the bread crumbs. Bake for 30 minutes. Remove from oven and rest for five minutes before serving. Remember to save leftovers for fried Macaroni and Cheese. Recipe courtesy Alton Brown

+1 minute for tossing.

Calculated time: 51 minute

Calculated cost: $24.64

Serving size is about 13.0

Nutrition data (ALL)

Main

  • Energy: 8542.51051625 Calories
  • Water: 539.29 grams
  • Carbohydrate, by difference: 441.39 grams
  • Total lipid (fat): 257.0 grams
  • Ash: 134.75 grams
  • Protein: 122.14 grams
  • Fiber, total dietary: 103.0 grams
  • Sugars, total: 35.41 grams

Sugars

  • Starch: 120.02 grams
  • Fructose: 12.32 grams
  • Glucose (dextrose): 9.88 grams
  • Lactose: 5.23 grams
  • Maltose: 4.5 grams
  • Sucrose: 2.51 grams
  • Galactose: 0.56 grams

Other

Metals

  • Sodium, Na: 43.145 grams
  • Potassium, K: 5.503 grams
  • Calcium, Ca: 2.876 grams
  • Phosphorus, P: 2.088 grams
  • Magnesium, Mg: 0.704 grams
  • Iron, Fe: 0.09089 grams
  • Manganese, Mn: 0.025675 grams
  • Zinc, Zn: 0.02027 grams
  • Copper, Cu: 0.003438 grams
  • Selenium, Se: 0.0002409 grams
  • Fluoride, F: 0.0001124 grams

Vitamins

  • Vitamin A, IU: 63133.0 IU
  • Vitamin D: 294.0 grams
  • Choline, total: 0.5077 grams
  • Betaine: 0.2873 grams
  • Vitamin C, total ascorbic acid: 0.0585 grams
  • Vitamin E (alpha-tocopherol): 0.03875 grams
  • Niacin: 0.034416 grams
  • Carotene, beta: 0.027021 grams
  • Lutein + zeaxanthin: 0.020214 grams
  • Tocopherol, gamma: 0.01494 grams
  • Pantothenic acid: 0.00863 grams
  • Cryptoxanthin, beta: 0.006247 grams
  • Tocotrienol, alpha: 0.00524 grams
  • Vitamin B-6: 0.005071 grams
  • Vitamin A, RAE: 0.004706 grams
  • Riboflavin: 0.004426 grams
  • Thiamin: 0.003218 grams
  • Retinol: 0.001861 grams
  • Lycopene: 0.001835 grams
  • Folate, DFE: 0.001267 grams
  • Folate, total: 0.000922 grams
  • Carotene, alpha: 0.000608 grams
  • Tocopherol, beta: 0.00056 grams
  • Folic acid: 0.000493 grams
  • Folate, food: 0.000429 grams
  • Tocopherol, delta: 0.00039 grams
  • Vitamin K (phylloquinone): 0.0002716 grams
  • Tocotrienol, gamma: 0.0001 grams
  • Tocotrienol, beta: 1e-05 grams
  • Menaquinone-4: 9.6e-06 grams
  • Vitamin D3 (cholecalciferol): 7.3e-06 grams
  • Vitamin D (D2 + D3): 7.3e-06 grams
  • Vitamin B-12: 3.18e-06 grams
  • Dihydrophylloquinone: 1e-07 grams

Amino acids

  • Glutamic acid: 21.142 grams
  • Proline: 10.524 grams
  • Aspartic acid: 9.503 grams
  • Leucine: 7.934 grams
  • Valine: 5.319 grams
  • Lysine: 5.11 grams
  • Serine: 5.024 grams
  • Phenylalanine: 4.614 grams
  • Arginine: 4.501 grams
  • Isoleucine: 4.189 grams
  • Alanine: 3.949 grams
  • Tyrosine: 3.623 grams
  • Threonine: 3.522 grams
  • Glycine: 3.3 grams
  • Histidine: 2.23 grams
  • Methionine: 1.796 grams
  • Cystine: 1.369 grams
  • Tryptophan: 1.015 grams

Steroids

  • Cholesterol: 0.927 grams
  • Phytosterols: 0.282 grams
  • Beta-sitosterol: 0.008 grams

Fatty Acids

  • Fatty acids, total saturated: 139.376 grams
  • Fatty acids, total monounsaturated: 67.956 grams
  • 18:1 undifferentiated: 62.581 grams
  • 16:0: 62.062 grams
  • 18:1 c: 47.753 grams
  • 18:0: 27.299 grams
  • Fatty acids, total polyunsaturated: 26.627 grams
  • 18:2 undifferentiated: 22.581 grams
  • 14:0: 19.176 grams
  • Fatty acids, total trans: 7.809 grams
  • 4:0: 7.403 grams
  • 12:0: 7.045 grams
  • 18:2 n-6 c,c: 6.996 grams
  • Fatty acids, total trans-monoenoic: 6.962 grams
  • 18:1 t: 6.855 grams
  • 10:0: 6.164 grams
  • 6:0: 4.703 grams
  • 18:3 undifferentiated: 3.34 grams
  • 16:1 undifferentiated: 3.2 grams
  • 8:0: 2.989 grams
  • 16:1 c: 2.534 grams
  • 18:3 n-3 c,c,c (ALA): 1.951 grams
  • 17:0: 1.343 grams
  • 22:1 undifferentiated: 1.059 grams
  • 22:1 c: 1.057 grams
  • Fatty acids, total trans-polyenoic: 0.847 grams
  • 18:2 CLAs: 0.721 grams
  • 20:1: 0.666 grams
  • 18:2 i: 0.592 grams
  • 20:0: 0.403 grams
  • 15:0: 0.335 grams
  • 14:1: 0.328 grams
  • 20:4 undifferentiated: 0.269 grams
  • 18:2 t not further defined: 0.252 grams
  • 20:3 undifferentiated: 0.23 grams
  • 16:1 t: 0.107 grams
  • 22:0: 0.089 grams
  • 17:1: 0.073 grams
  • 22:6 n-3 (DHA): 0.059 grams
  • 20:3 n-6: 0.058 grams
  • 24:1 c: 0.051 grams
  • 18:4: 0.038 grams
  • 20:2 n-6 c,c: 0.037 grams
  • 22:4: 0.026 grams
  • 22:5 n-3 (DPA): 0.024 grams
  • 24:0: 0.022 grams
  • 20:3 n-3: 0.021 grams
  • 18:3 n-6 c,c,c: 0.013 grams
  • 20:5 n-3 (EPA): 0.011 grams
  • 18:3i: 0.003 grams