yuanxu-li / html-table-extractor

Licence: MIT license

Extract data from HTML tables

Programming Languages

python


Projects that are alternatives of or similar to html-table-extractor

Scrapple

A framework for creating semi-automatic web content extractors

Stars: ✭ 464 (+527.03%)

Mutual labels: scraping, beautifulsoup

Languagepod101 Scraper

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

Stars: ✭ 104 (+40.54%)

Mutual labels: scraping, beautifulsoup

TorScrapper

A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)

Stars: ✭ 24 (-67.57%)

Mutual labels: scraping, beautifulsoup

Scraper-Projects

🕸 List of mini projects that involve web scraping 🕸

Stars: ✭ 25 (-66.22%)

Mutual labels: scraping, beautifulsoup

chopper

Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules

Stars: ✭ 22 (-70.27%)

Mutual labels: scraping, beautifulsoup

Souqscraper

Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube

Stars: ✭ 118 (+59.46%)

Mutual labels: scraping, beautifulsoup

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+687.84%)

Mutual labels: scraping, beautifulsoup

html-table-to-json

Generate JSON representations of HTML tables

Stars: ✭ 39 (-47.3%)

Mutual labels: scraping, html-table

Euro2016 TerminalApp

⚽ Instantly find 🏆EURO 2016 live-streams & highlights, now a Web App!

Stars: ✭ 54 (-27.03%)

Mutual labels: scraping, beautifulsoup

Requests Html

Pythonic HTML Parsing for Humans™

Stars: ✭ 12,268 (+16478.38%)

Mutual labels: scraping, beautifulsoup

linkedin-scraper

Tool to scrape LinkedIn

Stars: ✭ 74 (+0%)

Mutual labels: scraping, beautifulsoup

react-native-simple-table

A simple table for react native.

Stars: ✭ 32 (-56.76%)

Mutual labels: table, html-table

Tieba-Birthday-Spider

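Baidu Tieba birthday spider: scrapes the birthdays of fellow members of a Tieba forum and automatically sends birthday greetings on the matching date.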

Stars: ✭ 28 (-62.16%)

Mutual labels: beautifulsoup

linkedinBot

Automate the process of sending referral request and cold mailing on LinkedIn

Stars: ✭ 25 (-66.22%)

Mutual labels: beautifulsoup

jQuery-Freeze-Table-Column-and-Rows

This is a jQuery plugin that can make table rows and columns not scroll. It can take a given HTML table object and set it so it can freeze a given number of columns or rows or both, so the fixed columns or rows do not scroll. The rows to be frozen should be placed in the table head section. It can also freeze rows and columns combined with using…

Stars: ✭ 20 (-72.97%)

Mutual labels: html-table

docker-selenium-lambda

The simplest demo of chrome automation by python and selenium in AWS Lambda

Stars: ✭ 172 (+132.43%)

Mutual labels: scraping

vue-table-for

Easily build a table for your records

Stars: ✭ 33 (-55.41%)

Mutual labels: table

obj-to-table

Create a table from an array of objects

Stars: ✭ 15 (-79.73%)

Mutual labels: table

covid19br-pub

A project that monitors official publications related to COVID-19 in Brazil.

Stars: ✭ 12 (-83.78%)

Mutual labels: scraping

non-api-fb-scraper

Scrape public Facebook posts from any group or user into a .csv file without needing to register for any API access

Stars: ✭ 40 (-45.95%)

Mutual labels: beautifulsoup


HTML Table Extractor

HTML Table Extractor is a Python library that uses Beautiful Soup to extract data from complicated and messy HTML tables.

Important links

Repository: https://github.com/yuanxu-li/html-table-extractor
Issues: https://github.com/yuanxu-li/html-table-extractor/issues

Installation

pip install 'beautifulsoup4==4.5.3'
pip install html-table-extractor

Usage

Example 1 - Simple

1	2
3	4

from html_table_extractor.extractor import Extractor
table_doc = """
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
"""
extractor = Extractor(table_doc)
extractor.parse()
print(extractor.return_list())

It will print out the following (on Python 3 the u prefixes are omitted):

[[u'1', u'2'], [u'3', u'4']]

Example 2 - Transformer

1	2
3	4

from html_table_extractor.extractor import Extractor
table_doc = """
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
"""
extractor = Extractor(table_doc, transformer=int)
extractor.parse()
print(extractor.return_list())

It will print out:

[[1, 2], [3, 4]]
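Because the transformer is applied to every cell's text, transformer=int works only when every cell holds an integer. int() raises ValueError on text such as an empty cell or a header, so a mixed table will most likely fail. Below is a minimal sketch of a more forgiving callable. number_or_text is a hypothetical helper and not part of the library. It tries int, then float, and otherwise keeps the stripped text.

```python
from typing import Union


def number_or_text(text: str) -> Union[int, float, str]:
    """Hypothetical transformer: int if possible, then float, else the stripped text."""
    stripped = text.strip()
    # Try the stricter conversion first so "2" stays an int rather than 2.0.
    # Note that float() also accepts strings such as "nan" and "inf".
    for convert in (int, float):
        try:
            return convert(stripped)
        except ValueError:
            continue  # not parseable by this converter; try the next one
    return stripped
```

It would be passed in the same way as int above, for example Extractor(table_doc, transformer=number_or_text).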

Example 3 - Pass BS4 Tag

1	2
3	4

from html_table_extractor.extractor import Extractor
from bs4 import BeautifulSoup
table_doc = """
<html><table id='wanted'><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table><table id='unwanted'><tr><td>not wanted</td></tr></table></html>
"""
soup = BeautifulSoup(table_doc, 'html.parser')
extractor = Extractor(soup, id_='wanted')
extractor.parse()
print(extractor.return_list())

It will print out:

[[u'1', u'2'], [u'3', u'4']]
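Example 3 uses Beautiful Soup to pick out the table by its id. If you only need the raw cells and want to avoid the bs4 dependency, the standard library's html.parser can do a similar lookup. The sketch below makes several assumptions. TableCellReader and read_cells are hypothetical helpers and are not part of html-table-extractor. They record each cell's text (stripped of surrounding whitespace) together with its rowspan and colspan. They assume well-formed markup with explicit closing tags and do not handle nested tables.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# (text, rowspan, colspan) for one <td>/<th> cell
Cell = Tuple[str, int, int]


class TableCellReader(HTMLParser):
    """Hypothetical stdlib-only reader; not part of html-table-extractor.

    Reads the table whose id equals table_id, or the first table when
    table_id is None. Nested tables are not supported.
    """

    def __init__(self, table_id: Optional[str] = None) -> None:
        super().__init__()
        self.table_id = table_id
        self.rows: List[List[Cell]] = []
        self._in_table = False
        self._done = False
        self._parts: Optional[List[str]] = None  # text of the open cell, if any
        self._spans = (1, 1)  # (rowspan, colspan) of the open cell

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "table":
            # Start reading at the first matching table only.
            if not self._in_table and not self._done and (
                self.table_id is None or a.get("id") == self.table_id
            ):
                self._in_table = True
        elif not self._in_table:
            return
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._parts = []
            # Missing or valueless span attributes default to 1.
            self._spans = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._parts is not None:
            text = "".join(self._parts).strip()
            self.rows[-1].append((text, *self._spans))
            self._parts = None
        elif tag == "table":
            self._in_table = False
            self._done = True


def read_cells(html: str, table_id: Optional[str] = None) -> List[List[Cell]]:
    reader = TableCellReader(table_id)
    reader.feed(html)
    reader.close()
    return reader.rows
```

Called on the markup from Example 3, read_cells(table_doc, "wanted") returns [[('1', 1, 1), ('2', 1, 1)], [('3', 1, 1), ('4', 1, 1)]]. It only reads the spans. It does not expand them into a rectangular grid the way the library's return_list() output does.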

Example 4 - Complex

1	2	3
	4
5

from html_table_extractor.extractor import Extractor
table_doc = """
<table>
  <tr>
    <td rowspan=2>1</td>
    <td>2</td>
    <td>3</td>
  </tr>
  <tr>
    <td colspan=2>4</td>
  </tr>
  <tr>
    <td colspan=3>5</td>
  </tr>
</table>
"""
extractor = Extractor(table_doc)
extractor.parse()
print(extractor.return_list())

It will print out:

[[u'1', u'2', u'3'], [u'1', u'4', u'4'], [u'5', u'5', u'5']]
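The output above shows how spanned cells are filled. A cell with rowspan or colspan is copied into every grid position it covers. Each later cell in a row then moves right past positions that a span from an earlier row has already taken. The following is a minimal sketch of that grid-filling idea, not the library's source. expand_spans is a hypothetical function that takes each row as a list of (text, rowspan, colspan) tuples. When two spans claim the same position, the value written first is kept, which is consistent with the output of the Conflicted example below.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: one (text, rowspan, colspan) tuple per cell.
Cell = Tuple[str, int, int]


def expand_spans(rows: Sequence[Sequence[Cell]]) -> List[List[str]]:
    grid: List[List[Optional[str]]] = []

    def ensure(i: int, j: int) -> None:
        # Grow the grid so position (i, j) exists, padding with None.
        while len(grid) <= i:
            grid.append([])
        while len(grid[i]) <= j:
            grid[i].append(None)

    def is_free(i: int, j: int) -> bool:
        return i >= len(grid) or j >= len(grid[i]) or grid[i][j] is None

    for i, row in enumerate(rows):
        j = 0
        for text, rowspan, colspan in row:
            # Skip columns already claimed by a rowspan from an earlier row.
            while not is_free(i, j):
                j += 1
            # Copy the text into every covered position; a position that is
            # already filled keeps its earlier value (first writer wins).
            for ii in range(i, i + rowspan):
                for jj in range(j, j + colspan):
                    ensure(ii, jj)
                    if grid[ii][jj] is None:
                        grid[ii][jj] = text
            j += colspan

    # Positions never covered by any cell become empty strings;
    # rows are not padded to a common length.
    return [[c if c is not None else "" for c in r] for r in grid]
```

For the markup in this example, expand_spans([[("1", 2, 1), ("2", 1, 1), ("3", 1, 1)], [("4", 1, 2)], [("5", 1, 3)]]) returns [['1', '2', '3'], ['1', '4', '4'], ['5', '5', '5']].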

Example 5 - Conflicted

1	2	3
	4
5

from html_table_extractor.extractor import Extractor
table_doc = """
<table>
    <tr>
        <td rowspan=2>1</td>
        <td>2</td>
        <td rowspan=3>3</td>
    </tr>
    <tr>
        <td colspan=2>4</td>
    </tr>
    <tr>
        <td colspan=2>5</td>
    </tr>
</table>
"""
extractor = Extractor(table_doc)
extractor.parse()
print(extractor.return_list())

It will print out:

[[u'1', u'2', u'3'], [u'1', u'4', u'3'], [u'5', u'5', u'3']]

Example 6 - Write to file

1	2
3	4

from html_table_extractor.extractor import Extractor
table_doc = """
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
"""
extractor = Extractor(table_doc).parse()
extractor.write_to_csv(path='.')

It creates a new CSV file named output.csv in the directory given by path, with the following contents:

1,2
3,4
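The same rows can also be written with Python's standard csv module. This is useful if you want to pick the file name yourself or clean up the rows first. Below is a minimal sketch. rows_to_csv_text and write_rows_csv are hypothetical helpers, and rows stands for the list returned by return_list(). The output.csv default only mirrors the file name described above.

```python
import csv
import io
import os
from typing import Iterable, Sequence


def rows_to_csv_text(rows: Iterable[Sequence[object]]) -> str:
    """Render rows as CSV text (excel dialect: commas, CRLF line endings)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # non-string values are written via str()
    return buffer.getvalue()


def write_rows_csv(rows: Iterable[Sequence[object]], directory: str = ".",
                   filename: str = "output.csv") -> str:
    """Hypothetical helper: write rows to directory/filename and return the path."""
    path = os.path.join(directory, filename)
    # newline="" stops Python from translating the line endings csv already wrote.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv_text(rows))
    return path
```

Cells that contain a comma are quoted automatically. For example, rows_to_csv_text([["a,b", "c"]]) gives '"a,b",c\r\n'.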

Team

@yuanxu-li

Errors/Bugs

If something is not working correctly, or if you have any suggestions for improvements, report it on the issue tracker: https://github.com/yuanxu-li/html-table-extractor/issues

Copyright

Third-party copyright in this distribution is noted where applicable.

Misc

How to upload the package to PyPI (for the owner's reference):

python setup.py bdist_wheel --universal
twine upload dist/* --verbose
