amitupreti / Email-Crawler-Lead-Generator

Licence: MIT license
This email crawler visits every page of a given website, parses each page, and saves the email addresses it finds to a CSV file.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Email-Crawler-Lead-Generator

requestsR
R interface to Python requests module
Stars: ✭ 12 (-74.47%)
Mutual labels:  requests, webscraping
image-crawler
An image scraper that scrapes images from unsplash.com
Stars: ✭ 12 (-74.47%)
Mutual labels:  requests, webscraping
Proxy requests
a class that uses scraped proxies to make http GET/POST requests (Python requests)
Stars: ✭ 357 (+659.57%)
Mutual labels:  requests, webscraping
resto
🔗 a CLI app that can send pretty HTTP & API requests with TUI
Stars: ✭ 113 (+140.43%)
Mutual labels:  requests
robotstxt
robots.txt file parsing and checking for R
Stars: ✭ 65 (+38.3%)
Mutual labels:  webscraping
youtube-audio
extract videos from youtube in audio format using webscraping techniques 🎶
Stars: ✭ 68 (+44.68%)
Mutual labels:  webscraping
PacPaw
Pawn package manager for SA-MP
Stars: ✭ 14 (-70.21%)
Mutual labels:  webscraping
get LibSeat
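Automatic reservation and check-in program for the 利昂 library reservation system. Supports the library systems of 38 universities, including Renmin University of China, Beijing Normal University, University of Jinan, and Harbin Institute of Technology.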
Stars: ✭ 39 (-17.02%)
Mutual labels:  requests
Geolocator-2
Learn how to find and work with locations in Django, the Yelp API, and Google Maps api.
Stars: ✭ 24 (-48.94%)
Mutual labels:  requests
CourseDownloader
GUI app for downloading whole online courses with folder structure from one url
Stars: ✭ 20 (-57.45%)
Mutual labels:  webscraping
gists
Methods for working with the GitHub Gist API. Node.js/JavaScript
Stars: ✭ 96 (+104.26%)
Mutual labels:  requests
browser-automation-api
Browser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things like capture a screenshot, generate pdf, extract content or execute custom Puppeteer, Playwright functions.
Stars: ✭ 24 (-48.94%)
Mutual labels:  webscraping
gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
Stars: ✭ 97 (+106.38%)
Mutual labels:  webscraping
option chain analysis
NSE Nifty Option chain analysis on the web page.
Stars: ✭ 63 (+34.04%)
Mutual labels:  requests
crawler
A requests + lxml crawler with a simple crawler architecture
Stars: ✭ 72 (+53.19%)
Mutual labels:  requests
usim800
usim800 is a Python driver module for SIM800 GSM/GPRS .
Stars: ✭ 36 (-23.4%)
Mutual labels:  requests
python3-concurrency
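Theory validation for a Python 3 crawler series. It first studies I/O models, implementing TCP servers and clients in Python under the blocking I/O, nonblocking I/O, and I/O multiplexing models. It then compares the efficiency of synchronous I/O (sequential download, multiprocess concurrency, multithreaded concurrency) with asynchronous I/O (asyncio).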
Stars: ✭ 49 (+4.26%)
Mutual labels:  requests
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-70.21%)
Mutual labels:  requests
cpr
C++ Requests: Curl for People, a spiritual port of Python Requests.
Stars: ✭ 5,005 (+10548.94%)
Mutual labels:  requests
python-crawler
A repository for learning web crawling, suitable for complete beginners and friendly to newcomers
Stars: ✭ 37 (-21.28%)
Mutual labels:  requests

Email Crawler and Lead generator in python

This crawler takes a web address as input and extracts all emails from that website by sequentially visiting every URL in that domain.
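The extraction step can be illustrated with a minimal sketch. It assumes a simple regular expression is enough to spot addresses in a page's text; the pattern and the `extract_emails` name are illustrative assumptions, not taken from the project's source.

```python
import re

# Assumption: a deliberately simple pattern. Real email syntax is broader,
# and the project's own pattern is not shown in this README.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text):
    """Return every email-like string found in page_text, in order of appearance."""
    return EMAIL_PATTERN.findall(page_text)


sample = '<a href="mailto:press@example.com">Press</a> or write to jobs@example.org.'
found = extract_emails(sample)
print(found)
```

The function works on text that has already been downloaded, so it can be tried without any network access.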


An older version is also available; it has no duplicate handling, multithreading, or memory management.

About The Project

Crawler Demo

The Email Crawler only visits URLs in the same domain and does not save duplicate emails. It also keeps a log of the URLs it has visited and dumps that log when crawling ends.
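The behaviour described above (same-domain restriction, duplicate handling, and a visited-URL log) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `crawl` and `same_domain` names are hypothetical, "same domain" is taken to mean an identical host, and an in-memory `pages` dictionary stands in for real HTTP requests so the example runs offline.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def same_domain(url, start_url):
    """True when both URLs share the same host (compared case-insensitively)."""
    return urlparse(url).netloc.lower() == urlparse(start_url).netloc.lower()


def crawl(start_url, pages):
    """Walk `pages` breadth-first from start_url.

    `pages` maps a URL to {"links": [...], "emails": [...]} and is a stand-in
    for downloading and parsing each page.
    Returns (visited_log, unique_emails).
    """
    visited_log = []        # URLs in the order they were crawled
    queued = {start_url}    # URLs already scheduled, so none is crawled twice
    unique_emails = []
    seen_emails = set()
    queue = deque([start_url])

    while queue:
        url = queue.popleft()
        visited_log.append(url)
        page = pages.get(url, {"links": [], "emails": []})

        for email in page["emails"]:
            key = email.lower()
            if key not in seen_emails:      # skip duplicate emails
                seen_emails.add(key)
                unique_emails.append(key)

        for href in page["links"]:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if same_domain(absolute, start_url) and absolute not in queued:
                queued.add(absolute)
                queue.append(absolute)

    return visited_log, unique_emails


pages = {
    "https://example.com/": {
        "links": ["/about", "https://other.example.net/", "/about#team"],
        "emails": ["info@example.com"],
    },
    "https://example.com/about": {
        "links": ["/", "contact"],
        "emails": ["Info@example.com", "jobs@example.com"],
    },
}
visited_log, unique_emails = crawl("https://example.com/", pages)
print(visited_log)
print(unique_emails)
```

Keeping a set of already-queued URLs alongside the ordered log is one way to avoid revisiting pages while still being able to dump the visit order at the end.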

Built With

Getting Started

To get a local copy up and running follow these simple steps.

Installation

  1. Clone the Email-Crawler-Lead-Generator repository
git clone https://github.com/nOOBIE-nOOBIE/Email-Crawler-Lead-Generator.git
  2. Install dependencies
pip install -r requirements.txt

If you have both Python 2 and Python 3 installed, you might need to run this instead:

pip3 install -r requirements.txt

Usage

Simply pass the url as an argument

python email_crawler.py https://medium.com/

If you have both Python 2 and Python 3 installed, you might need to run this instead:

python3 email_crawler.py https://medium.com/
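How a script might read that single URL argument can be sketched with the standard library's `argparse`. The README does not show how `email_crawler.py` actually handles its arguments, so the `parse_start_url` helper and its validation rule are assumptions made for illustration only.

```python
import argparse
from urllib.parse import urlparse


def parse_start_url(argv):
    """Parse one positional URL argument and check that it looks like http(s)."""
    parser = argparse.ArgumentParser(description="Crawl a website for emails.")
    parser.add_argument("url", help="start URL, e.g. https://medium.com/")
    args = parser.parse_args(argv)

    parts = urlparse(args.url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # parser.error prints a usage message and exits with status 2.
        parser.error("url must start with http:// or https://")
    return args.url


# In a real script this would be parse_start_url(sys.argv[1:]).
start_url = parse_start_url(["https://medium.com/"])
print(start_url)
```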

Output

➜  email_crawler python3 email_crawler.py https://medium.com/
WELCOME TO EMAIL CRAWLER
CRAWL : https://medium.com/
1 Email found [email protected]
2 Email found [email protected]
CRAWL : https://medium.com/creators
3 Email found [email protected]
4 Email found [email protected]
5 Email found [email protected]
6 Email found [email protected]
7 Email found [email protected]
CRAWL : https://medium.com/@mshannabrooks
CRAWL : https://medium.com/m/signin?operation=register&redirect=https%3A%2F%2Fmedium.com%2F%40mshannabrooks&source=listing-----5f0204823a1e---------------------bookmark_sidebar-
CRAWL : https://medium.com/m/signin?operation=register&redirect=https%3A%2F%2Fmedium.com%2F%40mshannabrooks&source=-----e5d9a7ef4033----6------------------
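Besides the console log above, the project description says the emails found are saved to a CSV file. The file name and column layout are not stated in this README, so the sketch below assumes a single `email` column and writes to an in-memory buffer; `write_emails_csv` is a hypothetical helper name.

```python
import csv
import io


def write_emails_csv(emails, out_file):
    """Write one email per row under an assumed 'email' header."""
    writer = csv.writer(out_file)
    writer.writerow(["email"])
    for email in emails:
        writer.writerow([email])


# An in-memory buffer keeps the example side-effect free. For a real file, use
# open("emails.csv", "w", newline="") as recommended by the csv module docs.
buffer = io.StringIO()
write_emails_csv(["a@example.com", "b@example.com"], buffer)
rows = buffer.getvalue().splitlines()
print(rows)
```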

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Amit Upreti - @amitupreti

Project Link: https://github.com/nOOBIE-nOOBIE/Email-Crawler-Lead-Generator
