yogesh-desai / WebCrawlerTokopedia

Licence: other

A web crawler and scraper for https://www.Tokopedia.com. The project scrapes the product ID, the product URL, and the product videos shown beneath the product images at the bottom right of the page.

WebCrawlerTokopedia

It is a web crawler and scraper for https://www.Tokopedia.com. The code is fully automated: you only need to supply an input URL to get started.

The program extracts the following:

product ID
product URL
product video URLs

It has fetcher and extractor functions. The code is written specifically for the structure of Tokopedia's web pages. To get different results, change the extractor, the DoCDP() function.
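
The split between fetching and extracting can be illustrated with a minimal sketch. This is not the project's DoCDP() function, which drives Chrome through chromedp; it is a standard-library stand-in that works on HTML that has already been fetched. The Product type, the extract function, and both regular expressions are hypothetical, since the README does not describe the actual page markup.

```go
package main

import (
	"fmt"
	"regexp"
)

// Product mirrors the three fields the README says are extracted.
type Product struct {
	ID        string
	URL       string
	VideoURLs []string
}

// Hypothetical patterns: the real Tokopedia markup is an assumption here.
var (
	productIDRe = regexp.MustCompile(`name="product_id"\s+value="(\d+)"`)
	videoIDRe   = regexp.MustCompile(`youtube\.com/embed/([A-Za-z0-9_-]{11})`)
)

// extract pulls the product ID and de-duplicated video URLs out of HTML.
// It reports false when the page does not look like a product page.
func extract(pageURL, html string) (Product, bool) {
	m := productIDRe.FindStringSubmatch(html)
	if m == nil {
		return Product{}, false
	}
	p := Product{ID: m[1], URL: pageURL}
	seen := make(map[string]bool)
	for _, v := range videoIDRe.FindAllStringSubmatch(html, -1) {
		if !seen[v[1]] {
			seen[v[1]] = true
			p.VideoURLs = append(p.VideoURLs, "https://www.youtube.com/watch?v="+v[1])
		}
	}
	return p, true
}

func main() {
	html := `<input name="product_id" value="146347138">
<iframe src="https://www.youtube.com/embed/oKR2fh09Nic"></iframe>
<iframe src="https://www.youtube.com/embed/oKR2fh09Nic"></iframe>`
	p, ok := extract("https://www.tokopedia.com/example/product", html)
	fmt.Println(ok, p.ID, p.VideoURLs)
}
```

Keeping the extractor a pure function of the page content means that adapting the crawler to a different page layout only touches this one function.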

Dependencies

It uses the chromedp package, which drives Chrome through the Chrome DevTools Protocol (CDP).

Installation

Install it in the usual way.

$ go get -u github.com/yogesh-desai/WebCrawlerTokopedia

Usage

$ go run main.go

Usage of command-line-arguments:
  -cancelafter duration
    	automatically cancel the fetchbot after a given time
  -cancelat string
    	automatically cancel the fetchbot at a given URL
  -headless
    	Run the CDP in headless mode. (default true)
  -memstats duration
    	display memory statistics at a given interval (default 5m0s)
  -seed string
    	seed URL (default "https://www.tokopedia.com/")
  -stopafter duration
    	automatically stop the fetchbot after a given time
  -stopat string
    	automatically stop the fetchbot at a given URL
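
For readers unfamiliar with Go's flag package, the usage output above could be produced by declarations along these lines. This is a sketch, not the project's source: the flag names, defaults, and help strings are copied from the usage output, while the config type, the parseFlags function, and the FlagSet name "crawler" are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// config holds the values of the flags listed in the usage output.
// The type and its field names are hypothetical.
type config struct {
	seed        string
	headless    bool
	memStats    time.Duration
	cancelAfter time.Duration
	cancelAt    string
	stopAfter   time.Duration
	stopAt      string
}

// parseFlags registers the flags on a fresh FlagSet and parses args.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("crawler", flag.ContinueOnError)
	fs.StringVar(&c.seed, "seed", "https://www.tokopedia.com/", "seed URL")
	fs.BoolVar(&c.headless, "headless", true, "Run the CDP in headless mode.")
	fs.DurationVar(&c.memStats, "memstats", 5*time.Minute, "display memory statistics at a given interval")
	fs.DurationVar(&c.cancelAfter, "cancelafter", 0, "automatically cancel the fetchbot after a given time")
	fs.StringVar(&c.cancelAt, "cancelat", "", "automatically cancel the fetchbot at a given URL")
	fs.DurationVar(&c.stopAfter, "stopafter", 0, "automatically stop the fetchbot after a given time")
	fs.StringVar(&c.stopAt, "stopat", "", "automatically stop the fetchbot at a given URL")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

With flags declared this way, a run that overrides some defaults would look like `go run main.go -seed https://www.tokopedia.com/ -headless=false -stopafter 10m`. Boolean flags in Go must use the `-name=false` form to be switched off.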

Output

The code generates a file to store product details.

The following is example output from running the code on a single web page.


Product_ID	Product_URL	Youtube_Video_URLs
146347138	https://www.tokopedia.com/chocoapple/ready-stock-bnib-iphone-128gb-7-plus-jet-black-garansi-apple-1-tahun-10	https://www.youtube.com/watch?v=oKR2fh09Nic,https://www.youtube.com/watch?v=12JBG20n3jI,https://www.youtube.com/watch?v=mWEG1nu2rVY,https://www.youtube.com/watch?v=wgZ7Q4ywOl8
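
The layout above is one tab-separated row per product, with the video URLs joined by commas in the last column. A minimal sketch of producing such a row follows; the record type and the formatRecord function are hypothetical names, and the project may write its file differently.

```go
package main

import (
	"fmt"
	"strings"
)

// header matches the column names shown in the example output.
const header = "Product_ID\tProduct_URL\tYoutube_Video_URLs"

// record is a hypothetical container for one output row.
type record struct {
	ProductID  string
	ProductURL string
	VideoURLs  []string
}

// formatRecord renders one row: tab-separated columns, with the
// video URLs joined by commas in the last column.
func formatRecord(r record) string {
	return r.ProductID + "\t" + r.ProductURL + "\t" + strings.Join(r.VideoURLs, ",")
}

func main() {
	r := record{
		ProductID:  "146347138",
		ProductURL: "https://www.tokopedia.com/example/product",
		VideoURLs: []string{
			"https://www.youtube.com/watch?v=oKR2fh09Nic",
			"https://www.youtube.com/watch?v=12JBG20n3jI",
		},
	}
	fmt.Println(header)
	fmt.Println(formatRecord(r))
}
```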

Features

It has fetcher and extractor functions.
The fetcher is built with a dedicated Filter function.
It uses goroutines and channels to run tasks in parallel and speed up the crawl.
It has flags with default values. You can supply your own values at runtime.
It also reports memory statistics to keep track of the memory used by the program.
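
The goroutine, channel, and memory-statistics features can be sketched as follows. This is a generic worker-pool pattern under stated assumptions, not the project's code: processAll and heapAllocMiB are hypothetical names, and the process callback stands in for whatever the crawler does with each URL.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processAll fans the URLs out to a fixed number of worker goroutines
// and collects the results. Result order is not guaranteed.
func processAll(urls []string, workers int, process func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- process(u)
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers exit.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

// heapAllocMiB reports the size of allocated heap objects in MiB.
func heapAllocMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	urls := []string{"https://www.tokopedia.com/a", "https://www.tokopedia.com/b"}
	out := processAll(urls, 2, func(u string) string { return "visited " + u })
	fmt.Println(len(out), "pages processed")
	fmt.Printf("heap in use: %.2f MiB\n", heapAllocMiB())
}
```

A long-running crawler could call a function like heapAllocMiB from a time.Ticker loop at the interval given by the -memstats flag; whether the project does exactly this is an assumption.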

TODOs

Currently, it uses the GUI mode of Google Chrome. The --headless functionality still needs to be implemented.
Make the code faster and more stable.
Do more testing and profiling to understand memory-related issues.

Known Issues

Currently, no issues. :)

Please feel free to open pull requests or issues. :)
