Data Science: Collection of useful data science topics along with code and articles
Stars: ✭ 315 (+10.92%)
Mtnt: Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-83.1%)
raspagem-de-dados-fatec: 📓 Short course on web data scraping with Python, taught at the FATEC Jundiaí Technology Week (Semana de Tecnologia)
Stars: ✭ 22 (-92.25%)
Lingua Rs: 👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
Stars: ✭ 260 (-8.45%)
TorScrapper: A scraper written 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (-91.55%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-76.06%)
Tacred Relation: PyTorch implementation of the position-aware attention model for relation extraction
Stars: ✭ 271 (-4.58%)
Bluebert: BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
Stars: ✭ 273 (-3.87%)
scrapy-zyte-smartproxy: Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Stars: ✭ 317 (+11.62%)
Bist Parser: Graph-based and transition-based dependency parsers based on BiLSTMs
Stars: ✭ 257 (-9.51%)
dmi-instascraper: A GUI for Instaloader to scrape users and hashtags on Instagram
Stars: ✭ 21 (-92.61%)
facebook-discussion-tk: A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.
Stars: ✭ 33 (-88.38%)
Chatbot ner: chatbot_ner, Named Entity Recognition for chatbots.
Stars: ✭ 273 (-3.87%)
policy-data-analyzer: Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-92.25%)
Adaptnlp: An easy-to-use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.
Stars: ✭ 278 (-2.11%)
Awesome Ai Awesomeness: A curated list of awesome awesomeness about artificial intelligence
Stars: ✭ 268 (-5.63%)
Babler: Data collection system for NLP/speech recognition
Stars: ✭ 21 (-92.61%)
Languagecrunch: LanguageCrunch NLP server Docker image
Stars: ✭ 281 (-1.06%)
Matterport3dsimulator: AI research platform for reinforcement learning from real panoramic images.
Stars: ✭ 260 (-8.45%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (-78.52%)
Nlp tasks: Natural Language Processing tasks and references
Stars: ✭ 2,968 (+945.07%)
chirps: Twitter bot powering @arichduvet
Stars: ✭ 35 (-87.68%)
Fakenewscorpus: A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-10.21%)
Scraper-Projects: 🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-91.2%)
instagram explorer: 📷 An app to scrape Instagram posts and analyze the data.
Stars: ✭ 17 (-94.01%)
Nlp Tutorial: A tutorial on Natural Language Processing in Python
Stars: ✭ 274 (-3.52%)
jazz: The scripting engine that combines speed, safety, and simplicity
Stars: ✭ 132 (-53.52%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-79.23%)
Olivia: 💁‍♀️ Your new best friend, powered by an artificial neural network
Stars: ✭ 3,114 (+996.48%)
scraper: Node.js web scraper. Includes a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
Stars: ✭ 37 (-86.97%)
memes-api: API for scraping common meme sites
Stars: ✭ 17 (-94.01%)
Awesomefakenews: This repository contains recent research on fake news.
Stars: ✭ 270 (-4.93%)
webdext: Intelligent web data extractor
Stars: ✭ 75 (-73.59%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-2.46%)
PyLex: Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (-78.87%)
Nlpython: This repository contains code for Natural Language Processing using the Python scripting language. All of the code relates to my book, "Python Natural Language Processing".
Stars: ✭ 265 (-6.69%)
Zeiver: A scraper, downloader, and recorder for static open directories.
Stars: ✭ 14 (-95.07%)
Link Grammar: The CMU Link Grammar natural language parser
Stars: ✭ 286 (+0.7%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like page caching and geosearch.
Stars: ✭ 15 (-94.72%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. It enables the development of data extraction and web automation jobs with headless Chrome and Puppeteer, among other tools.
Stars: ✭ 3,154 (+1010.56%)
humanparser: Parse a human name string into salutation, first name, middle name, last name, and suffix.
Stars: ✭ 78 (-72.54%)
Pyswip: PySwip is a Python/SWI-Prolog bridge that lets you query SWI-Prolog from your Python programs. It features an (incomplete) SWI-Prolog foreign language interface, a utility class that makes querying Prolog easy, and a Pythonic interface.
Stars: ✭ 276 (-2.82%)
dust: Archive web pages with all relevant assets, or save them as a single HTML file
Stars: ✭ 19 (-93.31%)
Lda: LDA topic modeling for Node.js
Stars: ✭ 262 (-7.75%)
scrapy facebooker: A collection of Scrapy spiders that can scrape posts, images, and more from public Facebook Pages.
Stars: ✭ 22 (-92.25%)
Lambdasoup: Functional HTML scraping and rewriting with CSS in OCaml
Stars: ✭ 280 (-1.41%)
shup: A POSIX shell script to parse HTML
Stars: ✭ 28 (-90.14%)
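Ai Job Notes: A job-hunting guide for AI algorithm positions (covering preparation strategies, coding-practice guides, internal referrals, and a list of AI companies)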
Stars: ✭ 3,191 (+1023.59%)
image-collector: Download images from Google Image Search
Stars: ✭ 38 (-86.62%)
Autonlp: 🤗 AutoNLP: Train state-of-the-art natural language processing models and deploy them in a scalable environment automatically
Stars: ✭ 263 (-7.39%)
naos: 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (-89.44%)
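Articutapi: API of Articut, a Chinese word segmentation tool (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.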
Stars: ✭ 252 (-11.27%)
Textract: Extract text from any document. No muss. No fuss.
Stars: ✭ 3,165 (+1014.44%)
Oie Resources: A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-0.35%)
Swem: The TensorFlow code for the ACL 2018 paper "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Stars: ✭ 279 (-1.76%)