finetune-gpt2xl
Guide: Finetune GPT2-XL (1.5 billion parameters) and GPT-Neo (2.7B) on a single GPU with Hugging Face Transformers using DeepSpeed
Stars: ✭ 353 (+1861.11%)
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+1061.11%)
Reader
Extract clean(er), readable text from web pages via Mercury Web Parser.
Stars: ✭ 75 (+316.67%)
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+3850%)
Transformer Temporal Tagger
Code and data for the paper "BERT Got a Date: Introducing Transformers to Temporal Tagging"
Stars: ✭ 55 (+205.56%)
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+19027.78%)
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+8983.33%)
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (+61.11%)
core
The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+6066.67%)
crawlzone
Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (+288.89%)
t5-japanese
Code to pre-train Japanese T5 models
Stars: ✭ 39 (+116.67%)
Hi
A programming language for web scraping
Stars: ✭ 14 (-22.22%)
dedupsqlfs
Deduplicating filesystem via Python 3, FUSE and SQLite
Stars: ✭ 24 (+33.33%)
query-selector
Long-term series forecasting with Query Selector – efficient model of sparse attention
Stars: ✭ 63 (+250%)
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (+22.22%)
Cross-lingual-Summarization
Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention
Stars: ✭ 28 (+55.56%)
readability
Fast readability scores for text data
Stars: ✭ 22 (+22.22%)
R-MeN
Transformer-based Memory Networks for Knowledge Graph Embeddings (ACL 2020) (PyTorch and TensorFlow)
Stars: ✭ 74 (+311.11%)
kaggle-champs
Code for the CHAMPS Predicting Molecular Properties Kaggle competition
Stars: ✭ 49 (+172.22%)
lopez
Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (+11.11%)
web-poet
Web scraping Page Objects core library
Stars: ✭ 67 (+272.22%)
machine learning course
Artificial intelligence/machine learning course at UCF in Spring 2020 (Fall 2019 and Spring 2019)
Stars: ✭ 47 (+161.11%)
Frost
A backup program that does deduplication, compression, and encryption
Stars: ✭ 25 (+38.89%)
bytekit
A Java utility library for byte manipulation (not a bytecode library)
Stars: ✭ 40 (+122.22%)
deduplication
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test and a benchmark.
Stars: ✭ 59 (+227.78%)
PythonScrapyBasicSetup
Basic setup with random user agents and IP addresses for Python Scrapy Framework.
Stars: ✭ 57 (+216.67%)
MASTER-pytorch
Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)
Stars: ✭ 263 (+1361.11%)
lsh
Locality Sensitive Hashing for Go (Multi-probe LSH, LSH Forest, basic LSH)
Stars: ✭ 92 (+411.11%)
libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (+1477.78%)
VideoTransformer-pytorch
PyTorch implementation of a collection of scalable Video Transformer benchmarks.
Stars: ✭ 159 (+783.33%)
SegSwap
(CVPRW 2022) Learning Co-segmentation by Segment Swapping for Retrieval and Discovery
Stars: ✭ 46 (+155.56%)
densecap
Dense video captioning in PyTorch
Stars: ✭ 37 (+105.56%)
TokenLabeling
PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Stars: ✭ 385 (+2038.89%)
BMT
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
Stars: ✭ 192 (+966.67%)
2017-summer-workshop
Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)
Stars: ✭ 33 (+83.33%)
sb-nmt
Code for Synchronous Bidirectional Neural Machine Translation (SB-NMT)
Stars: ✭ 66 (+266.67%)
php-serializer
Serialize PHP variables, including objects, in any format. Supports unserializing them too.
Stars: ✭ 47 (+161.11%)
mxnet-retrain
Create an MXNet fine-tuner (retrain) for macOS/Linux with no need to install Docker. Supports CPU and GPU (eGPU/cuDNN), and supports Inception, ResNet, SqueezeNet, MobileNet...
Stars: ✭ 32 (+77.78%)
LibFewShot
LibFewShot: A Comprehensive Library for Few-shot Learning.
Stars: ✭ 629 (+3394.44%)
codepen-puppeteer
Use Puppeteer to download pens from Codepen.io as single HTML pages
Stars: ✭ 22 (+22.22%)
sparql-transformer
A handier way to use SPARQL data in your web app
Stars: ✭ 38 (+111.11%)
ru-dalle
Generate images from text, in Russian
Stars: ✭ 1,606 (+8822.22%)
nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Stars: ✭ 158 (+777.78%)
fastT5
⚡ Boost inference speed of T5 models by 5x and reduce the model size by 3x.
Stars: ✭ 421 (+2238.89%)
Transformers-RL
An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"
Stars: ✭ 107 (+494.44%)
seq2seq-pytorch
Sequence to Sequence Models in PyTorch
Stars: ✭ 41 (+127.78%)
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (+266.67%)