Nlp chinese corpus大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+1419.63%)
Cluecorpus2020Large-scale Pre-training Corpus for Chinese 100G 中文预训练语料
Stars: ✭ 278 (-36.53%)
datasetsTFDS data loaders for sign language datasets.
Stars: ✭ 17 (-96.12%)
Medical Datasetstracking medical datasets, with a focus on medical imaging
Stars: ✭ 296 (-32.42%)
covid-19-data-cleanupScripts to cleanup data from https://github.com/CSSEGISandData/COVID-19
Stars: ✭ 25 (-94.29%)
open-discourseOpen Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (-89.27%)
LtpLanguage Technology Platform
Stars: ✭ 3,648 (+732.88%)
WordlessAn Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Stars: ✭ 378 (-13.7%)
fastmorphFast corpus search engine originally made for the Corpus of Written Tatar language
Stars: ✭ 14 (-96.8%)
Open3d MlAn extension of Open3D to address 3D Machine Learning tasks
Stars: ✭ 284 (-35.16%)
podiumPodium: a framework agnostic Python NLP library for data loading and preprocessing
Stars: ✭ 55 (-87.44%)
PaperrobotCode for PaperRobot: Incremental Draft Generation of Scientific Ideas
Stars: ✭ 372 (-15.07%)
Thulac JavaAn Efficient Lexical Analyzer for Chinese
Stars: ✭ 285 (-34.93%)
disent🧶 Modular VAE disentanglement framework for python built with PyTorch Lightning ▸ Including metrics and datasets ▸ With strongly supervised, weakly supervised and unsupervised methods ▸ Easily configured and run with Hydra config ▸ Inspired by disentanglement_lib
Stars: ✭ 41 (-90.64%)
DeepSentiPersRepository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-96.12%)
MeglassAn eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Stars: ✭ 281 (-35.84%)
dplace-dataThe data repository for the D-PLACE Project (Database of Places, Language, Culture and Environment)
Stars: ✭ 49 (-88.81%)
download audioset📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Stars: ✭ 53 (-87.9%)
newsletter-archiveMarkdown archive & RSS/Atom feeds for Data Is Plural.
Stars: ✭ 65 (-85.16%)
Indian ParallelCorpusCurated list of publicly available parallel corpus for Indian Languages
Stars: ✭ 23 (-94.75%)
Chineseaddress ocrPhotographing Chinese-Address OCR implemented using CTPN+CTC+Address Correction. 拍照文档中文地址文字识别。
Stars: ✭ 309 (-29.45%)
wordfish-pythonextract relationships from standardized terms from corpus of interest with deep learning 🐟
Stars: ✭ 19 (-95.66%)
Projects🪐 End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (-9.36%)
NetEmb-DatasetsA collection of real-world networks/graphs for Network Embedding
Stars: ✭ 18 (-95.89%)
CleoraCleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
Stars: ✭ 303 (-30.82%)
FuzzdataFuzzing resources for feeding various fuzzers with input. 🔧
Stars: ✭ 376 (-14.16%)
recurrent-defocus-deblurring-synth-dual-pixelReference github repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Lev…
Stars: ✭ 30 (-93.15%)
TdcTherapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics
Stars: ✭ 291 (-33.56%)
Zhparserzhparser is a PostgreSQL extension for full-text search of Chinese language
Stars: ✭ 418 (-4.57%)
TSForecastingThis repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.
Stars: ✭ 53 (-87.9%)
Dr.sure🏫DeepLearning学习笔记以及Tensorflow、Pytorch的使用心得笔记。Dr. Sure会不定时往项目中添加他看到的最新的技术,欢迎批评指正。
Stars: ✭ 365 (-16.67%)
opendatasetsA Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
Stars: ✭ 161 (-63.24%)
dialogue-datasetscollect the open dialog corpus and some useful data processing utils.
Stars: ✭ 24 (-94.52%)
Animal MattingGithub repository for the paper End-to-end Animal Image Matting
Stars: ✭ 363 (-17.12%)
Filipino-Text-BenchmarksOpen-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-94.98%)
KorporaKorean corpus repository
Stars: ✭ 270 (-38.36%)
ml4seA curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
Stars: ✭ 46 (-89.5%)
Awesome Holistic 3dA list of papers and resources (data,code,etc) for holistic 3D reconstruction in computer vision
Stars: ✭ 387 (-11.64%)
HubDataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+813.93%)
SpiCE-CorpusAn open-access corpus of conversational bilingual speech in Cantonese and English
Stars: ✭ 33 (-92.47%)
databrewerThe missing datasets manager. Like hombrew but for datasets. CLI-tool for search and discover datasets!
Stars: ✭ 39 (-91.1%)
ChakinSimple downloader for pre-trained word vectors
Stars: ✭ 323 (-26.26%)
RoapiCreate full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (-42.24%)
ck-envCK repository with components and automation actions to enable portable workflows across diverse platforms including Linux, Windows, MacOS and Android. It includes software detection plugins and meta packages (code, data sets, models, scripts, etc) with the possibility of multiple versions to co-exist in a user or system environment:
Stars: ✭ 67 (-84.7%)
RData.jlRead R data files from Julia
Stars: ✭ 49 (-88.81%)