107 open source projects that are alternatives to or similar to auto-data-tokenize

bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+433.33%)
Mutual labels:  dataflow, data-governance
dataflow-contact-center-speech-analysis
Speech Analysis Framework, a collection of components and code from Google Cloud that you can use to transcribe audio files to create analytics.
Stars: ✭ 46 (+119.05%)
Mutual labels:  dataflow, data-loss-prevention
TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Stars: ✭ 84 (+300%)
Mutual labels:  tokenization
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+142.86%)
Mutual labels:  tokenization
tkseem
Arabic Tokenization Library. It provides many tokenization algorithms.
Stars: ✭ 45 (+114.29%)
Mutual labels:  tokenization
bigflow
A Python framework for data processing on GCP.
Stars: ✭ 96 (+357.14%)
Mutual labels:  dataflow
terraform-splunk-log-export
Deploy Google Cloud log export to Splunk using Terraform
Stars: ✭ 26 (+23.81%)
Mutual labels:  dataflow
lunasec
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
Stars: ✭ 1,261 (+5904.76%)
Mutual labels:  tokenization
document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Stars: ✭ 36 (+71.43%)
Mutual labels:  data-governance
Meemooapp
Creative apps to use, build, share, and hack in the browser.
Stars: ✭ 220 (+947.62%)
Mutual labels:  dataflow
gotcha
Go Taint CHeck Analyser
Stars: ✭ 40 (+90.48%)
Mutual labels:  dataflow
Azure Services Map
A visual representation and reference to Azure services
Stars: ✭ 189 (+800%)
Mutual labels:  dataflow
DataflowTemplates
Convenient Dataflow pipelines for transforming data between cloud data sources
Stars: ✭ 22 (+4.76%)
Mutual labels:  dataflow
nightfall dlp action
GitHub Data Loss Prevention (DLP) Action: Scan Pull Requests for sensitive data, like credentials & secrets, PII, credit card numbers, and more.
Stars: ✭ 46 (+119.05%)
Mutual labels:  data-loss-prevention
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (+23.81%)
Mutual labels:  tokenization
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (+52.38%)
Mutual labels:  tokenization
github-watchman
Monitoring GitHub for sensitive data shared publicly
Stars: ✭ 60 (+185.71%)
Mutual labels:  data-loss-prevention
bert tokenization for java
This is a Java version of the Chinese tokenization described in BERT.
Stars: ✭ 39 (+85.71%)
Mutual labels:  tokenization
lima
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Stars: ✭ 75 (+257.14%)
Mutual labels:  tokenization
data-lineage
Generate and Visualize Data Lineage from query history
Stars: ✭ 166 (+690.48%)
Mutual labels:  data-governance
Chigraph
A visual systems language for beginners compiled using LLVM
Stars: ✭ 247 (+1076.19%)
Mutual labels:  dataflow
joern
Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs
Stars: ✭ 968 (+4509.52%)
Mutual labels:  dataflow
Microflo
Live dataflow programming for microcontrollers and embedded
Stars: ✭ 207 (+885.71%)
Mutual labels:  dataflow
polycash
The ultimate open source betting protocol. PolyCash is a P2P blockchain platform for wallets, asset issuance, bonds & gaming.
Stars: ✭ 24 (+14.29%)
Mutual labels:  tokenization
Pyt
A Static Analysis Tool for Detecting Security Vulnerabilities in Python Web Applications
Stars: ✭ 2,061 (+9714.29%)
Mutual labels:  dataflow
deduce
Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (+90.48%)
Mutual labels:  deidentification
Blocks.js
JavaScript dataflow graph editor
Stars: ✭ 165 (+685.71%)
Mutual labels:  dataflow
datacatalog-tag-manager
Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources -- currently supports the CSV file format
Stars: ✭ 17 (-19.05%)
Mutual labels:  data-governance
Data-Stash
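Data-Stash is a data warehouse component based on FISCO-BCOS. By parsing a node's binlog, it generates a full backup of that node's state, allowing the node to separate hot and cold data and to prune data.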
Stars: ✭ 27 (+28.57%)
Mutual labels:  data-governance
nlp-cheat-sheet-python
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (+228.57%)
Mutual labels:  tokenization
systolic-array-dataflow-optimizer
A general framework for optimizing DNN dataflow on systolic array
Stars: ✭ 21 (+0%)
Mutual labels:  dataflow
FAT
Factom Asset Tokens - Open tokenization standards on Factom
Stars: ✭ 17 (-19.05%)
Mutual labels:  tokenization
yarr
Yet another array library
Stars: ✭ 42 (+100%)
Mutual labels:  dataflow
wb-toolbox
Simulink toolbox to rapidly prototype robot controllers
Stars: ✭ 20 (-4.76%)
Mutual labels:  dataflow
PothosComms
Communications blocks and support libraries
Stars: ✭ 15 (-28.57%)
Mutual labels:  dataflow
ObservableComputations
Cross-platform .NET library for computations whose arguments and results are objects that implement INotifyPropertyChanged and INotifyCollectionChanged (ObservableCollection) interfaces.
Stars: ✭ 94 (+347.62%)
Mutual labels:  dataflow
sqllineage
SQL Lineage Analysis Tool powered by Python
Stars: ✭ 348 (+1557.14%)
Mutual labels:  data-governance
Embb
Embedded Multicore Building Blocks (EMB²): Library for parallel programming of embedded systems.
Stars: ✭ 153 (+628.57%)
Mutual labels:  dataflow
re-view
Tools for building reactive user interfaces in ClojureScript.
Stars: ✭ 40 (+90.48%)
Mutual labels:  dataflow
flowgraph
Flowgraph package for scalable asynchronous system development
Stars: ✭ 51 (+142.86%)
Mutual labels:  dataflow
Vaaku2Vec
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Stars: ✭ 68 (+223.81%)
Mutual labels:  tokenization
act
ACT hardware description language and core tools.
Stars: ✭ 53 (+152.38%)
Mutual labels:  dataflow
Dnai.Editor
Dnai Editor - Visual Scripting (Node Editor)
Stars: ✭ 117 (+457.14%)
Mutual labels:  dataflow
whoshiring
A browser for Hacker News's Ask HN: Who's Hiring, with Matrix Inside(tm)
Stars: ✭ 24 (+14.29%)
Mutual labels:  dataflow
Pothoscore
The Pothos data-flow framework
Stars: ✭ 232 (+1004.76%)
Mutual labels:  dataflow
datasphere-service
An open source dataworks platform
Stars: ✭ 20 (-4.76%)
Mutual labels:  data-governance
Pythonflow
🐍 Dataflow programming for python.
Stars: ✭ 215 (+923.81%)
Mutual labels:  dataflow
spacy-server
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Stars: ✭ 58 (+176.19%)
Mutual labels:  tokenization
Vue Blocks
Vue2 dataflow graph editor
Stars: ✭ 201 (+857.14%)
Mutual labels:  dataflow
redis-dataflow-realtime-analytics
Build a real-time website analytics dashboard on GCP using Dataflow, Cloud Memorystore (Redis) and Spring Boot
Stars: ✭ 20 (-4.76%)
Mutual labels:  dataflow
Scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+10600%)
Mutual labels:  dataflow
Data-Export
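Data-Export supports exporting on-chain data to MySQL, ES, and other storage media suited to big data processing, addressing the problems of complex querying, analysis, visualization, and processing of blockchain data.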
Stars: ✭ 37 (+76.19%)
Mutual labels:  data-governance
Nipyapi
A convenient Python wrapper for Apache NiFi
Stars: ✭ 169 (+704.76%)
Mutual labels:  dataflow
spacy russian tokenizer
Custom Russian tokenizer for spaCy
Stars: ✭ 35 (+66.67%)
Mutual labels:  tokenization
ling
Natural Language Processing Toolkit in Golang
Stars: ✭ 57 (+171.43%)
Mutual labels:  tokenization
pyroclastic
Functional dataflow through composable computations
Stars: ✭ 17 (-19.05%)
Mutual labels:  dataflow
dspatch
The Refreshingly Simple Cross-Platform C++ Dataflow / Pipelining / Stream Processing / Reactive Programming Framework
Stars: ✭ 124 (+490.48%)
Mutual labels:  dataflow
PothosDemos
Pothos demonstration applications
Stars: ✭ 24 (+14.29%)
Mutual labels:  dataflow
dtask
DTask is a scheduler for statically dependent tasks.
Stars: ✭ 17 (-19.05%)
Mutual labels:  dataflow
Data-Reconcile
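Data-Reconcile is a blockchain-based reconciliation component. It provides a general-purpose data reconciliation solution built on blockchain smart contract ledgers, along with a dynamically extensible reconciliation framework that supports custom development.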
Stars: ✭ 24 (+14.29%)
Mutual labels:  data-governance
1-60 of 107 similar projects