Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-40.21%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+318.56%)
Vscode Data PreviewData Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+152.58%)
IcebergIceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+305.15%)
ChoetlETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+283.51%)
qweryA SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-71.13%)
Structured Text ToolsA list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+6271.13%)
SqlitebiterA CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
Stars: ✭ 601 (+519.59%)
RqRecord Query - A tool for doing record analysis and transformation
Stars: ✭ 1,808 (+1763.92%)
Kafka Connect Mongodb**Unofficial / Community** Kafka Connect MongoDB Sink Connector - Find the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector
Stars: ✭ 137 (+41.24%)
MillerMiller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+4676.29%)
qsvCSVs sliced, diced & analyzed.
Stars: ✭ 438 (+351.55%)
experimentsCode examples for my blog posts
Stars: ✭ 21 (-78.35%)
parquet-extraA collection of Apache Parquet add-on modules
Stars: ✭ 30 (-69.07%)
columnifyConvert record-oriented data to columnar format.
Stars: ✭ 28 (-71.13%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1592.78%)
AbrisAvro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (+34.02%)
Pxi🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Stars: ✭ 248 (+155.67%)
Schema RegistryConfluent Schema Registry for Kafka
Stars: ✭ 1,647 (+1597.94%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-67.01%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-81.44%)
SqawkLike Awk but with SQL and table joins
Stars: ✭ 263 (+171.13%)
RatatoolA tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+187.63%)
Sqswiss-army knife for data
Stars: ✭ 275 (+183.51%)
OapOptimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+253.61%)
NoprotoFlexible, Fast & Compact Serialization with RPC
Stars: ✭ 138 (+42.27%)
Pmacctpmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].
Stars: ✭ 677 (+597.94%)
DaFlowApache Spark-based Data Flow (ETL) Framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-75.26%)
StoragetapperStorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (+139.18%)
Elasticsearch loaderA tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (+209.28%)
Kafka Storm StarterCode examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+650.52%)
PucketBucketing and partitioning system for Parquet
Stars: ✭ 29 (-70.1%)
Pytablewriterpytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+335.05%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (+12.37%)
Bigdata PlaygroundA complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+82.47%)
VisidataA terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+4648.45%)
Gcs ToolsGCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (-41.24%)
Bigdata File ViewerA cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats such as Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-11.34%)
JsonmapperMap nested JSON structures onto PHP classes
Stars: ✭ 1,306 (+1246.39%)
RepositoryPersonal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-5.15%)
TabtoyHigh-performance spreadsheet data exporter
Stars: ✭ 1,302 (+1242.27%)
CatjDisplays JSON files in a flat format.
Stars: ✭ 1,301 (+1241.24%)
TankaFlexible, reusable and concise configuration for Kubernetes
Stars: ✭ 1,299 (+1239.18%)
Night ConfigPowerful Java configuration library for TOML, YAML, HOCON, JSON and in-memory configurations
Stars: ✭ 93 (-4.12%)
KsonGson TypeAdapter & Factory generator for Kotlin data classes
Stars: ✭ 90 (-7.22%)
Simdjson phpsimdjson_php bindings for the simdjson project. https://github.com/lemire/simdjson
Stars: ✭ 90 (-7.22%)
Generic Json SwiftA simple Swift library for working with generic JSON structures
Stars: ✭ 95 (-2.06%)
ImportjsonapiUse JSONPath to selectively extract data from any JSON or GraphQL API directly into Google Sheets.
Stars: ✭ 90 (-7.22%)
SummitdbIn-memory NoSQL database with ACID transactions, Raft consensus, and Redis API
Stars: ✭ 1,295 (+1235.05%)
Big Data🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-4.12%)
BitsofbytesCode and projects from my blog posts.
Stars: ✭ 89 (-8.25%)
Redisjson PyAn extension to redis-py for using Redis' ReJSON module
Stars: ✭ 89 (-8.25%)
JsonmaskingReplace fields in JSON with a substitute value, no matter how deeply the property is nested. Very useful for masking passwords, credit card numbers, etc.
Stars: ✭ 95 (-2.06%)