1760 open source projects that are alternatives to or similar to Schemer

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-40.21%)

Mutual labels: json, spark, avro, parquet

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+318.56%)

Mutual labels: json, spark, avro, parquet

Vscode Data Preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (+152.58%)

Mutual labels: json, avro, parquet

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+305.15%)

Mutual labels: spark, avro, parquet

Choetl

ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Stars: ✭ 372 (+283.51%)

Mutual labels: json, avro, parquet

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (-71.13%)

Mutual labels: tsv, avro

Structured Text Tools

A list of command line tools for manipulating structured text data

Stars: ✭ 6,180 (+6271.13%)

Mutual labels: json, tsv

Sqlitebiter

A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.

Stars: ✭ 601 (+519.59%)

Mutual labels: json, tsv

Record Query

A tool for doing record analysis and transformation

Stars: ✭ 1,808 (+1763.92%)

Mutual labels: json, avro

Kafka Connect Mongodb

**Unofficial / Community** Kafka Connect MongoDB Sink Connector - Find the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector

Stars: ✭ 137 (+41.24%)

Mutual labels: json, avro

Miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Stars: ✭ 4,633 (+4676.29%)

Mutual labels: json, tsv

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+351.55%)

Mutual labels: tsv, parquet

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-83.51%)

Mutual labels: spark, parquet

experiments

Code examples for my blog posts

Stars: ✭ 21 (-78.35%)

Mutual labels: spark, parquet

parquet-extra

A collection of Apache Parquet add-on modules

Stars: ✭ 30 (-69.07%)

Mutual labels: avro, parquet

columnify

Convert record-oriented data to columnar format.

Stars: ✭ 28 (-71.13%)

Mutual labels: avro, parquet

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+1592.78%)

Mutual labels: spark, parquet

Abris

Avro SerDe for Apache Spark structured APIs.

Stars: ✭ 130 (+34.02%)

Mutual labels: spark, avro

Pxi

🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.

Stars: ✭ 248 (+155.67%)

Mutual labels: json, tsv

Schema Registry

Confluent Schema Registry for Kafka

Stars: ✭ 1,647 (+1597.94%)

Mutual labels: json, avro

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-67.01%)

Mutual labels: spark, avro

confluent-spark-avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Stars: ✭ 18 (-81.44%)

Mutual labels: spark, avro

Sqawk

Like Awk but with SQL and table joins

Stars: ✭ 263 (+171.13%)

Mutual labels: json, tsv

Ratatool

A tool for data sampling, data generation, and data diffing

Stars: ✭ 279 (+187.63%)

Mutual labels: avro, parquet

swiss-army knife for data

Stars: ✭ 275 (+183.51%)

Mutual labels: json, tsv

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (+253.61%)

Mutual labels: spark, parquet

Noproto

Flexible, Fast & Compact Serialization with RPC

Stars: ✭ 138 (+42.27%)

Mutual labels: json, avro

Pmacct

pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].

Stars: ✭ 677 (+597.94%)

Mutual labels: json, avro

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-75.26%)

Mutual labels: avro, parquet

Storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (+139.18%)

Mutual labels: json, avro

Elasticsearch loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

Stars: ✭ 300 (+209.28%)

Mutual labels: json, parquet

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+650.52%)

Mutual labels: spark, avro

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (-70.1%)

Mutual labels: spark, parquet

Pytablewriter

pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.

Stars: ✭ 422 (+335.05%)

Mutual labels: json, tsv

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (+12.37%)

Mutual labels: spark, parquet

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+82.47%)

Mutual labels: avro, parquet

parquet-flinktacular

How to use Parquet in Flink

Stars: ✭ 29 (-70.1%)

Mutual labels: avro, parquet

Visidata

A terminal spreadsheet multitool for discovering and arranging data

Stars: ✭ 4,606 (+4648.45%)

Mutual labels: json, tsv

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (-41.24%)

Mutual labels: avro, parquet

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-11.34%)

Mutual labels: avro, parquet

Jsonmapper

Map nested JSON structures onto PHP classes

Stars: ✭ 1,306 (+1246.39%)

Mutual labels: json

Repository

A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (-5.15%)

Mutual labels: spark

Tabtoy

High-performance table data exporter