indix / Schemer

Licence: apache-2.0
Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Schemer

Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-40.21%)
Mutual labels:  json, spark, avro, parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+318.56%)
Mutual labels:  json, spark, avro, parquet
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+305.15%)
Mutual labels:  spark, avro, parquet
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+283.51%)
Mutual labels:  json, avro, parquet
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+152.58%)
Mutual labels:  json, avro, parquet
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+253.61%)
Mutual labels:  spark, parquet
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+187.63%)
Mutual labels:  avro, parquet
Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+4648.45%)
Mutual labels:  json, tsv
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+335.05%)
Mutual labels:  json, tsv
Sqlitebiter
A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
Stars: ✭ 601 (+519.59%)
Mutual labels:  json, tsv
Structured Text Tools
A list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+6271.13%)
Mutual labels:  json, tsv
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (+209.28%)
Mutual labels:  json, parquet
Sq
swiss-army knife for data
Stars: ✭ 275 (+183.51%)
Mutual labels:  json, tsv
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-83.51%)
Mutual labels:  spark, parquet
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+650.52%)
Mutual labels:  spark, avro
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-70.1%)
Mutual labels:  spark, parquet
Sqawk
Like Awk but with SQL and table joins
Stars: ✭ 263 (+171.13%)
Mutual labels:  json, tsv
confluent-spark-avro
Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-81.44%)
Mutual labels:  spark, avro
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-71.13%)
Mutual labels:  tsv, avro
Pmacct
pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].
Stars: ✭ 677 (+597.94%)
Mutual labels:  json, avro

schemer

Schema registry with support for CSV, TSV, AVRO, JSON and Parquet. It can infer a schema from a given data source.

Schemer UI [WIP]

Schemer UI is the wizard-based frontend for Schemer. It provides a guided workflow for creating and versioning schemas, in addition to browsing and search capabilities. It is a work in progress.

Schemer Core

schemer-core is the core library that implements most of the logic needed to understand the supported schema types, along with schema inference. To use schemer-core directly, add it to your dependencies:

libraryDependencies += "com.indix" %% "schemer" % "v0.2.3"
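
To illustrate what schema inference means in practice, here is a minimal, self-contained sketch that guesses column types from a small CSV sample. It does not use schemer-core and is not Schemer's algorithm or API: the object `CsvInferenceSketch`, the `ColType` hierarchy and the widening rules are all hypothetical and chosen only for illustration. It assumes a header row, no quoted fields and no missing cells.

```scala
// Toy column-type inference for CSV-like data.
// NOT Schemer's API or algorithm; every name here is hypothetical.
object CsvInferenceSketch {
  sealed trait ColType
  case object IntType extends ColType
  case object DoubleType extends ColType
  case object BooleanType extends ColType
  case object StringType extends ColType

  // Pick the narrowest type that fits a single cell.
  def inferCell(value: String): ColType = {
    val v = value.trim
    if (v.matches("-?\\d+")) IntType
    else if (v.matches("-?\\d+\\.\\d+")) DoubleType
    else if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) BooleanType
    else StringType
  }

  // Widen two types to one that can represent both.
  def merge(a: ColType, b: ColType): ColType = (a, b) match {
    case (x, y) if x == y => x
    case (IntType, DoubleType) | (DoubleType, IntType) => DoubleType
    case _ => StringType
  }

  // Infer (column name, type) pairs from a header line plus data lines.
  // `sep` is used as a regex by String.split, so "," and "\t" both work.
  def inferSchema(lines: Seq[String], sep: String = ","): Seq[(String, ColType)] = {
    val header = lines.head.split(sep, -1).map(_.trim)
    val rows = lines.tail.map(_.split(sep, -1))
    header.zipWithIndex.map { case (name, i) =>
      // Treat a cell missing from a short row as an empty string.
      val types = rows.map(r => inferCell(if (i < r.length) r(i) else ""))
      name -> types.reduceOption(merge).getOrElse(StringType)
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("id,price,name", "1,9.99,apple", "2,12,pear")
    // id is inferred as IntType, price as DoubleType, name as StringType
    inferSchema(sample).foreach { case (name, tpe) => println(s"$name: $tpe") }
  }
}
```

In this sketch the `price` column is widened to `DoubleType` because it contains both `9.99` and `12`. A real inference engine would also have to handle nulls, nested types, dates and quoting.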

Schemer Registry

schemer-registry is a schema registry for storing the metadata about schema and schema versions. It provides a GraphQL API for adding, viewing and inferring schemas.

Schemer Registry is available as a Docker image on Docker Hub.

Running Locally

A local Docker-based PostgreSQL instance can be started as follows:

docker run -e POSTGRES_USER=schemer -e POSTGRES_PASSWORD=schemer -e PGDATA=/var/lib/postgresql/data/pgdata -e POSTGRES_DB=schemer -v $(pwd)/schemer_db:/var/lib/postgresql/data/pgdata -p 5432:5432 postgres:9.5.0

Remove the schemer_db folder to clear all data and start from scratch.

The registry service can be run using sbt:

sbt "project registry" ~reStart