All Projects → tolbertam → Sstable Tools

tolbertam / Sstable Tools

Licence: apache-2.0
Tools for parsing, creating and doing other fun stuff with sstables

Programming Languages

java
68154 projects - #9 most used programming language

Labels

Projects that are alternatives of or similar to Sstable Tools

Metric Collector For Apache Cassandra
Drop-in metrics collection and dashboards for Apache Cassandra
Stars: ✭ 62 (-57.24%)
Mutual labels:  cassandra
Lua Cassandra
Pure Lua driver for Apache Cassandra
Stars: ✭ 95 (-34.48%)
Mutual labels:  cassandra
Dcos Cassandra Service
DEPRECATED—Open source Apache Cassandra running on DC/OS is now replaced by mesosphere/dcos-commons/frameworks/cassandra. This repository will be deleted at the end of 2017.
Stars: ✭ 116 (-20%)
Mutual labels:  cassandra
Graylog Plugin Metrics Reporter
Graylog Metrics Reporter Plugins
Stars: ✭ 71 (-51.03%)
Mutual labels:  cassandra
Restbase
Distributed storage with REST API & dispatcher for backend services.
Stars: ✭ 91 (-37.24%)
Mutual labels:  cassandra
Longevity
A Persistence Framework for Scala and NoSQL
Stars: ✭ 103 (-28.97%)
Mutual labels:  cassandra
Erlcass
High-Performance Erlang Cassandra driver based on DataStax cpp-driver
Stars: ✭ 59 (-59.31%)
Mutual labels:  cassandra
Cassandra exporter
Apache Cassandra® metrics exporter for Prometheus
Stars: ✭ 133 (-8.28%)
Mutual labels:  cassandra
Cqlkit
CLI tool to export Cassandra query as CSV and JSON format.
Stars: ✭ 94 (-35.17%)
Mutual labels:  cassandra
Cassandra Sharp
high performance .NET driver for Apache Cassandra
Stars: ✭ 114 (-21.38%)
Mutual labels:  cassandra
Bitnami Docker Cassandra
Bitnami Docker Image for Cassandra
Stars: ✭ 73 (-49.66%)
Mutual labels:  cassandra
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-38.62%)
Mutual labels:  cassandra
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-26.9%)
Mutual labels:  cassandra
Vos backend
vangav open source - backend; a backend generator (generates more than 90% of the code needed for big scale backend services)
Stars: ✭ 71 (-51.03%)
Mutual labels:  cassandra
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+1010.34%)
Mutual labels:  cassandra
Dropwizard Cassandra
Dropwizard support for Cassandra
Stars: ✭ 60 (-58.62%)
Mutual labels:  cassandra
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-33.1%)
Mutual labels:  cassandra
My Moments
Instagram Clone - Cloning Instagram for learning purpose
Stars: ✭ 140 (-3.45%)
Mutual labels:  cassandra
Iot Traffic Monitor
Stars: ✭ 131 (-9.66%)
Mutual labels:  cassandra
Cassandra Migrate
Simple Cassandra schema migration tool written in Python
Stars: ✭ 114 (-21.38%)
Mutual labels:  cassandra

SSTable Tools

Build Status Github Releases (by Release)

A toolkit for parsing, creating and doing other fun stuff with Cassandra 3.x SSTables. This project is under development and not yet stable. Mainly a proof of concept playground for wish items.

Pre compiled binary available:

Note: This tool formerly included a tojson command which dumped SSTable contents to JSON. This functionality has since been merged into Cassandra starting with versions 3.0.4 and 3.4 as the sstabledump command.

Features:

  • cqlsh - Drop into an interactive shell to make queries against SSTables.
  • describe - Describe SSTable data and metadata. CASSANDRA-11483
  • hints - Dump hints from a hint file
  • compact - offline compaction

Example usage:

java -jar sstable-tools.jar cqlsh
java -jar sstable-tools.jar compact ma-2-big-Data.db ma-3-big-Data.db /path/to/tabledir
java -jar sstable-tools.jar hints 1458779867606-1.hints
java -jar sstable-tools.jar describe ma-2-big-Data.db

Example shell usage:

java -jar sstable-tools.jar cqlsh

Select one or more sstables (space delimited, or choose directory to include all)

cqlsh> use ma-2-big-Data.db;
Using: /home/user/sstable-tools/ma-2-big-Data.db

Use predefined schema file. Can view with 'schema'. This is optional but without it the partition key and clustering index names are unknown.

cqlsh> schema schema.cql
Successfully imported schema from '/home/user/sstable-tools/schema.cql'.

Alternatively, use 'CREATE TABLE' statement to enter a schema.

cqlsh> CREATE TABLE users (
...        user_name varchar PRIMARY KEY,
...        password varchar,
...        gender varchar,
...        state varchar,
...        birth_year bigint
...    );

Discover the data in sstable(s) using CQL queries

cqlsh> SELECT * FROM sstable WHERE age > 1 LIMIT 1

 ┌────────────┬─────────────┬─────────┬───────────┬────────┐
 │user_name   │birth_year   │gender   │password   │state   │
 ╞════════════╪═════════════╪═════════╪═══════════╪════════╡
 │frodo       │1985         │male     │[email protected]      │CA      │
 └────────────┴─────────────┴─────────┴───────────┴────────┘

Display raw sstable data (useful to see tombstones and expired ttls) with optional where clause

cqlsh> DUMP WHERE age > 1 LIMIT 1

[frodo] Row[info=[ts=1455937221199050] ]:  | [birth_year=1985 ts=1455937221199050], [gender=male ts=1455937221199050], [[email protected] ts=1455937221199050], [state=CA ts=1455937221199050]

Describe the sstable data and metadata

cqlsh> describe sstables;

/Users/clohfink/git/sstable-tools/./src/test/resources/ma-2-big-Data.db
=======================================================================
Partitions: 1
Rows: 1
Tombstones: 0
Cells: 4
Widest Partitions:
   [frodo] 1
Largest Partitions:
   [frodo] 104 (104 B)
Tombstone Leaders:
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Size: 50 (50 B)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
  Compression ratio: -1.0
Minimum timestamp: 1455937221199050 (02/19/2016 21:00:21)
Maximum timestamp: 1455937221199050 (02/19/2016 21:00:21)
SSTable min local deletion time: 2147483647 (01/18/2038 21:14:07)
SSTable max local deletion time: 2147483647 (01/18/2038 21:14:07)
TTL min: 0 (0 milliseconds)
...[snip]...

Paging is enabled by default and can be manipulated by using 'PAGING'

cqlsh> PAGING 20;
Now Query paging is enabled
Page size: 20
cqlsh> PAGING OFF;
Disabled Query paging.
cqlsh> PAGING ON;
Now Query paging is enabled
Page size: 20

Used sstables, schema, and paging settings and persisted for future use. Use the 'PERSIST' command to view preferences and to enable/disable persistence.

cqlsh> PERSIST;
Preferences are currently enabled:
pagingEnabled=true
pagingSize=20
preferencesEnabled=true
schema=""
sstables=[
    "/home/user/sstable-tools/ma--big-Data.db"
]
cqlsh> PERSIST OFF;
Disabled Preferences.

Note: No environment configuration is necessary for this tool to work if all components of the sstable are available but the cql create statement allows for more details.

Building

This project uses Apache Maven to build a self-contained executable jar. To build the jar simply execute:

mvn package

The executable jar will be present in the target directory.

cqlsh

cql shell similiar and modeled after the C* cqlsh tool. Enables issuing cql queries against raw sstables and provides additional diagnostic tools over them. Provides history (reverse searchable with ctrl-r) and autocomplete for ease of use.

Commands:

HELP               - prints this message
EXIT               - leaves the shell
CREATE TABLE ...   - A CREATE TABLE cql statement to use as metadata when reading sstables (HIGHLY RECOMMENDED!)
DESCRIBE SCHEMA    - Show currently used schema (or serialized cfmetadata if generated)
DESCRIBE SSTABLES  - Provide details and statistics on current sstable(s)
PAGING [(ON|OFF)]  - Enables, disables, or shows current status of query paging.
PAGING <SIZE>      - Enables paging and sets paging size.
PERSIST [(ON|OFF)] - Enables, disables, or shows current status of persistence of settings state.
SCHEMA [<FILE>]    - Imports a cql file as the active table schema or shows active user-defined schema.
USE                - update the sstable[s] used by default with select, dump, describe commands
    USE /var/lib/cassandra/data/system/peers/ma-1-big-Data.db
    or with multiple sstables separated with spaces. This can also be a directory which will add all sstables in it.
    USE ma-1-big-Data.db ma-2-big-Data.db "/home/path with space/db/data/sstables"

SELECT             - run a cql query against the current sstable (unless other specified)
    SELECT * FROM sstables WHERE key > 1 LIMIT 10
    the keyword sstables will use the current sstable set with the USE command or set when running cqlsh. You can also
    specify an sstable here
    SELECT avg(someColumn) FROM /var/lib/cassandra/data/system/peers/ma-1-big-Data.db WHERE key > 1

DUMP               - dump the raw unfiltered partitions/rows. Useful for debuging TTLed/tombstoned data.
    DUMP;
    Can also specify a where clause for filtering the results.
    DUMP WHERE partitionKey = 'OpsCenter';

cql example (note: slow analyzing artifact of ttygif, describing sstable in this scenario is sub second.)

describe

Provides information about an sstable's data and its metadata. Can be used as argument or via cqlsh.

Example Output:

/Users/clohfink/git/sstable-tools/ma-119-big-Data.db
====================================================
Partitions: 22515                                                               
Rows: 13579337
Tombstones: 0
Cells: 13579337
Widest Partitions:
   [12345] 999999
   [99049] 62664
   [99007] 60437
   [99017] 59728
   [99010] 59555
Largest Partitions:
   [12345] 189888705 (189.9 MB)
   [99049] 2965017 (3.0 MB)
   [99007] 2860391 (2.9 MB)
   [99017] 2826094 (2.8 MB)
   [99010] 2818038 (2.8 MB)
Tombstone Leaders:
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Size: 540932262 (540.9 MB) 
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
  Compression ratio: 0.3068105022732033
Minimum timestamp: 1456554952195298 (02/27/2016 00:35:52)
Maximum timestamp: 1456594562846756 (02/27/2016 11:36:02)
SSTable min local deletion time: 2147483647 (01/18/2038 21:14:07)
SSTable max local deletion time: 2147483647 (01/18/2038 21:14:07)
TTL min: 0 (0 milliseconds)
TTL max: 0 (0 milliseconds)
minClustringValues: [1]
maxClustringValues: [999999]
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0 (12/31/1969 18:00:00)
  Lower bound: ReplayPosition(segmentId=-1, position=0)
  Upper bound: ReplayPosition(segmentId=1456414025108, position=16709244)
totalColumnsSet: 13579337
totalRows: 13579337
Estimated tombstone drop times:
  Value                            | Count      %   Histogram 
  2147483647 (01/18/2038 21:14:07) | 27158674 (100) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
Estimated partition size:
  Value     | Count   %   Histogram 
  258       |     6 (  0)  
  310       |    20 (  0) ▏ 
  372       |    50 (  0) ▍ 
  446       |    45 (  0) ▍ 
  535       |    66 (  0) ▍ 
  642       |    62 (  0) ▍ 
  770       |   120 (  0) ▉ 
  924       |   116 (  0) ▉ 
  1109      |   144 (  0) ▉▏ 
  1331      |   175 (  0) ▉▍ 
  1597      |   210 (  0) ▉▊ 
  1916      |   248 (  1) ▉▉ 
  2299      |   324 (  1) ▉▉▋ 
  2759      |   356 (  1) ▉▉▉ 
  3311      |   460 (  2) ▉▉▉▉ 
  3973      |   534 (  2) ▉▉▉▉▍ 
  4768      |   569 (  2) ▉▉▉▉▊ 
  5722      |   667 (  2) ▉▉▉▉▉▋ 
  6866      |   786 (  3) ▉▉▉▉▉▉▋ 
  8239      |  1002 (  4) ▉▉▉▉▉▉▉▉▍ 
  9887      |  1266 (  5) ▉▉▉▉▉▉▉▉▉▉▋ 
  11864     |  1446 (  6) ▉▉▉▉▉▉▉▉▉▉▉▉▏ 
  14237     |  1779 (  7) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  17084     |  2081 (  9) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▍ 
  20501     |  2598 ( 11) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  24601     |  3089 ( 13) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  29521     |  3555 ( 15) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  35425     |   681 (  3) ▉▉▉▉▉▋ 
  186563160 |     1 (  0)  
Estimated column count:
  Value   | Count   %   Histogram 
  10      |    41 (  0) ▎ 
  12      |    39 (  0) ▎ 
  14      |    32 (  0) ▎ 
  17      |    63 (  0) ▍ 
  20      |    64 (  0) ▍ 
  24      |   102 (  0) ▉ 
  29      |   120 (  0) ▉ 
  35      |   123 (  0) ▉ 
  42      |   165 (  0) ▉▍ 
  50      |   189 (  0) ▉▋ 
  60      |   220 (  0) ▉▉ 
  72      |   313 (  1) ▉▉▋ 
  86      |   326 (  1) ▉▉▊ 
  103     |   429 (  1) ▉▉▉▋ 
  124     |   493 (  2) ▉▉▉▉▎ 
  149     |   566 (  2) ▉▉▉▉▉ 
  179     |   643 (  2) ▉▉▉▉▉▍ 
  215     |   765 (  3) ▉▉▉▉▉▉▋ 
  258     |   949 (  4) ▉▉▉▉▉▉▉▉▏ 
  310     |  1197 (  5) ▉▉▉▉▉▉▉▉▉▉▍ 
  372     |  1382 (  6) ▉▉▉▉▉▉▉▉▉▉▉▉ 
  446     |  1722 (  7) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  535     |  1971 (  8) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  642     |  2469 ( 10) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▍ 
  770     |  2908 ( 12) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▎ 
  924     |  3455 ( 15) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 
  1109    |  1708 (  7) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▊ 
  1131752 |     1 (  0)  
Estimated cardinality: 22873
EncodingStats minTTL: 0 (0 milliseconds)
EncodingStats minLocalDeletionTime: 1442880000 (01/17/1970 10:48:00)
EncodingStats minTimestamp: 1456554952195298 (02/27/2016 00:35:52)
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes: [org.apache.cassandra.db.marshal.UTF8Type]
StaticColumns: {}
RegularColumns: {val:org.apache.cassandra.db.marshal.UTF8Type}

Usage

java -jar sstable-tools.jar describe /path/to/file.db

hints

Deserialize the mutations in a hints file and print them to standard out. To have the information necessary do deserialize the mutations this tool requires the schema of the file. This is currently handled by connecting to the cluster and querying the metadata. You can specify the host and (cql) port via the -h and -p options.

Example Output:

java -jar sstable-tools.jar hints 1458786695234-1.hints
Loading schema from 127.0.0.1:9042
/Users/clohfink/1458786695234-1.hints
=====================================
[test.t1] key=1 columns=[[] | [val]]
    Row: EMPTY | val=1
[[val=1 ts=1458786688691698]]
[test.wide] key=myPartitionKey columns=[[] | [val]]
    Row: key2=myClusteringKey | val=myValue
[[val=myValue ts=1458790233298650]]
[test.wide] key=myPartitionKey columns=[[] | [val]]
    Row: key2=myClusteringKey2 | val=myValue2
[[val=myValue2 ts=1458790238249336]]

Usage

java -jar sstable-tools.jar hints

usage: hints [-h <arg>] [-p <arg>] [-s] hintfile [hintfile ...]

Hint Dump for Apache Cassandra 3.x
Options:
  -h <arg> Host to extract schema frome.
  -p <arg> CQL native port.
  -s       Only output mutations.
  hintfile at least one file containing hints

compact

Merge multiple sstables into a single sstable. Arguments can either be sstables or a directory holding multiple sstables.

Example Output:

java -jar sstable-tools.jar compact /Users/clohfink/node1table/mc-15-big-Data.db /Users/clohfink/node2table
Merging 4 sstables to /Users/clohfink/git/sstable-tools/mc-0-big-Data.db

Usage

java -jar sstable-tools.jar compact

Usage: compact sstable [sstable ...]

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].