
latacora / Wernicke

Licence: epl-2.0
Redaction for structured data

Programming Languages

clojure

Projects that are alternatives of or similar to Wernicke

Jet
CLI to transform between JSON, EDN and Transit, powered with a minimal query language.
Stars: ✭ 331 (+231%)
Mutual labels:  json, edn
Muuntaja
Clojure library for fast http api format negotiation, encoding and decoding.
Stars: ✭ 304 (+204%)
Mutual labels:  json, edn
Awesome Resume For Chinese
📄 A collection of résumé templates suitable for Chinese (LaTeX, HTML/JS and so on), maintained by @hoochanlon
Stars: ✭ 1,324 (+1224%)
Mutual labels:  json
Reddit Bot
🤖 Making a Reddit Bot using Python, Heroku and Heroku Postgres.
Stars: ✭ 99 (-1%)
Mutual labels:  json
Undictify
Python library providing type-checked function calls at runtime
Stars: ✭ 97 (-3%)
Mutual labels:  json
Generic Json Swift
A simple Swift library for working with generic JSON structures
Stars: ✭ 95 (-5%)
Mutual labels:  json
Kaizen Openapi Editor
Eclipse Editor for the Swagger-OpenAPI Description Language
Stars: ✭ 97 (-3%)
Mutual labels:  json
Swagger Merger
🔗 Merge multiple swagger files into a swagger file, support JSON/YAML.
Stars: ✭ 94 (-6%)
Mutual labels:  json
Parse Google Docs Json
Authenticates with Google API and parse Google Docs to JSON or Markdown
Stars: ✭ 100 (+0%)
Mutual labels:  json
Json4s
A single AST to be used by other scala json libraries
Stars: ✭ 1,341 (+1241%)
Mutual labels:  json
Rpc.py
A fast and powerful RPC framework based on ASGI/WSGI.
Stars: ✭ 98 (-2%)
Mutual labels:  json
Pysmi
SNMP MIB parser
Stars: ✭ 96 (-4%)
Mutual labels:  json
Play Circe
circe for play
Stars: ✭ 95 (-5%)
Mutual labels:  json
Fast Serialization
FST: fast Java serialization drop-in replacement
Stars: ✭ 1,348 (+1248%)
Mutual labels:  json
Jsonmasking
Replaces fields in JSON with a substitute value, even when the property sits in deeply nested objects. Useful for masking passwords, credit card numbers, etc.
Stars: ✭ 95 (-5%)
Mutual labels:  json
Crawlerpack
Java web data crawler package
Stars: ✭ 99 (-1%)
Mutual labels:  json
Swurg
Parse OpenAPI documents into Burp Suite for automating OpenAPI-based APIs security assessments (approved by PortSwigger for inclusion in their official BApp Store).
Stars: ✭ 94 (-6%)
Mutual labels:  json
Php Jsondb
A PHP Class that reads JSON file as a database. Use for sample DBs
Stars: ✭ 96 (-4%)
Mutual labels:  json
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-3%)
Mutual labels:  json
Rki Covid Api
🦠🇩🇪📈 An API for the spread of covid-19 in Germany. Data from Robert-Koch-Institut.
Stars: ✭ 98 (-2%)
Mutual labels:  json

wernicke


A redaction tool for structured data. Run wernicke with JSON on stdin and get redacted values out. It preserves structure and (to some extent) semantics. You might want this if you have test data whose actual values are sensitive. Because the replacements are consistent within the data and the overall structure is preserved, there's a better chance your data will remain suitable for testing even after it has been scrubbed.

Most people run wernicke from a shell, either as json_producing_thing | wernicke or as wernicke < some_file.json > redacted.json. EDN is also supported. See wernicke --help for additional information.

Each example below shows an example input followed by its example output.
IPs, MAC addresses, timestamps, various AWS identifiers, and a few other types of strings are redacted to strings of the same type: IPs to IPs, SGs to SGs, et cetera. If these strings have an alphanumeric id, that id will have the same length.
{
  "long_val": "ABBBAAAABBBBAAABBBAABB",
  "ip": "10.0.0.1",
  "mac": "ff:ff:ff:ff:ff:ff",
  "timestamp": "2017-01-01T12:34:56.000Z",
  "ec2": "ip-10-0-0-1.ec2.internal",
  "security_group": "sg-12345",
  "vpc": "vpc-abcdef",
  "aws_access_key": "AKIAXXXXXXXXXXXXXXXX",
  "aws_role_cred": "AROAYYYYYYYYYYYYYYYY"
}
{
  "long_val": "teyjdaeqEYGw18fRIt5vLo",
  "ip": "254.65.252.245",
  "mac": "aa:3e:91:ab:3b:3a",
  "timestamp": "2044-19-02T20:32:55.72Z",
  "ec2": "ip-207-255-185-237.ec2.internal",
  "security_group": "sg-887b8",
  "vpc": "vpc-a9d96a",
  "aws_access_key": "AKIAQ5E7IHRMOW7YABLS",
  "aws_role_cred": "AROA6QA7SQTM6YWS4F0H"
}
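The same-type, same-length behavior can be illustrated with a small Clojure sketch. This is not Wernicke's code: the namespace and the functions `random-hex`, `redact-prefixed-id`, and `random-ip` are hypothetical, and the sketch assumes an ID is a lowercase prefix, a dash, and an alphanumeric part that gets replaced with random lowercase hex of the same length.

```clojure
(ns redaction-sketch
  (:require [clojure.string :as str]))

;; Hypothetical alphabet for replacement IDs (Wernicke's actual choice may differ).
(def hex-chars "0123456789abcdef")

(defn random-hex
  "Returns a string of n random lowercase hex characters."
  [n]
  (apply str (repeatedly n #(rand-nth hex-chars))))

(defn redact-prefixed-id
  "Keeps the prefix of an ID such as \"sg-12345\" and replaces the part after
  the dash with random hex of the same length. Non-matching strings are
  returned unchanged."
  [s]
  (if-let [[_ prefix id] (re-matches #"([a-z]+-)([0-9a-zA-Z]+)" s)]
    (str prefix (random-hex (count id)))
    s))

(defn random-ip
  "Returns a random dotted-quad IPv4 address string."
  []
  (str/join "." (repeatedly 4 #(rand-int 256))))
```

For instance, `(redact-prefixed-id "sg-12345")` returns a random value shaped like `sg-887b8`, while a string that doesn't match the assumed ID pattern comes back untouched.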
Redaction happens in arbitrarily nested structures.
{
  "a": {
    "b": [
      "c",
      "d",
      {
        "e": "10.0.0.1"
      }
    ]
  }
}
{
  "a": {
    "b": [
      "c",
      "d",
      {
        "e": "1.212.241.246"
      }
    ]
  }
}
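Redacting arbitrarily nested data amounts to a tree walk that rewrites every string leaf. The sketch below uses `clojure.walk/postwalk`, which also visits map keys, so the same walk covers the key redaction described next. It only recognizes dotted-quad IPv4 strings, `redact-ip-string` and `redact-tree` are hypothetical names rather than Wernicke's API, and each occurrence gets an independent random value, so it ignores the within-run consistency the README describes further down.

```clojure
(ns walk-sketch
  (:require [clojure.string :as str]
            [clojure.walk :as walk]))

(defn random-ip
  "Returns a random dotted-quad IPv4 address string."
  []
  (str/join "." (repeatedly 4 #(rand-int 256))))

(defn redact-ip-string
  "Hypothetical rule: returns a random IP if s looks like an IPv4 address,
  otherwise returns s unchanged."
  [s]
  (if (re-matches #"\d{1,3}(?:\.\d{1,3}){3}" s) (random-ip) s))

(defn redact-tree
  "Walks nested maps, vectors, and lists and applies redact-ip-string to
  every string it finds, including map keys. Non-strings pass through."
  [data]
  (walk/postwalk #(if (string? %) (redact-ip-string %) %) data))
```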
In addition to values in the tree, keys are also redacted, even nested ones.
{
  "vpc-12345": {
    "sg-abcdef": {
      "instance_count": 5
    }
  }
}
{
  "vpc-ec60f": {
    "sg-086fd3": {
      "instance_count": 5
    }
  }
}
Redaction also happens in the middle of strings.
{
  "x": "i-abc123 is in sg-12345"
}
{
  "x": "i-26a1bf is in sg-77aff"
}
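Redacting inside a larger string is essentially a regex search-and-replace in which each match is rewritten by a function. A sketch using `clojure.string/replace` with a replacement function follows; the `prefixed-id-re` pattern, which only covers `i-`, `sg-`, and `vpc-` IDs, is an assumption made for illustration and is much narrower than Wernicke's real rule set.

```clojure
(ns inline-sketch
  (:require [clojure.string :as str]))

(def hex-chars "0123456789abcdef")

(defn random-hex
  "Returns a string of n random lowercase hex characters."
  [n]
  (apply str (repeatedly n #(rand-nth hex-chars))))

;; Hypothetical pattern: only instance, security group, and VPC IDs.
(def prefixed-id-re #"\b(i|sg|vpc)-([0-9a-zA-Z]+)\b")

(defn redact-inline
  "Rewrites every embedded prefixed ID in s, keeping the prefix and the ID
  length and leaving the surrounding text alone."
  [s]
  (str/replace s prefixed-id-re
               ;; With capture groups, the match arrives as [whole prefix id].
               (fn [[_ prefix id]]
                 (str prefix "-" (random-hex (count id))))))
```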
The redacted values will change across runs (this is necessary to make redaction irreversible).
{
  "ip": "10.0.0.1",
  "mac": "ff:ff:ff:ff:ff:ff"
}
{
  "ip": "246.220.253.214",
  "mac": "dc:08:90:75:e3:91"
}
Redacted values _are_ consistent within runs. If the input contains the same value multiple times it will get redacted identically. This allows you to still do correlation in the result.
{
  "ip": "10.0.0.1",
  "also_ip": "10.0.0.1"
}
{
  "ip": "247.226.167.9",
  "also_ip": "247.226.167.9"
}
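The two properties above (fresh values on every run, identical values within a run) can be modeled with a per-run cache: each run starts with an empty map from original to replacement values, and repeated inputs hit the cache. The sketch below assumes this design purely for illustration; `make-run-redactor` is a hypothetical name, and it uses `rand-int`, which is not a cryptographically secure generator, so a redactor meant to resist reversal would likely want something like `java.security.SecureRandom` instead.

```clojure
(ns consistency-sketch
  (:require [clojure.string :as str]))

(defn random-ip
  "Returns a random dotted-quad IPv4 address string (not cryptographically secure)."
  []
  (str/join "." (repeatedly 4 #(rand-int 256))))

(defn make-run-redactor
  "Hypothetical: returns a redaction function for one run. Replacements are
  cached per input, so repeated values redact identically within the run.
  Each call creates an empty cache, so separate runs get fresh values."
  []
  (let [cache (atom {})]
    (fn [s]
      ;; swap! may retry the update fn, but we only read the value that
      ;; ended up in the committed map, so every caller sees one replacement.
      (get (swap! cache #(if (contains? % s) % (assoc % s (random-ip)))) s))))
```

Calling the same redactor twice on `"10.0.0.1"` yields the same address; creating a new redactor (a new "run") will almost certainly yield a different one.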

(These examples were pretty-printed for viewing comfort, but wernicke does not do that for you. Try jq.)

Installation

Download from https://github.com/latacora/wernicke/releases

Configuration

We try to do something reasonable for most use cases. If you have generally useful redactions, please consider contributing them. However, sometimes redaction behavior really does need to be configured. To do that, pass an EDN literal on the command line like so: wernicke --config '{:some-rules "detailed below"}'.

Right now this requires a fairly extensive understanding of how wernicke works; we want to make it more accessible, though! If there's something specific you want to accomplish, feel free to open a ticket.

Adding extra rules

For example, to redact all numbers, add the following structure to your EDN:

{:extra-rules
  [{:name :numbers
    :type :regex
    :pattern "\\d*"}]}

The extra rules are compiled before use, so you can give the pattern as a plain regex string; you do not need to specify the parsed regex structure for this to work.
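If you generate the configuration from Clojure rather than typing it by hand, `pr-str` turns a map into an EDN literal you can pass to `--config`, and `clojure.edn/read-string` reads it back. This sketch only uses core Clojure functions, not a Wernicke API. Note that in an EDN string literal `"\\d*"` denotes the three-character regex `\d*`, which is why the backslash is doubled in the example above.

```clojure
(ns config-sketch
  (:require [clojure.edn :as edn]))

;; The same configuration as the :extra-rules example, as Clojure data.
(def config
  {:extra-rules [{:name :numbers
                  :type :regex
                  :pattern "\\d*"}]})

;; pr-str prints the map as an EDN literal for `wernicke --config '...'`.
(def config-arg (pr-str config))

;; Reading the literal back yields the same data.
(def round-tripped (edn/read-string config-arg))
```

`(println config-arg)` prints the literal; how you quote it for your shell is up to you.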

Disabling rules by name

Add the following structure to your EDN:

{:disabled-rules [:latacora.wernicke.patterns/arn-re]}

This still requires you to know what the rule names are. You can find these in latacora.wernicke.core/default-config.

Development

To run the project directly from a source checkout:

$ clj -m latacora.wernicke.cli

To run the project's tests:

$ clj -A:test

To build a native image:

$ clj -A:native-image

(This requires GraalVM to be installed with SubstrateVM, and the GRAAL_HOME environment variable to be set.)

Namesake

Named after Carl Wernicke, a German physician who did research on the brain. Wernicke's aphasia is a condition where patients demonstrate fluent speech with intact syntax but with nonsense words. This tool is kind of like that: the resulting structure is maintained but all the words are swapped out with (internally consistent) nonsense.

License

Copyright © Latacora, LLC

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
