birchb1024 / frangipanni

Licence: MIT license
Program to convert lines of text into a tree structure.

Frangipanni

Program to convert lines of text into beautiful tree structures.

The program reads standard input one line at a time. It breaks each line into tokens, then adds the sequence of tokens to a tree structure. Lines with the same leading tokens are placed in the same branch of the tree. The tree is printed as indented lines or as JSON. Alternatively, the tree can be passed to a user-provided Lua script, which can produce any output format.

Options control where the line is broken into tokens, and how it is analysed and output.
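
To make the description above concrete, here is a minimal Python sketch of the idea: split each line on a set of break characters and insert the tokens into a nested dictionary, so that lines with the same leading tokens share a branch. It is an illustration only, not frangipanni's actual implementation (which is written in Go); the function name build_tree and the nested-dict representation are assumptions made for this example.

```python
import re

def build_tree(lines, breaks="/"):
    """Insert each line's tokens into a nested dict; shared prefixes share a branch."""
    tree = {}
    # One or more break characters separate two tokens (hypothetical rule).
    splitter = re.compile("[" + re.escape(breaks) + "]+")
    for line in lines:
        node = tree
        for token in filter(None, splitter.split(line.strip())):
            # Reuse the existing branch for this token, or create a new one.
            node = node.setdefault(token, {})
    return tree

lines = [
    "/etc/bluetooth/input.conf",
    "/etc/bluetooth/main.conf",
    "/etc/fish/completions/task.fish",
]
tree = build_tree(lines)
print(tree)
```

Python dictionaries keep insertion order, which loosely mirrors the default input ordering of child nodes.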

Basic Operation

Here is a simple example. The command sudo find /etc -maxdepth 3 | tail -9 produces this data:

/etc/bluetooth/rfcomm.conf.dpkg-remove
/etc/bluetooth/serial.conf.dpkg-remove
/etc/bluetooth/input.conf
/etc/bluetooth/audio.conf.dpkg-remove
/etc/bluetooth/network.conf
/etc/bluetooth/main.conf
/etc/fish
/etc/fish/completions
/etc/fish/completions/task.fish

When we pipe this into the frangipanni program:

sudo find /etc -maxdepth 3 | tail -9 | frangipanni

we see this output:

etc
    bluetooth
        rfcomm.conf.dpkg-remove
        serial.conf.dpkg-remove
        input.conf
        audio.conf.dpkg-remove
        network.conf
        main.conf
    fish/completions/task.fish

By default, frangipanni reads each line and splits it into tokens wherever it finds a non-alphanumeric character.
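
The default splitting rule can be approximated in a few lines of Python. This sketch assumes ASCII letters and digits form tokens and that each run of other characters is kept as the separator preceding the next token; the regular expression and the name tokenize are hypothetical, and the real program may treat edge cases (such as non-ASCII text) differently. Trailing separators with no token after them are dropped here.

```python
import re

# A separator is a (possibly empty) run of non-alphanumeric characters,
# followed by a token made of ASCII letters and digits.
TOKEN = re.compile(r"([^A-Za-z0-9]*)([A-Za-z0-9]+)")

def tokenize(line):
    """Return (separator, token) pairs in the order they appear on the line."""
    return TOKEN.findall(line)

pairs = tokenize("/etc/bluetooth/main.conf")
print(pairs)
# [('/', 'etc'), ('/', 'bluetooth'), ('/', 'main'), ('.', 'conf')]
```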

In this next example we are processing a list of files produced by find, so we only want to break on directory separators. We can specify -breaks / to do that.

The default behaviour is to fold tree branches with no sub-branches into a single line of output, for example fish/completions/task.fish. We turn off folding by specifying the -no-fold option. With the refined command

frangipanni -breaks / -no-fold

we see this output:

etc
    bluetooth
        rfcomm.conf.dpkg-remove
        serial.conf.dpkg-remove
        input.conf
        audio.conf.dpkg-remove
        network.conf
        main.conf
    fish
        completions
            task.fish
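
Folding can be pictured as collapsing any chain of single-child nodes into one label. The Python sketch below works on a nested dictionary and is only an approximation of the behaviour shown above; the function name render, the nested-dict tree and the use of a single fixed separator are assumptions made for illustration.

```python
def render(tree, indent=4, fold=True, sep="/", depth=0):
    """Return indented lines for a nested-dict tree, optionally folding
    chains of single-child nodes into one line."""
    out = []
    for text, children in tree.items():
        label = text
        # While a node has exactly one child, append that child to the label.
        while fold and len(children) == 1:
            (child_text, children), = children.items()
            label += sep + child_text
        out.append(" " * (indent * depth) + label)
        out.extend(render(children, indent, fold, sep, depth + 1))
    return out

tree = {"etc": {"bluetooth": {"input.conf": {}, "main.conf": {}},
                "fish": {"completions": {"task.fish": {}}}}}
print("\n".join(render(tree)))              # fish/completions/task.fish on one line
print("\n".join(render(tree, fold=False)))  # one node per line
```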

Having restructured the data into a tree format we can output in other formats. We can ask for JSON by adding the -format json option. We get this output:

{"etc" : 
    {"bluetooth" : 
        ["rfcomm.conf.dpkg-remove",
        "serial.conf.dpkg-remove",
        "input.conf",
        "audio.conf.dpkg-remove",
        "network.conf",
        "main.conf"],
    "fish" : 
        {"completions" : "task.fish"}}}
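
The JSON output can be consumed by any JSON parser. As a small illustration, this Python sketch loads the output shown above and turns it back into one path per leaf. The helper name leaf_paths is hypothetical, and the sketch assumes leaves are strings or numbers, as in the examples in this document.

```python
import json

sample = """
{"etc" :
    {"bluetooth" :
        ["rfcomm.conf.dpkg-remove",
        "serial.conf.dpkg-remove",
        "input.conf",
        "audio.conf.dpkg-remove",
        "network.conf",
        "main.conf"],
    "fish" :
        {"completions" : "task.fish"}}}
"""

def leaf_paths(value, prefix=""):
    """Yield a slash-joined path for every leaf in the parsed JSON."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + "/" + key)
    elif isinstance(value, list):
        for item in value:
            yield from leaf_paths(item, prefix)
    else:
        yield prefix + "/" + str(value)

paths = list(leaf_paths(json.loads(sample)))
print("\n".join(paths))
```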

Usage

The command is a simple filter: it reads standard input and writes to standard output.

cat <input> | frangipanni [options]

Options

-breaks string
      Characters to slice lines with.
-chars
      Slice line after every character.
-counts
      Print number of matches at the end of the line.
-depth int
      Maximum tree depth to print. (default 2147483647)
-format string
      Format of output: indent|json (default "indent")
-indent int
      Number of spaces to indent per level. (default 4)
-level int
      Analyse down to this level (positive integer). (default 2147483647)
-lua string
      Lua Script to run
-no-fold
      Don't fold into one line.
-order string
      Sort order input|alpha. Sort the childs either in input order or via character ordering (default "input")
-separators
      Print leading separators.
-skip int
      Number of leading fields to skip.
-spacer string
      Characters to indent lines with. (default " ")
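
If you want to drive the filter from a script, a thin wrapper is enough. The Python sketch below builds a command line from a few of the options listed above and pipes text through the program. It assumes a binary named frangipanni is on your PATH; the function names and keyword arguments are inventions of this example, and only flags documented above are used.

```python
import subprocess

def frangipanni_argv(breaks=None, no_fold=False, fmt=None, skip=None, counts=False):
    """Build an argument list using only options documented above."""
    argv = ["frangipanni"]  # assumes the binary is on PATH
    if breaks is not None:
        argv += ["-breaks", breaks]
    if no_fold:
        argv.append("-no-fold")
    if fmt is not None:
        argv += ["-format", fmt]
    if skip is not None:
        argv += ["-skip", str(skip)]
    if counts:
        argv.append("-counts")
    return argv

def run_frangipanni(text, **options):
    """Pipe text through frangipanni and return what it writes to stdout."""
    result = subprocess.run(frangipanni_argv(**options), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout

print(frangipanni_argv(breaks="/", no_fold=True, fmt="json"))
```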

Examples

Log files

Given input from a log file:

May 10 03:17:06 localhost systemd: Removed slice User Slice of root.
May 10 03:17:06 localhost systemd: Stopping User Slice of root.
May 10 04:00:00 localhost systemd: Starting Docker Cleanup...
May 10 04:00:00 localhost systemd: Started Docker Cleanup.
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.629849861+10:00" level=debug msg="Calling GET /_ping"
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.629948000+10:00" level=debug msg="Unable to determine container for /"
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.630103455+10:00" level=debug msg="{Action=_ping, LoginUID=12345678, PID=21075}"
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.630684502+10:00" level=debug msg="Calling GET /v1.26/containers/json?all=1&filters=%7B%22status%22%3A%7B%22dead%22%3Atrue%7D%7D"
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.630704513+10:00" level=debug msg="Unable to determine container for containers"
May 10 04:00:00 localhost dockerd-current: time="2020-05-10T04:00:00.630735545+10:00" level=debug msg="{Action=json, LoginUID=12345678, PID=21075}"

The default output is:

May 10
 03:17:06 localhost systemd
  : Removed slice User Slice of root
  : Stopping User Slice of root
 04:00:00 localhost
   dockerd-current: time="2020-05-10T04:00:00
    .629849861+10:00" level=debug msg="Calling GET /_ping
    .629948000+10:00" level=debug msg="Unable to determine container for
    .630103455+10:00" level=debug msg="{Action=_ping, LoginUID=12345678, PID=21075
    .630684502+10:00" level=debug msg="Calling GET /v1.26/containers/json?all=1&filters=%7B%22status%22%3A%7B%22dead%22%3Atrue%7D%7D
    .630704513+10:00" level=debug msg="Unable to determine container for containers
    .630735545+10:00" level=debug msg="{Action=json, LoginUID=12345678, PID=21075
   systemd
    : Started Docker Cleanup
    : Starting Docker Cleanup

With the -skip 5 option we can ignore the date and time at the beginning of each line. The output is:

localhost
    systemd
        Removed slice User Slice of root
        Stopping User Slice of root
        Starting Docker Cleanup
        Started Docker Cleanup
    dockerd-current: time="2020-05-10T04:00:00
        629849861+10:00" level=debug msg="Calling GET /_ping
        629948000+10:00" level=debug msg="Unable to determine container for
        630103455+10:00" level=debug msg="{Action=_ping, LoginUID=12345678, PID=21075
        630684502+10:00" level=debug msg="Calling GET /v1.26/containers/json?all=1&filters=%7B%22status%22%3A%7B%22dead%22%3Atrue%7D%7D
        630704513+10:00" level=debug msg="Unable to determine container for containers
        630735545+10:00" level=debug msg="{Action=json, LoginUID=12345678, PID=21075
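
Why five? With the default breaks, a timestamp such as May 10 03:17:06 splits into five tokens (May, 10, 03, 17, 06). The following Python sketch shows the effect of dropping them. It uses a simplified tokenizer (runs of ASCII letters and digits) and a hypothetical function name, so treat it as an approximation of the -skip option rather than its implementation.

```python
import re

def tokens_after_skip(line, skip=0):
    """Split on non-alphanumeric runs, then drop the first `skip` tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", line)
    return tokens[skip:]

line = "May 10 03:17:06 localhost systemd: Started Docker Cleanup."
remaining = tokens_after_skip(line, skip=5)
print(remaining)
# ['localhost', 'systemd', 'Started', 'Docker', 'Cleanup']
```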

Data from environment variables

Given this input, from env | egrep '^XDG':

XDG_VTNR=2
XDG_SESSION_ID=5
XDG_SESSION_TYPE=x11
XDG_DATA_DIRS=/usr/share:/usr/share:/usr/local/share
XDG_SESSION_DESKTOP=plasma
XDG_CURRENT_DESKTOP=KDE
XDG_SEAT=seat0
XDG_RUNTIME_DIR=/run/user/1000
XDG_SESSION_COOKIE=fe37f2ef4-158904.727668-469753

And run with

$ env | egrep '^XDG' | ./frangipanni -breaks '=_' -no-fold -format json

we get

{"XDG" : 
    {"VTNR" : 2,
    "SESSION" : 
        {"ID" : 5,
        "TYPE" : "x11",
        "DESKTOP" : "plasma",
        "COOKIE" : "fe37f2ef4-158904.727668-469753"},
    "DATA" : 
        {"DIRS" : "/usr/share:/usr/share:/usr/local/share"},
    "CURRENT" : 
        {"DESKTOP" : "KDE"},
    "SEAT" : "seat0",
    "RUNTIME" : 
        {"DIR" : "/run/user/1000"}}}
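
For this particular input the nesting is easy to reproduce by hand. The Python sketch below splits each KEY=VALUE line on = and _, treats the last field as the value and the preceding fields as a path of keys. It is a simplified stand-in that assumes no key is both a leaf and a branch and that values contain no break characters; the function name nest is hypothetical.

```python
import json
import re

def nest(lines, breaks="=_"):
    """Turn lines such as XDG_SESSION_ID=5 into nested dictionaries."""
    root = {}
    splitter = re.compile("[" + re.escape(breaks) + "]")
    for line in lines:
        fields = splitter.split(line.strip())
        if len(fields) < 2:
            continue  # nothing to nest on this line
        *path, leaf = fields
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Purely numeric values are emitted as numbers, as in the output above.
        node[path[-1]] = int(leaf) if leaf.isdigit() else leaf
    return root

data = nest(["XDG_VTNR=2", "XDG_SESSION_ID=5", "XDG_SESSION_TYPE=x11", "XDG_SEAT=seat0"])
print(json.dumps(data, indent=4))
```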

Split the PATH

$ echo $PATH | tr ':' '\n' | ./frangipanni -separators
/home/alice
    /work/gopath/src/github.com/birchb1024/frangipanni
    /apps
        /textadept_10.8.x86_64
        /shellcheck-v0.7.1
        /Digital/Digital
        /gradle-4.9/bin
        /idea-IC-172.4343.14/bin
        /GoLand-173.3531.21/bin
        /arduino-1.6.7
    /yed
    /bin
/usr
    /lib/jvm/java-8-openjdk-amd64/bin
    /local
        /bin
        /games
        /go/bin
    /bin
    /games
/bin

Query a CSV triplestore -> JSON

A CSV triplestore is a simple way of recording a database of facts about objects. Each line has a Subject, Predicate, Object structure.

john1@jupiter,rdf:type,UnixAccount
joanna,hasAccount,alice1@jupiter
jupiter,defaultAccount,alice1
alice2,hasAccount,evan1@jupiter
felicity,hasAccount,john1@jupiter
alice1@jupiter,rdf:type,UnixAccount
kalpana,hasAccount,alice1@jupiter
john1@jupiter,hasPassword,felicity-pw-8
Production,was_hostname,jupiter
alice1@jupiter,rdf:type,UnixAccount
alice1@jupiter,hasPassword,alice-pw-2

In this example we want the data about the jupiter machine. We permute the input records with awk and filter the JSON output with jq.

$ cat test/fixtures/triples.csv | \
  awk -F, '{print $2,$1,$3; print $1, $2, $3; print $3, $2, $1}' | \
  ./frangipanni  -breaks ' ' -order alpha -format json -no-fold | \
  jq '."jupiter"'
{
  "defaultAccount": "alice1",
  "hasUser": [
    "alice1",
    "birchb1",
    "john1"
  ],
  "rdf:type": [
    "UnixMachine",
    "WasDmgr"
  ],
  "was_hostname": "Production"
}
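
The awk step is what makes the query possible: each triple is emitted in three orders so that the subject, the predicate and the object each appear first in one record. A Python sketch of the same permutation, followed by a lookup of everything recorded under one root token, might look like this. The function names are hypothetical and the two triples are taken from the sample data above.

```python
from collections import defaultdict

def permute(triples):
    """Mirror the awk step: print $2,$1,$3; print $1,$2,$3; print $3,$2,$1."""
    for s, p, o in triples:
        yield (p, s, o)
        yield (s, p, o)
        yield (o, p, s)

def facts_about(triples, root):
    """Group the second and third fields of every record that starts with root."""
    grouped = defaultdict(list)
    for first, second, third in permute(triples):
        if first == root:
            grouped[second].append(third)
    return dict(grouped)

triples = [
    ("jupiter", "defaultAccount", "alice1"),
    ("Production", "was_hostname", "jupiter"),
]
facts = facts_about(triples, "jupiter")
print(facts)
# {'defaultAccount': ['alice1'], 'was_hostname': ['Production']}
```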

Security Analysis of sudo use in Auth Log File

The Linux /var/log/auth.log file has timed records about sudo which look like this:

May 17 00:36:15 localhost sudo:   alice : TTY=pts/2 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/jmtpfs -o allow_other /tmp/s
May 17 00:36:15 localhost sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
May 17 00:36:15 localhost sudo: pam_unix(sudo:session): session closed for user root

By skipping the date/time component of the lines and specifying -counts, we can see a breakdown of the sudo commands used and how many times each occurred. By placing the date/time data at the end of the input lines we also get a breakdown of the commands by hour of day.

$ sudo cat /var/log/auth.log | grep sudo | \
    awk '{print substr($0,16),substr($0,1,15)}' | \
    ./frangipanni -breaks ' ;:'  -depth 5 -counts -separators

Produces

localhost sudo: 125
   :   alice: 42
        : TTY=pts/2: 14
            ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/jmtpfs: 5
            ; PWD=/home/alice/workspace/gopath/src/github.com/akice/frangipanni ; USER=root ; COMMAND=/usr/bin/find /etc -maxdepth 3 May 17 13: 9
        : TTY=pts/1 ; PWD=/home/alice/workspace/gopath/src/github.com/akice/frangipanni ; USER=root ; COMMAND=/bin/cat: 28
            /var/log/messages May 17 13:53:34: 1
            /var/log/auth.log May 17: 27
   : pam_unix(sudo:session): session: 83
        opened for user root by (uid=0) May 17: 42
            00: 5
            13: 28
            14: 9
        closed for user root May 17: 41
            00: 5
            13: 28
            14: 8

We can see alice has run 42 sudo commands, 28 of which used cat on files under /var.
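
The counts are simply the number of input lines that passed through each node of the tree. A minimal Python sketch of that bookkeeping is shown below; the data structure, the function names and the sample records are all made up for illustration and do not reflect how frangipanni stores its nodes.

```python
def count_tree(token_lists):
    """Build {token: [count, children]}; count is the number of input
    records that passed through that node."""
    root = {}
    for tokens in token_lists:
        level = root
        for token in tokens:
            entry = level.setdefault(token, [0, {}])
            entry[0] += 1
            level = entry[1]
    return root

def render_counts(level, depth=0, indent=4):
    """Return one 'token: count' line per node, indented by depth."""
    lines = []
    for token, (count, children) in level.items():
        lines.append(f"{' ' * (indent * depth)}{token}: {count}")
        lines.extend(render_counts(children, depth + 1, indent))
    return lines

# Hypothetical, already tokenized records.
records = [
    ["sudo", "alice", "cat"],
    ["sudo", "alice", "cat"],
    ["sudo", "alice", "find"],
    ["sudo", "pam_unix", "session"],
]
report = render_counts(count_tree(records))
print("\n".join(report))
```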

Output for Spreadsheets

Inevitably you will need to output reports from frangipanni into a spreadsheet. You can use the -spacer option to specify the character(s) to use for indentation and before the counts. So with the file list example from above and this command

sudo find /etc -maxdepth 3 | tail -9 | frangipanni -no-fold -counts -indent 1 -spacer $'\t'

You will have a tab-separated output which can be imported to your spreadsheet.

etc	9
	bluetooth	6
		rfcomm.conf.dpkg-remove	1
		serial.conf.dpkg-remove	1
		input.conf	1
		audio.conf.dpkg-remove	1
		network.conf	1
		main.conf	1
	fish/completions/task.fish	3

Output for Markdown

To use the output with Markdown or other text-based tools, specify the -separators option. Tools like sed can then convert the leading separator into the required markup. For example, to get a leading minus sign for an unnumbered Markdown list, use sed to replace the first separator on each line:

sudo find /etc -maxdepth 3 | tail -9 | frangipanni -separators | sed 's;/; - ;'

Which results in an indented bullet list:

  • etc
    • bluetooth
      • rfcomm.conf.dpkg-remove
      • serial.conf.dpkg-remove
      • input.conf
      • audio.conf.dpkg-remove
      • network.conf
      • main.conf
    • fish/completions/task.fish
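
If sed is not available, the same substitution is a one-liner in most languages. This Python sketch replaces only the first / on each line with " - ", which is what sed 's;/; - ;' does. The sample lines are an assumed shape of the -separators output, based on the PATH example earlier, and the function name is hypothetical.

```python
def to_markdown(lines):
    """Replace the first '/' on each line with ' - ', like sed 's;/; - ;'."""
    return [line.replace("/", " - ", 1) for line in lines]

# Assumed shape of `frangipanni -separators` output for the file list example.
sample = [
    "/etc",
    "    /bluetooth",
    "        /main.conf",
    "    /fish/completions/task.fish",
]
bullets = to_markdown(sample)
print("\n".join(bullets))
```

Only the first separator is replaced, so a folded line such as fish/completions/task.fish keeps its inner slashes.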

Lua Examples

JSON (again)

First, we tell frangipanni to produce its output via a Lua program called ‘json.lua’, and we format the JSON with the ‘jp’ program.

$ <test/fixtures/simplechars.txt frangipanni -lua json.lua | jp @

The Lua script uses the github.com/layeh/gopher-json module, which the script imports with require. The data is made available in the variable frangipanni, which holds a table for each node with these fields:

  • depth - depth in the tree, starting from 0
  • lineNumber - the input line on which the token was first detected
  • numMatched - the number of times the token was seen
  • sep - separation characters preceding the token
  • text - the token itself
  • children - a table containing the child nodes

local json = require("json")

print(json.encode(frangipanni))

The output shows that all the fields of the parsed nodes are passed to Lua in a table. The root node is empty except for its children. The Lua script can therefore make use of any of these fields.

{
  "depth": 0,
  "lineNumber": -1,
  "numMatched": 1,
  "sep": "",
  "text": "",
  "children": {
    "1.2": {
      "children": [],
      "depth": 1,
      "lineNumber": 8,
      "numMatched": 1,
      "sep": "",
      "text": "1.2"
    },
    "A": {
      "children": [],
      "depth": 1,
      "lineNumber": 1,
      "numMatched": 1,
      "sep": "",
      "text": "A"
    },
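
Once the node tree has been dumped as JSON like this, it can be processed outside Lua as well. The Python sketch below walks a dictionary with the same fields and visits every node. It is built from only the two child nodes visible in the excerpt above, and it assumes (as the excerpt suggests) that an empty children table appears as [] while a non-empty one appears as an object keyed by token. The function name walk is hypothetical.

```python
def walk(node, visit):
    """Call visit(node) for the node and then for all of its descendants."""
    visit(node)
    children = node.get("children") or {}
    # In the dump, empty children show up as [], non-empty ones as an object.
    kids = children.values() if isinstance(children, dict) else children
    for child in kids:
        walk(child, visit)

root = {
    "depth": 0, "lineNumber": -1, "numMatched": 1, "sep": "", "text": "",
    "children": {
        "1.2": {"children": [], "depth": 1, "lineNumber": 8,
                "numMatched": 1, "sep": "", "text": "1.2"},
        "A": {"children": [], "depth": 1, "lineNumber": 1,
              "numMatched": 1, "sep": "", "text": "A"},
    },
}
seen = []
walk(root, lambda n: seen.append((n["depth"], n["text"])))
print(seen)
# [(0, ''), (1, '1.2'), (1, 'A')]
```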

Markdown

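-- Write three spaces of indentation per tree level.
function indent(n)
    for i=1, n do
        io.write("   ")
    end
end

-- Print a node as a Markdown bullet, then recurse into its children.
-- Note that pairs() does not guarantee any particular iteration order.
function markdown(node)
    indent(node.depth)
    io.write("* ")
    print(node.text)
    for k, v in pairs(node.children) do
        markdown(v)
    end
end

-- frangipanni is the root node supplied by the program.
markdown(frangipanni)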

The output can look like this:

* 
   * A
   * C
      * 2
      * D
   * x.a
      * 2
      * 1
   * Z
   * 1.2

XML

The xml.lua script provided in the release outputs a very basic XML format, which might suit simple inputs.

<root count="1" sep="">
   <C count="2" sep="">
      <2 count="1" sep="."/>
      <D count="1" sep="."/>
   </C>
   <x.a count="3" sep="">
      <1 count="1" sep="."/>
      <2 count="1" sep="."/>
   </x.a>
   <Z count="1" sep=""/>
   <1.2 count="1" sep=""/>
   <A count="1" sep=""/>
</root>
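
One reason this format only suits simple inputs is that XML element names may not begin with a digit, so tags such as <2> or <1.2> would be rejected by a conforming XML parser. If you need well-formed XML, one option is to keep a fixed element name and carry the token in an attribute. The Python sketch below shows that alternative using the standard library; it is not what xml.lua does, and the element name node, the attribute names and the list-of-children input shape are all assumptions of this example.

```python
import xml.etree.ElementTree as ET

def to_element(node):
    """Build a <node> element; the token goes in an attribute, not the tag name."""
    elem = ET.Element("node", {
        "text": node["text"],
        "count": str(node["numMatched"]),
        "sep": node["sep"],
    })
    for child in node.get("children", []):
        elem.append(to_element(child))
    return elem

# Hypothetical input mirroring the <C> branch of the output above.
tree = {"text": "C", "numMatched": 2, "sep": "", "children": [
    {"text": "2", "numMatched": 1, "sep": "."},
    {"text": "D", "numMatched": 1, "sep": "."},
]}
xml_text = ET.tostring(to_element(tree), encoding="unicode")
print(xml_text)
```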