Unix Text Processing Command Reference

Nathan Schneider, 2013-01-29

This is intended as a quick reference for text processing commands built into Unix. It is terse and not necessarily comprehensive—YMMV.

Suggestions? Contact the author or submit a pull request.

Notes about the commands below:

None of these commands actually modify the input files; rather, they manipulate input text and produce output text, typically writing to standard output.
ALLCAPS indicates a metavariable.
The descriptions are selective. For more comprehensive documentation of options, see the command’s man page. For tutorials and examples, search the Web.

Some tutorials and references:

Ken Church’s Unix™ for Poets
Jim Notwell’s Introduction to Text-Processing
Na-Rae Han’s Command-line Magic
Advanced Bash-Scripting Guide: Text Processing Commands
GNU Coreutils
- for Mac OS X: coreutils, sed (BSD implementations are built-in on OS X)

More powerful tools for advanced text processing operations:

AWK
pyp, grep/sed/AWK for the Python-inclined
- Python tips and tricks

No input stream

`yes`

Repeats a line (by default, y) infinitely.

yes LINE | head -n 10 repeats LINE 10 times.

One or more input files/streams

`cat`

Concatenate the input files together in sequence.

-s: suppress/squeeze multiple consecutive blank lines
-n: number all lines (cf. nl)
-b: number non-blank lines
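
As a small sketch of these options, the snippet below runs `cat -s` and `cat -b` over a scratch file; the file contents and the variable names (`tmp`, `squeezed`, `nonblank`) are made up for the illustration.

```shell
# Scratch file with a run of three blank lines (contents are illustrative).
tmp=$(mktemp)
printf 'alpha\n\n\n\nbeta\n' > "$tmp"

# -s squeezes the run of blank lines down to a single blank line.
squeezed=$(cat -s "$tmp")
printf '%s\n' "$squeezed"

# -b numbers only the non-blank lines; count how many lines got a number.
nonblank=$(cat -b "$tmp" | grep -c '^ *[0-9]')
printf 'numbered lines: %s\n' "$nonblank"

rm -f "$tmp"
```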

`tac`

Like cat, in reverse: lines are printed in reverse order. (GNU but not BSD.)

To print the last line of (contiguous) groups sharing all but the first 2 fields in common: tac FILE | uniq -f 2 | tac

`wc`

Counts lines/words/bytes in the specified file(s), individually and in total. By default, displays lines, then words, then bytes.

-l: count lines
-w: count words
-c: count bytes
-m: count characters (differs from -c for multibyte encodings such as UTF-8)

Typically a single input stream

Encoding

`file`

Determines the encoding of a text file, or indicates that the argument is a directory or pipe.

`iconv`

Converts the encoding of a text file.

iconv -f ISO-8859-1 -t UTF-8 FILE converts from ISO-8859-1 to UTF-8

Filtering/extracting by position

`cut`

Extracts fields from a file, based on delimiters or character positions.

cut -f1 FILE retrieves the first (tab-separated) column from the file
cut -d' ' -f1,3 FILE retrieves the first and third space-separated tokens from each line
cut -d'
' -f20-30 FILE (with a line break) supposedly retrieves the 20th-30th lines of the file, though this doesn’t seem to work in OS X. Equivalently: head -n 30 FILE | tail -n 11
-s to omit lines without any delimiter
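
The field options can be tried on inline data, as in this sketch; the sample rows and variable names are assumptions made for the example.

```shell
# Three tab-separated columns (sample data is illustrative).
rows=$(printf 'id\tname\tcity\n1\tAda\tLondon\n2\tLinus\tHelsinki\n')

# The default delimiter is a tab: keep the second column.
names=$(printf '%s\n' "$rows" | cut -f2)

# Fields 1 and 3 of a space-separated line.
first_third=$(printf 'a b c\n' | cut -d' ' -f1,3)

# -s drops lines that contain no delimiter at all.
delimited_only=$(printf 'no-delimiter-here\nx\ty\n' | cut -s -f1)

printf '%s\n' "$names" "$first_third" "$delimited_only"
```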

`head`, `tail`

Extracts a certain amount of text from the beginning or end of a file.

If multiple files are matched by the argument(s), a header indicating the filename will be displayed.

-n N: number of lines to retrieve (default: 10)
-c N: number of characters (bytes) to retrieve
tail -n +N, tail -c +N: N indicates an offset relative to the beginning of the file; the rest of the file after that offset will be extracted
head -n -N, head -c -N: offset relative to the end of the file (GNU but not BSD implementation)
head -n 100 FILE | tail -n 1 retrieves the 100th line of the file
tail -f FILE monitors the end of the file, writing to stdout as the file is appended to
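
A range of lines can be pulled out by chaining the two commands. The sketch below uses `seq` output as stand-in input (assuming `seq` is installed, as it is in GNU and recent BSD userlands); the variable names are illustrative.

```shell
# Lines 20 through 30 inclusive are 11 lines: take the first 30, keep the last 11.
lines_20_30=$(seq 1 100 | head -n 30 | tail -n 11)

# Same range the other way round: start at line 20, keep 11 lines.
same_range=$(seq 1 100 | tail -n +20 | head -n 11)

# A single line: the 100th.
hundredth=$(seq 1 100 | head -n 100 | tail -n 1)

printf '%s\n' "$lines_20_30"
```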

Filtering/extracting by content

`uniq`

Filters out duplicate lines of input.

-c: prefix each line with a count
-i: case-insensitive
-f N: ignore the first N (whitespace-separated) fields of each line
-s N: ignore the first N characters of each line
-w N: ignore all but the first N characters of each line
Note: If some parts of the line are ignored, the kept and discarded lines may differ. The first line with a given “key” will be the one that is kept.
other options for filtering repeated or non-repeated lines
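
Because `uniq` only compares adjacent lines, it is usually fed sorted input. A minimal frequency-count sketch follows, with made-up input; `-d` and `-u` are the options for showing only repeated or only non-repeated lines.

```shell
# Unsorted sample input (illustrative).
words=$(printf 'b\na\nb\nc\na\nb\n')

# Count occurrences of each line, most frequent first.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# Normalize the padded count column so the top entry is easy to read.
top=$(printf '%s\n' "$counts" | awk 'NR == 1 {print $2, $1}')

# -d: only lines that occur more than once; -u: only lines that occur once.
repeated=$(printf '%s\n' "$words" | sort | uniq -d)
singles=$(printf '%s\n' "$words" | sort | uniq -u)
```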

`grep` + friends

Searches text by regular expression.

-i: case-insensitive
-o: only show the matched part of the line (if multiple matches on an input line, these will be on separate output lines)
-w: match only whole words
-l: list only files in which matches were found
-r: recursive
-n: prefix each matching line with its line number
-v (--invert-match): filter out matches
-c (--count): give counts of the matches within each file instead of the matches themselves
-E or egrep: extended regex syntax: unescaped +, ?, |, (, and ) serve as operators
-F or fgrep: literal string matching (no regexes)
-h: suppress filename when displaying matches (-H: always show the filename)
zgrep searches gzip-compressed files
bzgrep searches bzip2-compressed files
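
A few of these options in combination, as a sketch over inline sample text (the sentences and variable names are invented for the example):

```shell
text=$(printf 'The cat sat.\nConcatenate files.\nA dog ran.\n')

# -i ignores case; -w demands a whole word, so "Concatenate" is not a hit.
whole=$(printf '%s\n' "$text" | grep -iw 'cat')

# -c counts matching lines instead of printing them.
n=$(printf '%s\n' "$text" | grep -ic 'cat')

# -v keeps the lines that do NOT match.
rest=$(printf '%s\n' "$text" | grep -iv 'cat')

# -o prints only the matched text; -E allows unescaped ( ) and +.
match=$(printf 'banana\n' | grep -oE '(an)+')

printf '%s\n' "$whole" "$n" "$rest" "$match"
```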

Augmenting

`nl`

Adds line numbers to a file. (Cf. cat -n.) Options control formatting and counting of the line numbers, including:

-v STARTNUM: initial counter value (default: 1)
-i INCREMENT (default: 1)
-w WIDTH: number of characters to be occupied by line numbers (default: 6)
-s SEP: separator to follow every line number (default: tab)
-b t: number non-empty lines (default); -b a: number all lines; -b pREGEX: number lines matching the regular expression pattern REGEX
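
The formatting options combine as in this sketch; the input lines and the chosen start value, increment, width and separator are arbitrary.

```shell
# Number every line (-b a), starting at 10 in steps of 5,
# in a 3-character column followed by ": ".
numbered=$(printf 'first\n\nsecond\n' | nl -b a -v 10 -i 5 -w 3 -s ': ')
printf '%s\n' "$numbered"

# With -b t (the default) the empty line gets no number.
numbered_nonempty=$(printf 'first\n\nsecond\n' | nl -b t -w 1 -s ' ' | grep -c '^[0-9]')
```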

Reordering

`sort`

Sorts the input lines.

PUNCTUATION/SPECIAL CHARS MAY BE IGNORED depending on the value of the LC_COLLATE environment variable
-f: ignores case
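-n: numeric sort (compares leading numbers by value, so 9 sorts before 10)
-g: general numeric sort (also understands floating-point notation such as 1e3; slower than -n)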
-i: ignores nonprinting characters
-r: reverse
-u: unique
-k: sort key (field offsets)
-t: field delimiter

To sort by the first column, keeping only the last record for each: tac FILE | sort -k1,1 -u
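
The difference between lexical and numeric ordering, and the use of `-t` with `-k`, can be seen in this sketch (inline sample data, illustrative variable names):

```shell
# Default ordering compares lines as strings.
lexical=$(printf '10\n9\n100\n' | sort)

# -n compares by numeric value instead.
numeric=$(printf '10\n9\n100\n' | sort -n)

# -t sets the field delimiter; -k2,2n sorts on field 2 only, numerically.
by_value=$(printf 'x:3\ny:1\nz:2\n' | sort -t: -k2,2n)

printf '%s\n' "$lexical" "$numeric" "$by_value"
```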

`shuf`

Randomly permutes the input lines. (GNU but not BSD)

`rev`

Reverses each line of the file.

Rearranging (changing spacing)

`expand`, `unexpand`

Convert tabs to spaces and vice versa.

`fold`

Wrap input text so no line is more than a specified number of characters wide.

-w WIDTH: wrap lines in a file to be no more than WIDTH characters long (default: 80)
-s: breaks lines at word spaces
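
A quick sketch of hard wrapping versus wrapping at spaces, on made-up input:

```shell
# Without -s, lines are cut at exactly WIDTH characters.
hard=$(printf 'abcdefghij\n' | fold -w 4)

# With -s, the break falls after the last space that fits within WIDTH.
wrapped=$(printf 'the quick brown fox jumps\n' | fold -s -w 10)
wrapped_lines=$(printf '%s\n' "$wrapped" | wc -l)

printf '%s\n' "$hard" "$wrapped"
```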

`column`

Formats text into columns (by default, based on whitespace delimiters).

Dividing

`split`

split FILE OUTPREFIX breaks up the input into smaller files by size.

Output files are named by the specified prefix and some number of lowercase alphabetic characters (configurable with -a; defaults to 2, i.e. aa, ab, etc.). In some implementations -d can be provided to request decimal rather than alphabetic suffixes. One of the following may be provided to determine the splitting behavior:

-l NUMLINES: number of lines in each output file (default: 1000)
-b BYTESIZE: size of each output file; BYTESIZE can even be in kilobytes (10k) or megabytes (10m)
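
As a sketch, the following splits a 25-line scratch file into 10-line pieces and checks that concatenating the pieces restores the original. The directory, file names and the `part_` prefix are assumptions for the example.

```shell
work=$(mktemp -d)
seq 1 25 > "$work/input.txt"

# 10 lines per piece; output files are part_aa, part_ab, part_ac.
split -l 10 "$work/input.txt" "$work/part_"

pieces=$(ls "$work" | grep -c '^part_')
last_size=$(wc -l < "$work/part_ac")

# The alphabetic suffixes sort in creation order, so a glob restores the input.
roundtrip=no
cat "$work"/part_* | cmp -s - "$work/input.txt" && roundtrip=ok

printf 'pieces=%s last=%s roundtrip=%s\n' "$pieces" "$last_size" "$roundtrip"
rm -rf "$work"
```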

`csplit`

Breaks up the input into smaller files by content.

Main arguments are the file, followed by one or more patterns indicating split points. Each pattern may be a line number, a regexp (optionally with a line offset), or a number of lines followed by {REPEATS} to indicate REPEATS blocks of the specified number of lines. The line matching the pattern begins a new output file. Output files are numbered with decimal digits.

-f OUTPREFIX (default: xx)
-n NUMDIGITS (default: 2)
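
A sketch of splitting at lines that match a regexp, assuming a small document whose sections start with `# `. The file names and the `sec` prefix are made up; `-s` only silences the byte counts that `csplit` otherwise prints.

```shell
work=$(mktemp -d)
printf 'intro\n# one\na\n# two\nb\n# three\nc\n' > "$work/doc.txt"

# Split before the next line matching /^# /, then repeat that pattern twice more.
csplit -s -f "$work/sec" "$work/doc.txt" '/^# /' '{2}'

# Expect sec00 (intro) plus one file per section: sec01, sec02, sec03.
count=$(ls "$work" | grep -c '^sec')
third_header=$(head -n 1 "$work/sec02")

printf 'files=%s header=%s\n' "$count" "$third_header"
rm -rf "$work"
```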

Replacing

`tr`

Translates characters, e.g. lowercasing text in a file or replacing newlines with spaces.
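
Both uses mentioned above, plus a common tokenizing idiom, as a sketch on inline input. The `-c` (complement) and `-s` (squeeze repeats) options are standard `tr` flags not otherwise covered here.

```shell
# Lowercase the text.
lower=$(printf 'Hello World\n' | tr '[:upper:]' '[:lower:]')

# Replace newlines with spaces (note the trailing space from the final newline).
one_line=$(printf 'a\nb\nc\n' | tr '\n' ' ')

# One word per line: turn every run of non-letters into a single newline.
tokens=$(printf 'to be, or not\n' | tr -cs '[:alpha:]' '\n')

printf '%s\n' "$lower" "$one_line" "$tokens"
```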

`sed`

Text substitution by regular expression matching.

sed -E 's/K\.? ?V?\.? ?/K/g' FILE replaces all matches of the pattern on each line of the input (-E is needed for the unescaped ? operator)
sed '/^$/d' FILE filters out blank lines of the file
Depending on the implementation, sed may or may not support backslash-denoted characters and character classes such as \t, \s, and [[:space:]]. (\t and \s are supported in GNU but not BSD implementations.) Tabs and newlines can be entered as literals.
Unless -E is specified, the plus operator and parenthesized subexpressions MUST HAVE BACKSLASH ESCAPES when using sed/grep: e.g. \(.*\).\+
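
The contrast between basic and extended syntax can be sketched as follows; the input strings and variable names are invented for the example.

```shell
# The g flag substitutes every match on a line, not just the first.
plus=$(printf 'a-b-c\n' | sed 's/-/+/g')

# Delete blank lines.
no_blank=$(printf 'x\n\ny\n' | sed '/^$/d')

# Basic syntax: capture groups need backslashes.
swapped_basic=$(printf 'key=value\n' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# Extended syntax (-E): the same groups without backslashes.
swapped_extended=$(printf 'key=value\n' | sed -E 's/(.*)=(.*)/\2=\1/')

printf '%s\n' "$plus" "$no_blank" "$swapped_basic" "$swapped_extended"
```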

Two input streams

Call them FILE1 and FILE2, respectively.

`diff`

Compare two files line-by-line. (Cf. diff3 for three files.) Can also compare directories.

-i: ignore case
-w: ignore whitespace
-B: ignore blank lines
-I REGEX: ignore differences among lines that all match the regular expression pattern REGEX
-x PATTERN: when comparing directories, exclude files whose names match the shell pattern PATTERN
-r: recursively compare subdirectories
-s: report identical files (file comparison only)
-u: unified diff display format
-y: side-by-side display (cf. sdiff)
--suppress-common-lines

Other options are useful when comparing code, e.g. -p, -F, -D, and -E.
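
A sketch of `-u` and `-i` on two scratch files (names and contents are made up). It relies on `diff` exiting with status 0 when the inputs match and 1 when they differ, which is standard behavior though not described above.

```shell
work=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$work/old.txt"
printf 'apple\nBanana\ncherry\ndate\n' > "$work/new.txt"

# Unified diff; capture the exit status without aborting on "files differ".
status=0
diff -u "$work/old.txt" "$work/new.txt" > "$work/changes.diff" || status=$?

# With -i the banana/Banana change disappears; only the added line remains.
added=$({ diff -i "$work/old.txt" "$work/new.txt" || true; } | grep -c '^>')

printf 'status=%s added=%s\n' "$status" "$added"
rm -rf "$work"
```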

`sdiff`

Compare files side-by-side, as with diff -y. Many of the diff options are available; -s is short for --suppress-common-lines.

`comm`

Displays lines common or unique to two SORTED files, organized into columns according to their commonality.

-1 to suppress the first column (lines only in FILE1)
-2 to suppress the second column (lines only in FILE2)
-3 to suppress the third column (lines common to both files)
-i for case-insensitive comparison (BSD but not GNU)
if the files are sorted, join -t'
' FILE1 FILE2 is much faster than comm -1 -2 FILE1 FILE2
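
The three columns can be selected individually by suppressing the other two, as in this sketch with two small sorted scratch files (names and contents are illustrative):

```shell
work=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$work/a.txt"
printf 'banana\ncherry\ndate\n' > "$work/b.txt"

# Suppress columns 1 and 2: lines common to both files.
both=$(comm -12 "$work/a.txt" "$work/b.txt")

# Suppress columns 2 and 3: lines only in the first file.
only_a=$(comm -23 "$work/a.txt" "$work/b.txt")

# Suppress columns 1 and 3: lines only in the second file.
only_b=$(comm -13 "$work/a.txt" "$work/b.txt")

printf '%s\n' "$both" "$only_a" "$only_b"
rm -rf "$work"
```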

`join`

Merges the lines of two sorted text files based on the presence of a common field.

-t CHAR to indicate a delimiter (by default, it is any sequence of whitespace). For instance, join -t'
' FILE1 FILE2 lists all common lines, assuming the two files are sorted.
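
A minimal sketch of a join on the first field with the default whitespace delimiter; the two files and their contents are invented, and both are already sorted on the key.

```shell
work=$(mktemp -d)
printf '1 Ada\n2 Linus\n3 Grace\n' > "$work/people.txt"
printf '1 London\n3 Arlington\n' > "$work/cities.txt"

# Lines are paired where field 1 matches; key 2 has no partner and is dropped.
joined=$(join "$work/people.txt" "$work/cities.txt")

printf '%s\n' "$joined"
rm -rf "$work"
```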

Typically two or more input streams

`paste`

Combines/merges lines of multiple files.

paste FILE1 FILE2 … joins corresponding lines with tabs, so each input file becomes a column
paste -s FILE1 FILE2 … serializes each file: all lines of a file are joined with tabs into a single output line, so each input file becomes a row
paste -d'\n' FILE1 FILE2 … interleaves lines of the files
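
The three modes side by side, as a sketch with two three-line scratch files (names and contents are illustrative):

```shell
work=$(mktemp -d)
printf '1\n2\n3\n' > "$work/nums.txt"
printf 'a\nb\nc\n' > "$work/letters.txt"

# Default: corresponding lines joined by a tab.
side_by_side=$(paste "$work/nums.txt" "$work/letters.txt")

# -s: each file collapsed onto one line.
serial=$(paste -s "$work/nums.txt" "$work/letters.txt")

# Newline as the delimiter interleaves the files line by line.
interleaved=$(paste -d'\n' "$work/nums.txt" "$work/letters.txt")

# -s with a custom delimiter turns a column into a comma-separated list.
csv=$(paste -s -d, "$work/nums.txt")

printf '%s\n' "$side_by_side" "$serial" "$interleaved" "$csv"
rm -rf "$work"
```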

Three input streams

`diff3`

Like diff, but for three files.
