Unix Text Processing Command Reference

Nathan Schneider, 2013-01-29

This is intended as a quick reference for text processing commands built into Unix. It is terse and not necessarily comprehensive—YMMV.

Suggestions? Contact the author or submit a pull request.

Notes about the commands below:

  • None of these commands actually modify the input files; rather, they manipulate input text and produce output text, typically writing to standard output.

  • ALLCAPS indicates a metavariable.

  • The descriptions are selective. For more comprehensive documentation of options, see the command’s man page. For tutorials and examples, search the Web.

Some tutorials and references:

More powerful tools for advanced text processing operations:

No input stream

yes

Repeats a line (by default, y) infinitely.

  • yes LINE | head -n 10 repeats LINE 10 times.

One or more input files/streams

cat

Concatenate the input files together in sequence.

  • -s: suppress/squeeze multiple consecutive blank lines
  • -n: number all lines (cf. nl)
  • -b: number non-blank lines

zcat

Like cat, but for gzipped files.

tac

Like cat, in reverse: lines are printed in reverse order. (GNU but not BSD.)

  • To print the last line of (contiguous) groups sharing all but the first 2 fields in common: tac FILE | uniq -f 2 | tac

wc

Counts lines/words/bytes in the specified file(s), individually and in total. By default, displays lines, then words, then bytes.

  • -l: count lines
  • -w: count words
  • -c: count bytes (use -m to count characters, which differs for multibyte encodings)

Typically a single input stream

Encoding

file

Guesses the type of a file; for text files the report includes the encoding. Directories and pipes are reported as such.

iconv

Converts the encoding of a text file.

  • iconv -f ISO-8859-1 -t UTF-8 FILE converts from ISO-8859-1 to UTF-8

Filtering/extracting by position

cut

Extracts fields from a file, based on delimiters or character positions.

  • cut -f1 FILE retrieves the first (tab-separated) column from the file
  • cut -d' ' -f1,3 FILE retrieves the first and third space-separated tokens from each line
  • cut -d'
    ' -f20-30 FILE (with a literal line break as the delimiter) supposedly retrieves the 20th-30th lines of the file, though this doesn’t seem to work in OS X. A portable alternative: head -n 30 FILE | tail -n 11
  • -s to omit lines without any delimiter
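
The cut options above can be tried on inline sample data. This is a minimal sketch: the input values and variable names are made up for illustration.

```shell
# Made-up sample data: three tab-separated columns.
data=$(printf 'alice\t30\tparis\nbob\t25\trome\n')

# First tab-separated column of each line.
first_col=$(printf '%s\n' "$data" | cut -f1)

# First and third fields, with a comma as the delimiter instead of a tab.
csv_fields=$(printf 'a,b,c\nd,e,f\n' | cut -d',' -f1,3)

# Characters 1-3 of each line.
prefix=$(printf 'abcdef\nuvwxyz\n' | cut -c1-3)

printf '%s\n' "$first_col" "$csv_fields" "$prefix"
```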

head, tail

Extracts a certain amount of text from the beginning or end of a file.

If multiple files are matched by the argument(s), a header indicating the filename will be displayed.

  • -n N: number of lines to retrieve (default: 10)
  • -c N: number of characters (bytes) to retrieve
  • tail -n +N, tail -c +N: output starts at the Nth line (or byte) of the file and continues to the end
  • head -n -N, head -c -N: output everything except the last N lines (or bytes) (GNU but not BSD implementation)
  • head -n 100 FILE | tail -n 1 retrieves the 100th line of the file
  • tail -f FILE monitors the end of the file, writing to stdout as the file is appended to
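
Combining head and tail to pull out a range of lines can be sketched as follows. The numbered input comes from seq (available in GNU coreutils and BSD); the variable names are only illustrative.

```shell
# Generate 100 numbered lines as sample input.
lines=$(seq 1 100)

# Lines 20 through 30: take the first 30 lines, then the last 11 of those.
range=$(printf '%s\n' "$lines" | head -n 30 | tail -n 11)

# Everything from line 98 onward.
from_98=$(printf '%s\n' "$lines" | tail -n +98)

printf '%s\n' "$range"
printf '%s\n' "$from_98"
```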

Filtering/extracting by content

uniq

Filters out adjacent duplicate lines of input; sort the input first to remove all duplicates.

  • -c: prefix each line with a count
  • -i: case-insensitive
  • -f N: ignore the first N (whitespace-separated) fields of each line
  • -s N: ignore the first N characters of each line
  • -w N: ignore all but the first N characters of each line
  • Note: If some parts of the line are ignored, the kept and discarded lines may differ. The first line with a given “key” will be the one that is kept.
  • other options for filtering repeated or non-repeated lines
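
Because uniq only compares neighbouring lines, it is usually paired with sort. A small sketch with made-up input:

```shell
# Made-up input in which equal lines are never adjacent.
words=$(printf 'b\na\nb\na\nb\n')

# uniq alone compares neighbouring lines only, so all 5 lines survive.
unsorted_count=$(printf '%s\n' "$words" | uniq | wc -l)

# Sorting first makes duplicates adjacent; -c then counts each distinct line,
# and the final sort -rn lists the most frequent line first.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)

echo "lines after plain uniq: $unsorted_count"
printf '%s\n' "$counts"
```

The width of the count column printed by uniq -c varies between implementations, so scripts that parse it should split on whitespace rather than rely on fixed positions.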

grep + friends

Searches text by regular expression.

  • -i: case-insensitive
  • -o: only show the matched part of the line (if multiple matches on an input line, these will be on separate output lines)
  • -w: match only whole words
  • -l: list only files in which matches were found
  • -r: recursive
  • -n: prefix each matching line with its line number
  • -v (--invert-match): filter out matches
  • -c (--count): give counts of the matches within each file instead of the matches themselves
  • -E or egrep: extended regex syntax: unescaped +, (, and ) serve as operators
  • -F or fgrep: literal string matching (no regexes)
  • -H: always print the filename with each match (-h suppresses it)
  • zgrep searches gzip-compressed files
  • bzgrep searches bzip2-compressed files
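
A few of these grep options in action, as a sketch on made-up text. Note the difference between -c, which counts matching lines, and -o piped to wc -l, which counts individual matches.

```shell
# Made-up sample text.
text=$(printf 'The cat sat.\nA dog ran.\nthe CAT and the cat.\n')

# -i: case-insensitive; -c: count matching lines (not individual matches).
matching_lines=$(printf '%s\n' "$text" | grep -ic 'cat')

# -o prints each match on its own line, so wc -l counts the matches.
match_count=$(printf '%s\n' "$text" | grep -io 'cat' | wc -l)

# -v keeps only the lines that do NOT match.
no_cat=$(printf '%s\n' "$text" | grep -iv 'cat')

# -n prefixes each matching line with its line number.
numbered=$(printf '%s\n' "$text" | grep -n 'dog')

echo "$matching_lines matching lines, $match_count matches"
printf '%s\n' "$no_cat" "$numbered"
```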

Augmenting

nl

Adds line numbers to a file. (Cf. cat -n.) Options control formatting and counting of the line numbers, including:

  • -v STARTNUM: initial counter value (default: 1)
  • -i INCREMENT (default: 1)
  • -w WIDTH: number of characters to be occupied by line numbers (default: 6)
  • -s SEP: separator to follow every line number (default: tab)
  • -b t: number non-empty lines (default); -b a: number all lines; -b pREGEX: number lines matching the regular expression pattern REGEX

Reordering

sort

Sorts the input lines.

  • PUNCTUATION/SPECIAL CHARS MAY BE IGNORED depending on the value of the LC_COLLATE environment variable
  • -f: ignores case
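  • -n: numeric sort, comparing by the leading number in each line or key (integers and decimals)
  • -g: general numeric sort (also understands scientific notation such as 1e3, but is slower)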
  • -i: ignores nonprinting characters
  • -r: reverse
  • -u: unique
  • -k: sort key (field offsets)
  • -t: field delimiter
  • To sort by the first column, keeping only the last record for each: tac FILE | sort -k1,1 -u
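
The effect of -n, -t, and -k can be sketched on made-up colon-delimited records. LC_ALL=C is set here as an assumption to keep the ordering independent of the locale issue noted above.

```shell
# Plain sort compares text, so "10" sorts before "2"; -n compares numbers.
lexical=$(printf '10\n9\n2\n' | LC_ALL=C sort)
numeric=$(printf '10\n9\n2\n' | LC_ALL=C sort -n)

# Made-up records in the form name:score.
records=$(printf 'carol:9\nalice:10\nbob:9\nalice:2\n')

# Sort by the second field only, numerically, highest first.
by_score=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k2,2nr)

# Sort by name, then numerically by score within each name.
by_name=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k1,1 -k2,2n)

printf '%s\n' "$by_score"
printf '%s\n' "$by_name"
```

Writing the key as -k2,2 rather than -k2 restricts the comparison to that one field; -k2 alone would extend the key to the end of the line.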

shuf

Randomly permutes the input lines. (GNU but not BSD)

rev

Reverses each line of the file.

Rearranging (changing spacing)

expand, unexpand

Convert tabs to spaces and vice versa.

fold

Wrap input text so no line is more than a specified number of characters wide.

  • -w WIDTH: wrap lines in a file to be no more than WIDTH characters long (default: 80)
  • -s: breaks lines at word spaces
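
The difference between hard wrapping and wrapping at spaces can be sketched as follows; the sample sentence and the width of 12 are arbitrary choices.

```shell
text='the quick brown fox jumps'

# Without -s, lines are cut at exactly 12 characters, even mid-word.
hard=$(printf '%s\n' "$text" | fold -w 12)

# With -s, each line is broken after the last space that fits within the width.
soft=$(printf '%s\n' "$text" | fold -s -w 12)

printf '%s\n' "$hard"
printf '%s\n' "$soft"
```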

column

Formats text into columns (by default, based on whitespace delimiters).

Dividing

split

split FILE OUTPREFIX breaks up the input into smaller files by size.

Output files are named by the specified prefix and some number of lowercase alphabetic characters (configurable with -a; defaults to 2, i.e. aa, ab, etc.). In some implementations -d can be provided to request decimal rather than alphabetic suffixes. One of the following may be provided to determine the splitting behavior:

  • -l NUMLINES: number of lines in each output file (default: 1000)
  • -b BYTESIZE: size of each output file; BYTESIZE can even be in kilobytes (10k) or megabytes (10m)
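
A sketch of splitting by line count, run in a temporary directory so nothing is left behind in the working directory. The file name input.txt and the prefix chunk_ are placeholders.

```shell
workdir=$(mktemp -d)

# Sample input: 25 numbered lines.
seq 1 25 > "$workdir/input.txt"

# Split into pieces of 10 lines each: chunk_aa, chunk_ab, chunk_ac.
split -l 10 "$workdir/input.txt" "$workdir/chunk_"
ls "$workdir"

# Concatenating the pieces in name order reproduces the original file.
cat "$workdir"/chunk_* | cmp - "$workdir/input.txt" && echo "round trip OK"
```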

csplit

Breaks up the input into smaller files by content.

Main arguments are the file, followed by one or more patterns indicating split points. Each pattern may be a line number, a regexp (optionally with a line offset), or a number of lines followed by {REPEATS} to indicate REPEATS blocks of the specified number of lines. The line matching the pattern begins a new output file. Output files are numbered with decimal digits.

  • -f OUTPREFIX (default: xx)
  • -n NUMDIGITS (default: 2)
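
Splitting at lines that match a pattern might look like the sketch below. The file contents and the part_ prefix are made up; -s (silent) is used to suppress the byte counts that csplit otherwise prints.

```shell
workdir=$(mktemp -d)

# Made-up document with two "# " heading lines.
printf 'intro\n# one\na\n# two\nb\n' > "$workdir/doc.txt"

# Start a new output file at each line beginning with "# ".
# '{1}' repeats the preceding pattern one more time (two split points in total).
csplit -s -f "$workdir/part_" "$workdir/doc.txt" '/^# /' '{1}'

# part_00 holds the text before the first heading; part_01 and part_02
# each begin with a heading line.
for f in "$workdir"/part_*; do
  echo "== $f"
  cat "$f"
done
```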

Replacing

tr

Translates characters, e.g. lowercasing text in a file or replacing newlines with spaces.
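
Some common tr invocations, sketched on made-up input:

```shell
# Lowercase text using POSIX character classes.
lower=$(printf 'Hello World\n' | tr '[:upper:]' '[:lower:]')

# Replace every newline with a space (the final newline becomes a space too).
one_line=$(printf 'a\nb\nc\n' | tr '\n' ' ')

# -s squeezes runs of the listed character into a single occurrence.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# -d deletes the listed characters.
no_digits=$(printf 'r2d2\n' | tr -d '0-9')

printf '%s\n' "$lower" "$one_line" "$squeezed" "$no_digits"
```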

sed

Text substitution by regular expression matching.

  • sed -E 's/K\.? ?V?\.? ?/K/g' FILE replaces all matches of the pattern in the output (-E is needed here because the pattern uses the unescaped ? operator)
  • sed '/^$/d' FILE filters out blank lines of the file
  • Depending on the implementation, sed may or may not support backslash-denoted characters and character classes such as \t, \s, and [[:space:]]. (\t and \s are supported in GNU but not BSD implementations.) Tabs and newlines can be entered as literals.
  • Unless -E is specified, the plus operator and parenthesized subexpressions MUST HAVE BACKSLASH ESCAPES when using sed/grep: e.g. \(.*\).\+
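
The contrast between extended (-E) and basic regex syntax can be sketched as follows; the input strings are made up.

```shell
# Extended syntax: + and parentheses work unescaped.
masked=$(printf 'room 12, floor 3\n' | sed -E 's/[0-9]+/N/g')

# The same substitution in basic syntax, avoiding + altogether for portability.
masked_bre=$(printf 'room 12, floor 3\n' | sed 's/[0-9][0-9]*/N/g')

# Capture groups: swap two words. \1 and \2 refer to the parenthesized parts.
swapped=$(printf 'hello world\n' | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')

# Delete blank lines.
compact=$(printf 'a\n\nb\n' | sed '/^$/d')

printf '%s\n' "$masked" "$masked_bre" "$swapped" "$compact"
```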

Two input streams

Call them FILE1 and FILE2, respectively.

diff

Compare two files line-by-line. (Cf. diff3 for three files.) Can also compare directories.

  • -i: ignore case
  • -w: ignore whitespace
  • -B: ignore blank lines
  • -I REGEX: ignore differences among lines that all match the regular expression pattern REGEX
  • -x PATTERN: when comparing directories, exclude files whose names match the shell (glob) pattern PATTERN
  • -r: recursively compare subdirectories
  • -s: report identical files (file comparison only)
  • -u: unified diff display format
  • -y: side-by-side display (cf. sdiff)
  • --suppress-common-lines

Other options are useful when comparing code, e.g. -p, -F, -D, and -E.
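
A short sketch of diff on two made-up files. diff exits with status 1 when the files differ, which matters in scripts that stop on errors.

```shell
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/old.txt"
printf 'a\nB\nc\n' > "$workdir/new.txt"

# Unified format; "|| true" keeps a script running under "set -e",
# since a difference makes diff exit with status 1.
unified=$(diff -u "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$unified"

# With -i the only difference (b vs. B) is ignored, so diff exits with 0.
if diff -i "$workdir/old.txt" "$workdir/new.txt" > /dev/null; then
  case_result="same when ignoring case"
else
  case_result="different"
fi
echo "$case_result"
```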

sdiff

Compare files side-by-side, as with diff -y. Many of the diff options are available; -s is short for --suppress-common-lines.

comm

Displays lines common or unique to two SORTED files, organized into columns according to their commonality.

  • -1 to suppress the first column (lines only in FILE1)
  • -2 to suppress the second column (lines only in FILE2)
  • -3 to suppress the third column (lines common to both files)
  • -i for case-insensitive comparison (BSD but not GNU)
  • if the files are sorted, join -t'
    ' FILE1 FILE2 is much faster than comm -1 -2 FILE1 FILE2
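
The column-suppressing flags are easiest to read in combination. A sketch with two made-up, already sorted files:

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/a.txt"
printf 'banana\ncherry\ndate\n' > "$workdir/b.txt"

# Suppress columns 1 and 2: only lines common to both files remain.
common=$(comm -12 "$workdir/a.txt" "$workdir/b.txt")

# Suppress columns 2 and 3: lines found only in the first file.
only_a=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")

# Suppress columns 1 and 3: lines found only in the second file.
only_b=$(comm -13 "$workdir/a.txt" "$workdir/b.txt")

printf 'common: %s\n' $common
echo "only in a: $only_a, only in b: $only_b"
```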

join

Merges the lines of two sorted text files based on the presence of a common field.

  • -t CHAR to indicate a delimiter (by default, it is any sequence of whitespace). For instance, join -t'
    ' FILE1 FILE2 lists all common lines, assuming the two files are sorted.
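
Joining on a shared first field can be sketched as follows. The files and their contents are made up, and both are already sorted on the join field.

```shell
workdir=$(mktemp -d)

# Whitespace-delimited files keyed on the first field.
printf '1 alice\n2 bob\n3 carol\n' > "$workdir/names.txt"
printf '1 paris\n3 rome\n' > "$workdir/cities.txt"

# Only keys present in both files (1 and 3) appear in the output.
joined=$(join "$workdir/names.txt" "$workdir/cities.txt")

# The same idea with an explicit comma delimiter.
printf '1,alice\n2,bob\n' > "$workdir/names.csv"
printf '1,paris\n2,oslo\n' > "$workdir/cities.csv"
joined_csv=$(join -t, "$workdir/names.csv" "$workdir/cities.csv")

printf '%s\n' "$joined" "$joined_csv"
```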

Typically two or more input streams

paste

Combines/merges lines of multiple files.

  • paste FILE1 FILE2 … joins corresponding lines with tabs (vertically)
  • paste -s FILE1 FILE2 … joins horizontally: all lines of each file are merged into a single tab-separated line, giving one output line per input file
  • paste -d'\n' FILE1 FILE2 … interleaves lines of the files
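
The three paste modes above, sketched on two made-up two-line files:

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/letters.txt"
printf '1\n2\n' > "$workdir/numbers.txt"

# Default: corresponding lines side by side, separated by a tab.
side_by_side=$(paste "$workdir/letters.txt" "$workdir/numbers.txt")

# -s: each file becomes one output line.
serial=$(paste -s "$workdir/letters.txt" "$workdir/numbers.txt")

# A newline delimiter interleaves the lines of the two files.
interleaved=$(paste -d'\n' "$workdir/letters.txt" "$workdir/numbers.txt")

printf '%s\n' "$side_by_side" "$serial" "$interleaved"
```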

Three input streams

diff3

Like diff, but for three files.
