All Projects → wofr06 → Lesspipe

wofr06 / Lesspipe

Licence: gpl-2.0
lesspipe (formerly on sourceforge)

Programming Languages

perl
6916 projects

lesspipe.sh, a preprocessor for less

Version: 1.85 Author : Wolfgang Friebel, DESY (Wolfgang.Friebel AT desy.de) License: GPL

Latest version available from: https://github.com/wofr06/lesspipe.sh/archive/lesspipe.zip https://github.com/wofr06/lesspipe (git repository)

The development version can be cloned using git: git clone https://github.com/wofr06/lesspipe.git To report bugs or make proposals to improve lesspipe please contact the author by email.

Contents

  1. Motivation

  2. Introduction

  3. Usage

  4. Required programs

  5. Supported file formats 4.1 Supported compression methods 4.2 List of preprocessed file types 4.3 Conversion of files with alternate character encoding

  6. Syntax highlighting 5.1.1 List of supported languages 5.1.2 Syntax highlighting alternatives 5.2 Colored directory listing 5.3 Colored listing of tar file contents

  7. Displaying files with special characters in the file name

  8. Examples

  9. Other documentation about lesspipe

  10. External links 9.1 URLs to some utilities 9.2 References

  11. Contributors

  12. Motivation ===============

If you use

  • the pager less in the command line,
  • the version control system git,
  • the text editor Vim or
  • the mail client mutt,

then lesspipe.sh enables these programs to read non-text files, such as:

  • PDFs,
  • (Microsoft or LibreOffice) Office documents, or even
  • media (such as [JPG or PNG] images, [MP3] audio or video) files

where read means,

  • (format and) show the contained text (of a document or tag in a media file), or
  • show sensible file information (such as length of the video).

To enable less respectively git, Vim or mutt to read non-text files by lesspipe.sh, see

For the text and info extraction, lesspipe.sh will depend on external tools, but many use cases are covered by an installation of

  • LibreOffice and a common text browser (such as lynx),
  • pdftotext, and
  • mediainfo (or exiftool).
  1. Introduction ===============

To browse files under UNIX the excellent viewer less [1] can be used. By setting the environment variable LESSOPEN, less can be enhanced by external filters to become even more powerful. Most Linux distributions come already with a "lesspipe.sh" that covers the most common situations.

The input filter for less described here is called "lesspipe.sh". It is able to process a wide variety of file formats. It enables users to deeply inspect archives and to display the contents of files in archives without having to unpack them before. That means file contents can be properly interpreted even if the files are compressed and contained in a hierarchy of archives (often found in RPM or DEB archives containing source tarballs). The filter is easily extensible for new formats.

The input filter which is also called "lesspipe.sh" is written in a ksh compatible language (ksh, bash, zsh) as one of these is nearly always installed on UNIX systems and uses comparably few resources. Otherwise an implementation in perl for example would have been somewhat simpler to code. The code looks less clean than it could as it was tried to make the script compatible with a number of old shells and applications especially found on non Linux systems.

The filter does different things depending on the file format. In most cases it is determined on the output of the "file" command [2], [6], that recognizes lots of formats. Only in a few cases the file suffix is used to determine what to display. Up to date file descriptions are included in the "file" package. Maintaining a list of file formats is therefore only a matter of keeping that package up to date.

  1. Usage ========

(see also the man page lesspipe.1)

To activate lesspipe.sh the environment variable LESSOPEN has to be defined in the following way:

LESSOPEN="|lesspipe.sh %s"; export LESSOPEN (sh like shells) setenv LESSOPEN "|lesspipe.sh %s" (csh, tcsh)

If lesspipe.sh is not in the UNIX search path or if the wrong lesspipe.sh is found in the search path, then the full path to lesspipe.sh should be given in the above commands.

As lesspipe.sh is accepting only a single argument, a hierarchical list of file names has to be separated by a non blank character. A colon is rarely found in file names, therefore it has been chosen as the separator character. If a file name does however contain at least one isolated colon, the equal sign = can be used as an alternate separator character. In that case the = character has to reoccur as the last character of the argument. At each stage in extracting files from such a hierarchy the file type is determined. This guarantees a correct processing and display at each stage of the filtering.

To view files in multifile archives the following command can be used: less archive_file:contained_file or less archive_file=contained_file= (with = as separator) This can be used to extract single files from a multifile archive: less archive_file:contained_file > extracted_file For extracting files less is not required, that can be done also using: lesspipe.sh archive_file:contained_file > extracted_file Even a file in a multifile archive that itself is contained in yet another archive can be viewed this way: less super_archive:archive_file:contained_file

The script is able to extract files up to a depth of 6 where applying a decompression algorithm counts as a separate level. In a few rare cases the file command does not recognize the correct format (especially with nroff). In such cases the filtering can be suppressed by a trailing colon on the file name.

Display the last file in the file1:..:fileN chain in raw format: Suppress input filtering: less file1:..:fileN: (append 1 colon) Suppress decompression: less file1:..:fileN:: (append 2 colons)

Suppress syntax highlighting: less file1:..:fileN: (append 1 colon) Syntax highlighting (see below) is only tried if less is called with -r or -R and highlighting support was requested when generating lesspipe.sh !!!

Several environment variables can influence the behavior of lesspipe.sh.

LESSQUIET will suppress additional output not belonging to the file contents if set to a non empty value.

LESS can be used to switch on colored less output (has to contain -r or -R).

LESSCOLORIZER can contain a syntax highlighting program different from the code2color used by default (currently only pygmentize allowed).

Code for using LESS_ADVANCED_PREPROCESSOR is optionally generated (configure). LESS_ADVANCED_PREPROCESSOR will switch on filtering methods for html, rtf, ps files and files with alternate character encoding, if this variable is set. Filtering these formats is also done if there is no LESS_ADVANCED_PREPROCESSOR support (then this string is not contained in lesspipe.sh). Otherwise these types of files will be shown unmodified.

  1. Required programs ====================

bash or zsh or ksh (also pdksh, tested with version 5.2). Configure puts an appropriate first line in the script file (a version with an up to date magic file) (GNU file 4.xx recommended) perl (for configure, code2color and tarcolor, lesspipe.sh works without it) Standard UNIX programs like ar, cat, cut, dd, egrep, gzip, ln, ls, mkdir, rm, sed, strings, tar, tput and further programs for special formats.

  1. Supported file formats =========================

Currently lesspipe.sh [3] supports the following compression methods and file types (i.e. the file contents gets transformed by lesspipe.sh):

4.1 Supported compression methods

gzip, compress, pack requires gzip bzip2 requires bzip2 zip requires unzip rar requires rar or unrar 7-zip requires 7za lzip requires lzip lzma requires lzma (limited support only) xz requires xz zstd requires zstd brotli requires bro lz4 requires lz4

4.2 List of preprocessed file types

tar requires GNU tar and optionally tarcolor for coloring nroff(mandoc) requires groff ar library requires ar jar archive requires fastjar or unzip rar archive requires unrar or rar 7-zip archive requires 7za lzip archive requires lzip shared library requires nm executable requires strings directory displayed using ls -lA RPM requires GNU cpio and rpm2cpio or rpmunpack, optionally rpm Microsoft Word < 2007 requires antiword or catdoc or libreoffice MS Powerpoint < 2007 requires ppthtml or libreoffice MS Excel < 2007 requires xlhtml or libreoffice Microsoft Word (docx) >= 2007 requires pandoc or doc2txt.pl or libreoffice MS Powerpoint (pptx) >= 2007 requires pptx2md or libreoffice MS Excel (xlsx) >= 2007 requires git-xlsx-textconv or lgit-xlsx-textconv.pl or libreoffice epub requires pandoc Debian requires ar, gzip and tar, shows more info if dpkg is installed html requires html2text or elinks or links or lynx or w3m pdf requires pdftotext or pdftohtml or pdfinfo perl requires pod2text rtf requires pandoc or libreoffice or unrtf dvi requires dvi2tty djvu requires djvutxt ps requires pstotext or ps2ascii (from the gs package) mp3, ogg requires mediainfo or id3v2 or mp3info or mp3info2 jpg, png, gif requires identify mp4 requires mediainfo iso images requires isoinfo MacOSX archive requires lsbom MacOS X bom requires lsbom MacOS X plist requires plutil cab requires cabextract (version 1.0 or above) gpg encrypted requires gpg perl storable requires perl (and the perl modules Storable and Data::Dumper) perl pod requires perldoc OASIS Opendocument text documents (used for Openoffice, Libreoffice) requires odt2txt or pandoc or libreoffice or unzip and o3tohtml or sxw2txt (distributed together with lesspipe) nc4 requires ncdump (NetCDF format) hdf5 requires h5dump (Hierarchical Data Format) crt, pem, csr, crl requires openssl

4.3 Conversion of files with alternate character encoding

If the file utility reports text containing ISO-8859, UTF-8 or UTF-16 encoded characters then the text will be transformed using iconv into the default encoding. This does assume iconv has the right default which can be wrong in some situations. It is checked if iconv would fail. Then the text is displayed unmodified.

4.4 File formats currently not supported

(code contributed but commented out)

jpeg and pbm graphics files to be displayed in ASCII art. The ASCII art library works with overprinting that does not work properly within less. Therefore the resulting quality of the converted picture is not satisfactory.

Display of video streams using mplayer with -aadriver (again ASCII art) is considered abuse of less and also commented out.

looking at contents of DOS formatted disks by accessing the proper device file

  1. Colorizing the output ========================

ATTENTION: Syntax highlighting and other methods of colorizing the output is only activated if the environment variable LESS is existing and contains the option -R or -r or less is called with one of these options.

This guarantees, that instead of literal escape sequences colors are displayed. The detection of the -r/-R presence at run time is rather dependent on the operating system and may not work in all cases. Putting the option in the LESS environment variable is guaranteed to work. By installing the perl module Proc::ProcessTable the OS dependence can be reduced as well.

The display of wrapped long lines and moving backward in a file using the options -r/-R can give weird output. For an explanation see http://www.greenwoodsoftware.com/less/faq.html#dashr

5.1 Syntax highlighting

Experimental support for syntax highlighting was added through a perl script 'code2color' which is derived from code2html [5].

As syntax highlighting is rather resource intense it can be switched off by appending a colon after the file name if the output was colorful. If the wrong language was chosen for syntax highlighting then another one can be forced by appending a colon and a suffix to the file name as follows (assuming this is a file with perl syntax):

less config_file:.pl

That works as well to force the call of code2color for a given language.

5.1.1 List of supported languages (code2color)

Text files for the following languages can be highlighted using code2color: ada, asm, awk, c, c++, groff, html, xml, java, javascript, lisp, m4, make, pascal, patch, perl, povray, python, ruby, shellscript, sql The corresponding suffixes recognized by code2color are: .ada .asm .inc .awk .c .h .cpp .cxx .groff .html .php .xml .java .js .lsp .m4 Makefile .pas .patch .diff .pm .pl .pod .pov .py .rb .sh .sql

5.1.2 Syntax highlighting alternatives

The enabling of syntax highlighting contains OS dependent code and is not guaranteed to work (it was tested on Linux, Solaris, IRIX, HPUX, AIX, MacOS X, Cygwin and FreeBSD). It is deactivated by default and not recommended by me. It can be activated using "configure" or "make MODE=ask".

The function code2color contains code to guarantee that color codes are only sent if less is called with one of the options -r or -R. To ensure that these checks are always performed, alternate syntax colorizers will be called from within code2color by setting the environment variable LESSCOLORIZER to the name of another program. Currently only pygmentize (and code2color as the default) is allowed. This can be changed in the first lines of code2color.

Much better syntax highlighting is obtained using the less emulation of vim: The editor vim comes with a file less.sh, in my case located in /usr/share/vim/vim73/macros. Assuming that file location a function lessc (bash, zsh, ksh users)

lessc () { /usr/share/vim/vim73/macros/less.sh "[email protected]"}

or an alias lessc (csh, tcsh users)

alias lessc /usr/share/vim/vim73/macros/less.sh

is defined and "lessc filename" is used to view the colorful file contents.

5.2 Colored Directory listing

The conditions to display a colored listing are described above. Depending on the operating system ls is then called with appropriate options to produce colored output.

5.3 Colored listing of tar file contents

As above less has to be called with -r or -R. If also the executable tarcolor (contained in the lesspipe tar file, see also [7]) is installed, then the listing of tar file contents is colored in a similar fashion as directory contents.

  1. Displaying files with special characters in the file name ============================================================

Shell meta characters in file names: space (frequently used in windows file names), the characters | & ; ( ) ` < > " ' # ~ = $ * ? [ ] or
must be escaped by a \ when used in the shell, e.g. less a\ b.tar.gz:a"b will display the file a"b contained in the gzipped tar archive a b.tar.gz.

Files within an archive that do have an isolated colon in the name cannot be displayed using the archive_name:contained_file_name notation. These files can be displayed using a notation with the alternate separator character = as follows: archive_name=contained_file_name= Please note the trailing = which is required.

  1. Examples ===========

As a typical usage case it is shown how one could display the man page "file.man" found in the Fedora10 RPM source archive file-4.26-3.fc10.src.rpm

The less command enhanced with the lesspipe.sh filter

less file-4.26-3.fc10.src.rpm

yields the following output ... -rw-r--r-- 1 root root 584803 Sep 15 16:29 file-4.26.tar.gz -rw-r--r-- 1 root root 17124 Oct 16 13:01 file.spec

Then the command

less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz

produces the output ... -rw-rw-r-- 10080/10080 16027 2008-08-30 12:01:41 file-4.26/doc/Makefile.in -rw-rw-r-- 10080/10080 16097 2008-03-07 16:00:07 file-4.26/doc/file.man -rw-rw-r-- 10080/10080 16943 2008-08-30 11:50:20 file-4.26/doc/magic.man ...

The desired man page can finally be viewed with

less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man

The subcomponents of the argument to less were easily obtained by cut and paste using information contained in the previous lines of output. Care has been taken to display the subcomponents already in the way required by lesspipe, so that in most cases double clicking will select it. If the nroff sources should have been displayed instead, appending another colon at the end of the argument would have done the job:

less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man:

If the man page was compressed (e.g. as file.man.gz) it would have been uncompressed anyway. To also disallow uncompressing the source file.man.gz a second colon would have to be appended to the argument.

Even extracting single files from an archive is possible, like with

less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/src/file.c > file.c

Files with binary contents can be extracted as well:

less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:: > file-4.26.tar.gz

Here the two colons at the end of the argument are required to suppress the unzipping of the resulting file and to extract the tar file instead of interpreting it.

Another interesting example is to get the dominating colors of a picture, that contains a diagram with a few colors only. The command

less diagram.png

does produce a lot of information, among others ... Histogram: 720: ( 0, 0,127) #00007F 3032: (127,127,127) grey50 18935: ( 0, 0,255) blue 21480: ( 0,255, 0) lime 21041: ( 0,255,255) cyan 8719: (255, 0, 0) red 14476: (255, 0,255) magenta 8822: (255,183, 0) #FFB700 13608: (255,255, 0) yellow 49167: (255,255,255) white ...

Other interesting examples are the inspection of Java's .jar files or Debian package contents without unpacking the files and even without having java installed or without working necessarily on a Debian system.

The contrib directory does contain a less wrapper (a bash/zsh function) that can be used to display URLs using less. This allows to pass the URL contents through lesspipe as if it would be a local file.

  1. Other documentation about lesspipe ===================================== http://ref.cern.ch/CERN/CNL/2002/001/unix-less/ http://www.linux-magazine.com/issue/21/lesspipe.pdf in bash cookbook (Ch. 8.15) by Carl Albing, Cameron Newham, J. P. Vossen http://carloscosta.org/2008/07/05/how-to-get-more-from-less/ Documentation in german: german.txt (distributed with lesspipe, not updated) http://www.linux-magazin.de/Heft-Abo/Ausgaben/2001/01/Bessere-Sicht http://www.linux-user.de/ausgabe/2002/04/060-ootb/lesspipe-1.html

  2. External links =================

(last checked: Jan 10 2013):

9.1 URLs to some utilities

mediainfo https://mediaarea.net/MediaInfo wvText https://github.com/tsgouros/wv antiword http://www.winfield.demon.nl/ html2text http://www.mbayer.de/html2text/ cabextract http://www.cabextract.org.uk/ 7za https://sourceforge.net/projects/p7zip/ lzip http://download.savannah.gnu.org/releases/lzip/ dvi2tty http://www.ctan.org/tex-archive/dviware/dvi2tty/ unrtf http://ftp.gnu.org/gnu/unrtf/ docx2txt.pl https://github.com/arthursucks/docx2txt pandoc http://pandoc.org libreoffice http://www.libreoffice.org odt2txt https://github.com/dstosberg/odt2txt git-xlsx-textconv https://github.com/tokuhirom/git-xlsx-textconv git-xlsx-textconv.pl https://github.com/yappo/p5-git-xlsx-textconv.pl pptx2md https://github.com/ssine/pptx2md id3v2 http://id3v2.sourceforge.net/

9.2 References

[1] http://www.greenwoodsoftware.com/less/ (less) [2] ftp://ftp.astron.com/pub/file/ (file) [3] https://github.com/wofr06/lesspipe [5] http://www.palfrader.org/code2html/ (code2html) [6] http://www.darwinsys.com/file/ (file) [7] https://github.com/msabramo/tarcolor (tarcolor)

  1. Contributors ================

The script lesspipe.sh is constantly enhanced thanks to suggestions from users. Among the additions to lesspipe.sh is the code to browse the ASCII contents of Word or Openoffice files, to show characteristics of mp3 files or to decode MacOS X formats.

Thanks to (in alphabetical order): Marc Abramowitz: allow for color output when ls or tar commands are used James Ahlborn: do not interpret .xml files as html Sören Andersen: PPD files colorization requested Andrew Barnert: shell syntax fix Peter D. Barnes, Jr.: plist files for Mac OS X Eduard Bloch: proposed support for ISO images Mathieu Bouillaguet: add support for xz compression Florian Cramer: MS Word, Openoffice support (o3read), ASCIIart, DjVu support Philippe Defert: unattended installation Antonio Diaz Diaz: proposed support for lzip Bastian Fuchs: Issues using bash vs. sh Matt Ghali: more conservative usage of html2text Carl Greco: enhanced output for .deb files Stephan Hegel: suggested better 7za support Michel Hermier: support for DESTDIR in the Makefile Tobias Hoffmann: fixed a bug introduced in v 1.72, add support for Debian 2.0 files with xz packed data Christian Höltje: suggested to use LESSCOLORIZER like in the Gentoo distro Jürgen Kahnert: display debian files without dpkg Sebastian Kayser: suggested a less wrapper to display URLs Ben Kibbey: works on FreeBSD Peter Kostka: mktemp on MacOS fixes Heinrich Kuettler: formatting, html via lynx Antony Lee: correctly call pygmentize (no - argument, use -g) Vincent Lefèvre: runtime checks for shell, many enhancements, provided sxw2txt, determine options for 'file' at runtime, display text in the proper encoding David Leverton: detect helper programs at runtime Jay Levitt: suggested to use enscript for highlighting support (see 4.2 above) Vladimir Linek: inspired me to add ps and dvi support Oliver Mangold: correctly display directories with a colon in the name Istvan Marko: speedup of the procedure Markus Meyer: improved mp3 handling Remi Mommsen: Mac OS X support Derek B. Noonburg: PDF files support Martin Otte: mktemp on MacOS fixes Jim Pryor: many enhancements, bug fixes, restructuring of code, tar detection Slaven Rezic: Cygwin support, bug fixes Daniel Risacher: gpg support Jens Schleusener: ksh syntax fixes Ken Teague?: support more versions of file command Matt Thompson: add support for NetCDF and HDF5 files using ncdump, h5dump Paul Townsend: improved zip support for Solaris, bug fixes in configure Petr Uzel: detect helper programs at runtime Chelban Vasile: trap command not working under /bin/sh Götz Waschk: suggested lzma support Michael Wiedmann: Debian packages support Dale Wijnand: Proposed the suppression of informal output Peter Wu: shortcut for unreadable files

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].