All Projects → dubiousjim → Awkenough

dubiousjim / Awkenough

Licence: mit
basic libraries for awk

Programming Languages

awk
318 projects

Awkenough

This package includes:

  1. a small set of Awk utility routines, in library.awk
  2. a stub program runawk.c that makes it easier to write shell scripts with awk #!-lines
  3. a few Awk scripts in utils/ that supply some common Unix utilities. If you've got a BSD base system, or you have Gnu coreutils installed, you'll already have these; but if you're using BusyBox, you may not.

Installation

  1. gmake AWK=/usr/bin/awk
  2. sudo gmake PREFIX=/usr/local DESTDIR=/ install

The displayed values for the make variables are the defaults, and can be omitted if you're happy with those values.

AWK can also be set to /bin/busybox; in that case, the runawk binary will invoke busybox with an "awk" argument. This permits building runawk so that it always invokes BusyBox awk, even if another awk is linked from /usr/bin/awk.

Runawk

This stub program makes it easy to work around a limitation in the way many Unix kernels (including Linux) handle shebang (#!) lines. They will parse the shebang line only into a maximum of two components, and will then invoke the first component with an ARGV[1] of the second component (when it's present) and an ARGV[2] of the path of the script file containing the shebang line. This works fine if you just want to begin an awk script like this:

file /path/foo.awk
------------------
#!/usr/bin/awk -f
...
------------------

If you try to exec /path/foo.awk [...args...], the kernel will read its shebang line and will then invoke:

"/usr/bin/awk" "-f" "/path/foo.awk" [..args...]

(I included the quotes to make the parsing explicit.)

In such cases, everything works great. However, what if you want to do this:

file /path/bar.awk
------------------
#!/usr/bin/awk -vRS=; -f
...
------------------

If you try to exec /path/bar.awk [...args...], the kernel will read its shebang line and will then invoke:

"/usr/bin/awk" "-vRS=; -f" "/path/bar.awk" [...args...]

Notice the parsing. The remainder of the shebang line is passed to awk as a single argument. Awk won't understand this and your script will fail.

This is a general limitation of how most kernels handle shebang lines and isn't specific to awk. There are workarounds. For example, bar.awk could be rewritten as:

file /path/bar.awk
------------------
#!/usr/bin/awk -f
BEGIN { RS=";" }
...
------------------

Or alternatively, you could give up on trying to exec bar.awk, and instead exec a shell script that bootstraps bar.awk:

file /path/bar.sh
------------------
#!/bin/sh
/usr/bin/awk -vRS=; -f /path/bar.awk "[email protected]"
------------------

If you wanted to load additional awk scripts containing utility routines, like the library.awk provided with this package, the second workaround is the only way you could do it:

------------------
#!/bin/sh
/usr/bin/awk -f/usr/share/awkenough/library.awk -f /path/bar.awk "[email protected]"
------------------

This is where the runawk stub program makes things easier. It is prepared for the kernel's handling of shebang lines and will automatically expand arguments it needs to before invoking awk. So you can just write your scripts like this:

file /path/qux.awk
------------------
#!/usr/bin/runawk -f /usr/share/awkenough/library.awk
...
------------------

(Note that here we leave off the terminal "-f".) When you try to exec qux.awk [...args...], the kernel will invoke:

"/usr/bin/runawk" "-f /usr/share/awkenough/library.awk" "/path/qux.awk" [...args...]

and runawk will in turn invoke:

"/usr/bin/awk" "-f" "/usr/share/awkenough/library.awk" "-f" "/path/qux.awk" [...args...]

so everything will work as you probably intended.

Runawk handles the same command-line options as most awks:

  • -F arg
  • -v VAR=value
  • -f file.awk

It also accepts the long-form "--file file.awk" as an alternate for the third option, like gawk does.

Runawk also accepts gawk's

  • -e '...awk script ...'

option (and its long-form --source '...awk script...'). The point of this option is that standard awk can be invoked like this:

awk -f library.awk -f file2 argv1 ...

or like this:

awk '...awk script...' argv1 ...

but can't be invoked like this:

awk -f library.awk '...awk script...' argv1 ...

That is, the -f option can't be mixed with providing an awk script directly on the command line, rather than in a separate file. Gawk and runawk's -e option provide a way to do this:

runawk -f library.awk -e '...awk script...' argv1 ...

Runawk also knows how to handle gawk's --exec option. See the source for details.

Runawk also accepts a further option not present in gawk. This is:

  • --stdin

and its effect is to guarantee that the awk program being invoked is handed stdin for processing, after processing any other ARGVs. In many cases awk will already do that automatically, but not all; so if you know in advance you want your script to always receive stdin, this option may be useful. For example, the following script:

file /path/greet.awk
------------------
#!/usr/bin/runawk -f /usr/share/awkenough/library.awk
BEGIN {
    print "Hello, " ARGV[1]
    ARGV[1] = ""
    print "Here is stdin:"
}

FILENAME == "-" { print }
------------------

works great if invoked as "/path/greet.awk MyName". But if it's invoked as "/path/greet.awk MyName /path/another/file", it won't behave as expected. True, it won't print the lines of "/path/another/file"; the FILENAME check prevents that. But neither will it print stdin, because the presence of a file in its ARGV list means that awk will deem it unnecessary to also run the script over its stdin.

You can work around this manually, by inspecting and rewriting all your ARGVs as necessary, or by reading stdin manually using getline instead of waiting for awk to pass it to the main body of your script. But the easiest fix is to just add a --stdin to the runawk shebang line:

file /path/greet.awk
------------------
#!/usr/bin/runawk -f /usr/share/awkenough/library.awk --stdin
...
------------------
  • Any other short-form options passed to runawk (like "-x") raise an error.

  • Any other long-form options are silently passed to the awk binary.

This runawk was adapted from a more full-featured one developed by Aleksey Cheusov [email protected]. His is available at http://runawk.sourceforge.net/. I didn't want all the features in his implementation, and I wanted to do some things differently. But you should also have a look at his version and see if it fits your needs better. By default, our runawk binary installs as /usr/bin/runawklite, so as to avoid clashing with his.

Library routines

See the file README.lib.

Utilities

Help is usually available by running <utility> --help.

column fmt join paste These all work like their namesakes in Gnu coreutils or the BSD tool set.

members Prints a list of all users who are members of .

uuniq Eliminates duplicates like uniq does, without requiring the input to be sorted. Moreover, the output will preserve the order of all unique lines.

Links

http://wiki.alpinelinux.org/wiki/Awk may also be useful to the audience of this package: it summarizes some gritty details about awk semantics, the behavior of some different awk implementations, and correlates functions available in awk + awkenough to those available in lua.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].