12482 Have /usr/bin/awk point to /usr/bin/nawk
Reviewed by: Peter Tribble <peter.tribble@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>

          --- old/usr/src/man/man1/awk.1.man.txt
          +++ new/usr/src/man/man1/awk.1.man.txt
   1    1  AWK(1)                           User Commands                          AWK(1)
   2    2  
   3    3  
   4    4  
   5    5  NAME
   6    6         awk - pattern scanning and processing language
   7    7  
   8    8  SYNOPSIS
   9      -       /usr/bin/awk [-f progfile] [-Fc] [' prog '] [parameters]
  10      -            [filename]...
        9 +       /usr/bin/awk [-F ERE] [-v assignment] 'program' | -f progfile...
       10 +            [argument]...
  11   11  
  12   12  
  13      -       /usr/xpg4/bin/awk [-FcERE] [-v assignment]... 'program' -f progfile...
       13 +       /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile...
  14   14              [argument]...
  15   15  
  16   16  
       17 +       /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile...
       18 +            [argument]...
       19 +
       20 +
  17   21  DESCRIPTION
  18      -       The /usr/xpg4/bin/awk utility is described on the nawk(1) manual page.
       22 +       NOTE: The nawk command is now the system default awk for illumos.
  19   23  
       24 +       The /usr/bin/awk and /usr/xpg4/bin/awk utilities execute programs
       25 +       written in the awk programming language, which is specialized for
       26 +       textual data manipulation. An awk program is a sequence of patterns and
       27 +       corresponding actions. The string specifying program must be enclosed
       28 +       in single quotes (') to protect it from interpretation by the shell.
       29 +       The sequence of pattern-action statements can be specified in the
       30 +       command line as program or in one or more files specified by the
       31 +       -f progfile option. When input is read that matches a pattern, the
       32 +       action associated with the pattern is performed.
  20   33  
  21      -       The /usr/bin/awk utility scans each input filename for lines that match
  22      -       any of a set of patterns specified in prog. The prog string must be
  23      -       enclosed in single quotes (') to protect it from the shell.  For each
  24      -       pattern in prog there can be an associated action performed when a line
  25      -       of a filename matches the pattern. The set of pattern-action statements
  26      -       can appear literally as prog or in a file specified with the -f
  27      -       progfile option. Input files are read in order; if there are no files,
  28      -       the standard input is read. The file name '-' means the standard input.
  29   34  
       35 +       Input is interpreted as a sequence of records. By default, a record is
       36 +       a line, but this can be changed by using the RS built-in variable. Each
       37 +       record of input is matched to each pattern in the program. For each
       38 +       pattern matched, the associated action is executed.
       39 +
       40 +
       41 +       The awk utility interprets each input record as a sequence of fields
       42 +       where, by default, a field is a string of non-blank characters. This
       43 +       default white-space field delimiter (blanks and/or tabs) can be changed
       44 +       by using the FS built-in variable or the -F ERE option. The awk utility
       45 +       denotes the first field in a record $1, the second $2, and so forth.
       46 +       The symbol $0 refers to the entire record; setting any other field
       47 +       causes the reevaluation of $0. Assigning to $0 resets the values of all
       48 +       fields and the NF built-in variable.
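
The field handling described above can be tried from a shell. This is a minimal sketch, assuming a POSIX-compatible awk on PATH; the sample input and the shell variable names are invented for illustration.

```shell
# -F sets the field separator; $1 is the first field and NF the field count.
split=$(printf 'alice:x:1001\n' | awk -F: '{ print $1, NF }')
echo "$split"

# Setting a field other than $0 causes $0 to be rebuilt.
rebuilt=$(printf 'a b c\n' | awk '{ $2 = "X"; print $0 }')
echo "$rebuilt"

# Assigning to $0 re-splits the record and resets NF.
resplit=$(printf 'a b c\n' | awk '{ $0 = "one two"; print NF, $2 }')
echo "$resplit"
```
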
       49 +
       50 +
  30   51  OPTIONS
  31   52         The following options are supported:
  32   53  
       54 +       -F ERE
       55 +                        Define the input field separator to be the extended
       56 +                        regular expression ERE, before any input is read (can
       57 +                        be a character).
       58 +
       59 +
  33   60         -f progfile
  34      -                       awk uses the set of patterns it reads from progfile.
       61 +                        Specifies the pathname of the file progfile containing
       62 +                        an awk program. If multiple instances of this option
       63 +                        are specified, the concatenation of the files
       64 +                        specified as progfile in the order specified is the
       65 +                        awk program. The awk program can alternatively be
       66 +                        specified in the command line as a single argument.
  35   67  
  36   68  
  37      -       -Fc
  38      -                       Uses the character c as the field separator (FS)
  39      -                       character.  See the discussion of FS below.
       69 +       -v assignment
       70 +                        The assignment argument must be in the same form as an
       71 +                        assignment operand. The assignment is of the form
       72 +                        var=value, where var is the name of one of the
       73 +                        variables described below. The specified assignment
       74 +                        occurs before executing the awk program, including the
       75 +                        actions associated with BEGIN patterns (if any).
       76 +                        Multiple occurrences of this option can be specified.
  40   77  
  41   78  
  42      -USAGE
  43      -   Input Lines
  44      -       Each input line is matched against the pattern portion of every
  45      -       pattern-action statement; the associated action is performed for each
  46      -       matched pattern. Any filename of the form var=value is treated as an
  47      -       assignment, not a filename, and is executed at the time it would have
  48      -       been opened if it were a filename. Variables assigned in this manner
  49      -       are not available inside a BEGIN rule, and are assigned after
  50      -       previously specified files have been read.
       79 +       -safe
       80 +                        When passed to awk, this flag will prevent the program
       81 +                        from opening new files or running child processes. The
       82 +                        ENVIRON array will also not be initialized.
  51   83  
  52   84  
  53      -       An input line is normally made up of fields separated by white spaces.
  54      -       (This default can be changed by using the FS built-in variable or the
  55      -       -Fc option.) The default is to ignore leading blanks and to separate
  56      -       fields by blanks and/or tab characters. However, if FS is assigned a
  57      -       value that does not include any of the white spaces, then leading
  58      -       blanks are not ignored. The fields are denoted $1, $2, ...; $0 refers
  59      -       to the entire line.
       85 +OPERANDS
       86 +       The following operands are supported:
  60   87  
  61      -   Pattern-action Statements
  62      -       A pattern-action statement has the form:
       88 +       program
       89 +                   If no -f option is specified, the first operand to awk is
       90 +                   the text of the awk program. The application supplies the
       91 +                   program operand as a single argument to awk. If the text
       92 +                   does not end in a newline character, awk interprets the
       93 +                   text as if it did.
  63   94  
       95 +
       96 +       argument
       97 +                   Either of the following two types of argument can be
       98 +                   intermixed:
       99 +
      100 +                   file
      101 +                                 A pathname of a file that contains the input
      102 +                                 to be read, which is matched against the set
      103 +                                 of patterns in the program. If no file
      104 +                                 operands are specified, or if a file operand
      105 +                                 is -, the standard input is used.
      106 +
      107 +
      108 +                   assignment
      109 +                                 An operand that begins with an underscore or
      110 +                                 alphabetic character from the portable
      111 +                                 character set, followed by a sequence of
      112 +                                 underscores, digits and alphabetics from the
      113 +                                 portable character set, followed by the =
      114 +                                 character specifies a variable assignment
      115 +                                 rather than a pathname. The characters before
      116 +                                 the = represent the name of an awk variable.
      117 +                                 If that name is an awk reserved word, the
      118 +                                 behavior is undefined. The characters
      119 +                                 following the equal sign are interpreted as if
      120 +                                 they appeared in the awk program preceded and
      121 +                                 followed by a double-quote (") character, as
      122 +                                 a STRING token, except that if the last
      123 +                                 character is an unescaped backslash, it is
      124 +                                 interpreted as a literal backslash rather
      125 +                                 than as the first character of the sequence
      126 +                                 \.. The variable is assigned the value of
      127 +                                 that STRING token. If the value is considered
      128 +                                 a numeric string, the variable is assigned its
      129 +                                 numeric value. Each such variable assignment
      130 +                                 is performed just before the processing of
      131 +                                 the following file, if any. Thus, an
      132 +                                 assignment before the first file argument is
      133 +                                 executed after the BEGIN actions (if any),
      134 +                                 while an assignment after the last file
      135 +                                 argument is executed before the END actions
      136 +                                 (if any).  If there are no file arguments,
      137 +                                 assignments are executed before processing
      138 +                                 the standard input.
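
The difference in timing between -v and an assignment operand can be sketched as follows. It assumes a POSIX-compatible awk on PATH; the variable names a, b, and timing are made up for the example.

```shell
# a is set with -v, so it is already visible inside BEGIN.
# b is an assignment operand with no file operand after it, so it is set
# after BEGIN has run and just before standard input is processed.
timing=$(printf 'line\n' |
    awk -v a=1 'BEGIN { print "BEGIN: a=" a " b=" b }
                      { print "main: a=" a " b=" b }' b=2)
echo "$timing"
```

In the BEGIN line b is still empty; in the main line both variables have their values.
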
      139 +
      140 +
      141 +
      142 +INPUT FILES
      143 +       Input files to the awk program from any of the following sources:
      144 +
      145 +           o      any file operands or their equivalents, achieved by
      146 +                  modifying the awk variables ARGV and ARGC
      147 +
      148 +           o      standard input in the absence of any file operands
      149 +
      150 +           o      arguments to the getline function
      151 +
      152 +
      153 +       must be text files. Whether the variable RS is set to a value other
      154 +       than a newline character or not, for these files, implementations
      155 +       support records terminated with the specified separator up to
      156 +       {LINE_MAX} bytes and can support longer records.
      157 +
      158 +
      159 +       If -f progfile is specified, the files named by each of the progfile
      160 +       option-arguments must be text files containing an awk program.
      161 +
      162 +
      163 +       The standard input is used only if no file operands are specified, or
      164 +       if a file operand is -.
      165 +
      166 +
      167 +EXTENDED DESCRIPTION
      168 +       An awk program is composed of pairs of the form:
      169 +
  64  170           pattern { action }
  65  171  
  66  172  
  67  173  
      174 +       Either the pattern or the action (including the enclosing brace
      175 +       characters) can be omitted. Pattern-action statements are separated by
      176 +       a semicolon or by a newline.
  68  177  
  69      -       Either pattern or action can be omitted. If there is no action, the
  70      -       matching line is printed. If there is no pattern, the action is
  71      -       performed on every input line. Pattern-action statements are separated
  72      -       by newlines or semicolons.
  73  178  
      179 +       A missing pattern matches any record of input, and a missing action is
      180 +       equivalent to an action that writes the matched record of input to
      181 +       standard output.
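
A short illustration of the two defaults, assuming a POSIX-compatible awk on PATH; the input lines and shell variable names are arbitrary.

```shell
# Pattern with no action: each matching record is written to standard output.
matched=$(printf 'x\ny\n' | awk '/y/')
echo "$matched"

# Action with no pattern: the action runs for every input record.
every=$(printf 'x\ny\n' | awk '{ print "got " $0 }')
echo "$every"
```
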
  74  182  
  75      -       Patterns are arbitrary Boolean combinations ( !, ||, &&, and
  76      -       parentheses) of relational expressions and regular expressions. A
  77      -       relational expression is one of the following:
  78  183  
  79      -         expression relop expression
  80      -         expression matchop regular_expression
      184 +       Execution of the awk program starts by first executing the actions
      185 +       associated with all BEGIN patterns in the order they occur in the
      186 +       program. Then each file operand (or standard input if no files were
      187 +       specified) is processed by reading data from the file until a record
      188 +       separator is seen (a newline character by default), splitting the
      189 +       current record into fields using the current value of FS, evaluating
      190 +       each pattern in the program in the order of occurrence, and executing
      191 +       the action associated with each pattern that matches the current
      192 +       record. The action for a matching pattern is executed before evaluating
      193 +       subsequent patterns. Last, the actions associated with all END patterns
      194 +       are executed in the order they occur in the program.
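
The execution order can be observed with a small program. This is a sketch only, assuming a POSIX-compatible awk on PATH; the rule labels (B, 1, 2, E) and shell variable names are invented.

```shell
# BEGIN runs first, then each record is tested against the rules in
# program order, and END runs after the last record.
order=$(printf 'a\nb\n' | awk '
    BEGIN { printf "B " }
          { printf "1:%s ", $0 }
    /b/   { printf "2:%s ", $0 }
    END   { print "E" }')
echo "$order"

# An action runs before later patterns are evaluated, so a change to $0
# made by the first rule is visible to the pattern of the second rule.
chained=$(printf 'a\n' | awk '{ $0 = "seen" } /seen/ { print "second rule matched" }')
echo "$chained"
```
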
  81  195  
  82  196  
      197 +   Expressions in awk
      198 +       Expressions describe computations used in patterns and actions. In the
      199 +       following table, valid expression operations are given in groups from
      200 +       highest precedence first to lowest precedence last, with equal-
      201 +       precedence operators grouped between horizontal lines. In expression
      202 +       evaluation, where the grammar is formally ambiguous, higher precedence
      203 +       operators are evaluated before lower precedence operators.  In this
      204 +       table expr, expr1, expr2, and expr3 represent any expression, while
      205 +       lvalue represents any entity that can be assigned to (that is, on the
      206 +       left side of an assignment operator).
  83  207  
  84      -       where a relop is any of the six relational operators in C, and a
  85      -       matchop is either ~ (contains) or !~ (does not contain). An expression
  86      -       is an arithmetic expression, a relational expression, the special
  87      -       expression
  88  208  
  89      -         var in array
  90  209  
  91  210  
      211 +           Syntax                  Name              Type of Result     Associativity
      212 +       -------------------------------------------------------------------------------
      213 +       ( expr )          Grouping                   type of expr        n/a
      214 +       -------------------------------------------------------------------------------
      215 +       $expr             Field reference            string              n/a
      216 +       -------------------------------------------------------------------------------
      217 +       ++ lvalue         Pre-increment              numeric             n/a
      218 +       -- lvalue         Pre-decrement              numeric             n/a
      219 +       lvalue ++         Post-increment             numeric             n/a
      220 +       lvalue --         Post-decrement             numeric             n/a
      221 +       -------------------------------------------------------------------------------
      222 +       expr ^ expr       Exponentiation             numeric             right
      223 +       -------------------------------------------------------------------------------
      224 +       ! expr            Logical not                numeric             n/a
      225 +       + expr            Unary plus                 numeric             n/a
      226 +       - expr            Unary minus                numeric             n/a
      227 +       -------------------------------------------------------------------------------
      228 +       expr * expr       Multiplication             numeric             left
      229 +       expr / expr       Division                   numeric             left
      230 +       expr % expr       Modulus                    numeric             left
      231 +       -------------------------------------------------------------------------------
      232 +       expr + expr       Addition                   numeric             left
      233 +       expr - expr       Subtraction                numeric             left
      234 +       -------------------------------------------------------------------------------
      235 +       expr expr         String concatenation       string              left
      236 +       -------------------------------------------------------------------------------
      237 +       expr < expr       Less than                  numeric             none
      238 +       expr <= expr      Less than or equal to      numeric             none
      239 +       expr != expr      Not equal to               numeric             none
      240 +       expr == expr      Equal to                   numeric             none
      241 +       expr > expr       Greater than               numeric             none
      242 +       expr >= expr      Greater than or equal to   numeric             none
      243 +       -------------------------------------------------------------------------------
      244 +       expr ~ expr       ERE match                  numeric             none
      245 +       expr !~ expr      ERE non-match              numeric             none
      246 +       -------------------------------------------------------------------------------
      247 +       expr in array     Array membership           numeric             left
      248 +       ( index ) in      Multi-dimension array      numeric             left
      249 +           array             membership
      250 +       -------------------------------------------------------------------------------
      251 +       expr && expr      Logical AND                numeric             left
      252 +       -------------------------------------------------------------------------------
      253 +       expr || expr      Logical OR                 numeric             left
      254 +       -------------------------------------------------------------------------------
      255 +       expr1 ? expr2     Conditional expression     type of selected    right
      256 +           : expr3                                     expr2 or expr3
      257 +       -------------------------------------------------------------------------------
      258 +       lvalue ^= expr    Exponentiation             numeric             right
      259 +                         assignment
      260 +       lvalue %= expr    Modulus assignment         numeric             right
      261 +       lvalue *= expr    Multiplication             numeric             right
      262 +                         assignment
      263 +       lvalue /= expr    Division assignment        numeric             right
      264 +       lvalue += expr    Addition assignment        numeric             right
      265 +       lvalue -= expr    Subtraction assignment     numeric             right
      266 +       lvalue = expr     Assignment                 type of expr        right
  92  267  
  93      -       or a Boolean combination of these.
  94  268  
  95  269  
  96      -       Regular expressions are as in egrep(1). In patterns they must be
  97      -       surrounded by slashes. Isolated regular expressions in a pattern apply
  98      -       to the entire line. Regular expressions can also occur in relational
  99      -       expressions. A pattern can consist of two patterns separated by a
 100      -       comma; in this case, the action is performed for all lines between the
 101      -       occurrence of the first pattern to the occurrence of the second
 102      -       pattern.
      270 +       Each expression has either a string value, a numeric value or both.
      271 +       Except as stated for specific contexts, the value of an expression is
      272 +       implicitly converted to the type needed for the context in which it is
      273 +       used.  A string value is converted to a numeric value by the equivalent
      274 +       of the following calls:
 103  275  
      276 +         setlocale(LC_NUMERIC, "");
      277 +         numeric_value = atof(string_value);
 104  278  
 105      -       The special patterns BEGIN and END can be used to capture control
 106      -       before the first input line has been read and after the last input line
 107      -       has been read respectively. These keywords do not combine with any
 108      -       other patterns.
 109  279  
 110      -   Built-in Variables
 111      -       Built-in variables include:
 112  280  
      281 +       A numeric value that is exactly equal to the value of an integer is
      282 +       converted to a string by the equivalent of a call to the sprintf
      283 +       function with the string %d as the fmt argument and the numeric value
      284 +       being converted as the first and only expr argument.  Any other numeric
      285 +       value is converted to a string by the equivalent of a call to the
      286 +       sprintf function with the value of the variable CONVFMT as the fmt
      287 +       argument and the numeric value being converted as the first and only
      288 +       expr argument.
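
The two number-to-string rules can be seen side by side. This is a minimal sketch, assuming a POSIX-compatible awk on PATH and a locale whose decimal point is a period; the variable names are illustrative.

```shell
# Concatenating with "" forces a number-to-string conversion.
# a is not an integer, so CONVFMT is used; b is an integer, so %d is used.
conv=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    a = 3.14159
    b = 12
    s = a ""
    t = b ""
    print s, t
}')
echo "$conv"
```
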
      289 +
      290 +
      291 +       A string value is considered to be a numeric string in the following
      292 +       case:
      293 +
      294 +           1.     Any leading and trailing blank characters are ignored.
      295 +
      296 +           2.     If the first unignored character is a + or -, it is ignored.
      297 +
      298 +           3.     If the remaining unignored characters would be lexically
      299 +                  recognized as a NUMBER token, the string is considered a
      300 +                  numeric string.
      301 +
      302 +
      303 +       If a - character is ignored in the above steps, the numeric value of
      304 +       the numeric string is the negation of the numeric value of the
      305 +       recognized NUMBER token. Otherwise the numeric value of the numeric
      306 +       string is the numeric value of the recognized NUMBER token. Whether or
      307 +       not a string is a numeric string is relevant only in contexts where
      308 +       that term is used in this section.
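
One context where the numeric-string distinction shows up is comparison. The sketch below assumes POSIX comparison semantics (input fields that look like numbers compare numerically, string constants compare as strings) and a POSIX-compatible awk on PATH; those comparison rules are not stated in the text above.

```shell
# Fields read from input are numeric strings here, so 10 > 9 is true (1).
fields=$(printf '10 9\n' | awk '{ r = ($1 > $2); print r }')
echo "$fields"

# String constants are never numeric strings, so "10" > "9" is a string
# comparison, and "1" sorts before "9": the result is false (0).
consts=$(awk 'BEGIN { r = ("10" > "9"); print r }')
echo "$consts"
```
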
      309 +
      310 +
      311 +       When an expression is used in a Boolean context, if it has a numeric
      312 +       value, a value of zero is treated as false and any other value is
      313 +       treated as true.  Otherwise, a string value of the null string is
      314 +       treated as false and any other value is treated as true. A Boolean
      315 +       context is one of the following:
      316 +
      317 +           o      the first subexpression of a conditional expression.
      318 +
      319 +           o      an expression operated on by logical NOT, logical AND, or
      320 +                  logical OR.
      321 +
      322 +           o      the second expression of a for statement.
      323 +
      324 +           o      the expression of an if statement.
      325 +
      326 +           o      the expression of the while clause in either a while or do
      327 +                  ... while statement.
      328 +
      329 +           o      an expression used as a pattern (as in Overall Program
      330 +                  Structure).
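
The truth rules can be checked with the conditional operator, whose first subexpression is a Boolean context. A minimal sketch, assuming a POSIX-compatible awk on PATH; the variable names are arbitrary.

```shell
# Numeric zero and the null string are false; other values are true.
truth=$(awk 'BEGIN {
    a = (0   ? "true" : "false")
    b = (""  ? "true" : "false")
    c = (2   ? "true" : "false")
    d = ("a" ? "true" : "false")
    print a, b, c, d
}')
echo "$truth"
```
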
      331 +
      332 +
      333 +       The awk language supplies arrays that are used for storing numbers or
      334 +       strings. Arrays need not be declared. They are initially empty, and
      335 +       their size changes dynamically. The subscripts, or element
      336 +       identifiers, are strings, providing a type of associative array
      337 +       capability. An array name followed by a subscript within square
      338 +       brackets can be used as an lvalue and as an expression, as described in
      339 +       the grammar.  Unsubscripted array names are used in only the following
      340 +       contexts:
      341 +
      342 +           o      a parameter in a function definition or function call.
      343 +
      344 +           o      the NAME token following any use of the keyword in.
      345 +
      346 +
      347 +       A valid array index consists of one or more comma-separated
      348 +       expressions, similar to the way in which multi-dimensional arrays are
      349 +       indexed in some programming languages. Because awk arrays are really
      350 +       one-dimensional, such a comma-separated list is converted to a single
      351 +       string by concatenating the string values of the separate expressions,
      352 +       each separated from the other by the value of the SUBSEP variable.
      353 +
      354 +
      355 +       Thus, the following two index operations are equivalent:
      356 +
      357 +         var[expr1, expr2, ... exprn]
      358 +         var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
      359 +
      360 +
      361 +
      362 +       A multi-dimensioned index used with the in operator must be put in
      363 +       parentheses. The in operator, which tests for the existence of a
      364 +       particular array element, does not create the element if it does not
      365 +       exist.  Any other reference to a non-existent array element
      366 +       automatically creates it.
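
The SUBSEP equivalence and the behavior of the in operator can be sketched as follows, assuming a POSIX-compatible awk on PATH. The array name grid and the other variable names are hypothetical.

```shell
arrays=$(awk 'BEGIN {
    grid[1, 2] = "x"
    key = 1 SUBSEP 2                # same index, written out by hand
    a = ((1, 2) in grid)            # multi-dimensional index needs parentheses
    b = (key in grid)
    c = ((3, 4) in grid)            # testing with in does not create the element
    before = 0; for (k in grid) before++
    unused = grid[3, 4]             # any other reference creates the element
    after = 0; for (k in grid) after++
    print a, b, c, before, after
}')
echo "$arrays"
```
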
      367 +
      368 +
      369 +   Variables and Special Variables
      370 +       Variables can be used in an awk program by referencing them. With the
      371 +       exception of function parameters, they are not explicitly declared.
      372 +       Uninitialized scalar variables and array elements have both a numeric
      373 +       value of zero and a string value of the empty string.
      374 +
      375 +
      376 +       Field variables are designated by a $ followed by a number or numerical
      377 +       expression. The effect of the field number expression evaluating to
      378 +       anything other than a non-negative integer is unspecified.
      379 +       Uninitialized variables or string values need not be converted to
      380 +       numeric values in this context. New field variables are created by
      381 +       assigning a value to them.  References to non-existent fields (that is,
      382 +       fields after $NF) produce the null string. However, assigning to a non-
      383 +       existent field (for example, $(NF+2) = 5) increases the value of NF,
      384 +       creates any intervening fields with the null string as their values and
      385 +       causes the value of $0 to be recomputed, with the fields being separated
      386 +       by the value of OFS. Each field variable has a string value when
      387 +       created. If the string, with any occurrence of the decimal-point
      388 +       character from the current locale changed to a period character, is
      389 +       considered a numeric string (see Expressions in awk above), the field
      390 +       variable also has the numeric value of the numeric string.
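
The effect of referencing and assigning fields beyond $NF can be illustrated like this. It is a sketch that assumes a POSIX-compatible awk on PATH; OFS is set to "-" only to make the empty intervening field visible.

```shell
# Referencing a non-existent field yields the null string and leaves NF alone.
missing=$(printf 'a b\n' | awk '{ print "[" $5 "]", NF }')
echo "$missing"

# Assigning to $(NF+2) raises NF to 4, creates an empty $3, and rebuilds $0
# with the fields joined by OFS.
grown=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } { $(NF+2) = 5; print NF; print $0 }')
echo "$grown"
```
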
      391 +
      392 +
      393 +   /usr/bin/awk, /usr/xpg4/bin/awk
      394 +       awk sets the following special variables that are supported by both
      395 +       /usr/bin/awk and /usr/xpg4/bin/awk:
      396 +
      397 +       ARGC
      398 +                   The number of elements in the ARGV array.
      399 +
      400 +
      401 +       ARGV
      402 +                   An array of command line arguments, excluding options and
      403 +                   the program argument, numbered from zero to ARGC-1.
      404 +
      405 +                   The arguments in ARGV can be modified or added to; ARGC can
      406 +                   be altered.  As each input file ends, awk treats the next
      407 +                   non-null element of ARGV, up to the current value of
      408 +                   ARGC-1, inclusive, as the name of the next input file.
      409 +                   Setting an element of ARGV to null means that it is not
      410 +                   treated as an input file. The name - indicates the standard
      411 +                   input. If an argument matches the format of an assignment
      412 +                   operand, this argument is treated as an assignment rather
      413 +                   than a file argument.
      414 +
      415 +
      416 +       CONVFMT
      417 +                   The printf format for converting numbers to strings (except
      418 +                   for output statements, where OFMT is used). The default is
      419 +                   %.6g.
      420 +
      421 +
      422 +       ENVIRON
      423 +                   The variable ENVIRON is an array representing the value of
      424 +                   the environment. The indices of the array are strings
      425 +                   consisting of the names of the environment variables, and
      426 +                   the value of each array element is a string consisting of
      427 +                   the value of that variable. If the value of an environment
      428 +                   variable is considered a numeric string, the array element
      429 +                   also has its numeric value.
      430 +
      431 +                   In all cases where awk behavior is affected by environment
      432 +                   variables (including the environment of any commands that
      433 +                   awk executes via the system function or via pipeline
      434 +                   redirections with the print statement, the printf
      435 +                   statement, or the getline function), the environment used
      436 +                   is the environment at the time awk began executing.
      437 +
      438 +
 113  439         FILENAME
 114      -                    name of the current input file
      440 +                   A pathname of the current input file. Inside a BEGIN action
      441 +                   the value is undefined. Inside an END action the value is
      442 +                   the name of the last input file processed.
 115  443  
 116  444  
      445 +       FNR
      446 +                   The ordinal number of the current record in the current
      447 +                   file. Inside a BEGIN action the value is zero. Inside an
      448 +                   END action the value is the number of the last record
      449 +                   processed in the last file processed.
      450 +
      451 +
 117  452         FS
 118      -                    input field separator regular expression (default blank
 119      -                    and tab)
      453 +                   Input field separator regular expression; a space character
      454 +                   by default.
 120  455  
 121  456  
 122  457         NF
 123      -                    number of fields in the current record
      458 +                   The number of fields in the current record. Inside a BEGIN
      459 +                   action, the use of NF is undefined unless a getline
      460 +                   function without a var argument is executed previously.
      461 +                   Inside an END action, NF retains the value it had for the
      462 +                   last record read, unless a subsequent, redirected, getline
      463 +                   function without a var argument is performed prior to
      464 +                   entering the END action.
 124  465  
 125  466  
 126  467         NR
 127      -                    ordinal number of the current record
      468 +                   The ordinal number of the current record from the start of
      469 +                   input. Inside a BEGIN action the value is zero. Inside an
      470 +                   END action the value is the number of the last record
      471 +                   processed.
 128  472  
 129  473  
 130  474         OFMT
 131      -                    output format for numbers (default %.6g)
      475 +                   The printf format for converting numbers to strings in
      476 +                   output statements; "%.6g" by default. The result of the
      477 +                   conversion is unspecified if the value of OFMT is not a
      478 +                   floating-point format specification.
 132  479  
 133  480  
 134  481         OFS
 135      -                    output field separator (default blank)
      482 +                   The print statement output field separator; a space
      483 +                   character by default.
 136  484  
 137  485  
 138  486         ORS
 139      -                    output record separator (default new-line)
      487 +                   The print output record separator; a newline character by
      488 +                   default.
 140  489  
 141  490  
      491 +       RLENGTH
      492 +                   The length of the string matched by the match function.
      493 +
      494 +
 142  495         RS
 143      -                    input record separator (default new-line)
      496 +                   The first character of the string value of RS is the input
      497 +                   record separator; a newline character by default. If RS
      498 +                   contains more than one character, the results are
      499 +                   unspecified. If RS is null, then records are separated by
      500 +                   sequences of one or more blank lines. Leading or trailing
      501 +                   blank lines do not produce empty records at the beginning
      502 +                   or end of input, and a newline is always a field separator,
      503 +                   in addition to the value of FS.
 144  504  
 145  505  
      506 +       RSTART
      507 +                   The starting position of the string matched by the match
      508 +                   function, numbering from 1. This is always equivalent to
      509 +                   the return value of the match function.
 146  510  
      511 +
      512 +       SUBSEP
      513 +                   The subscript separator string for multi-dimensional
      514 +                   arrays. The default value is \034.
      515 +
      516 +
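As a minimal sketch of the null-RS ("paragraph mode") behavior described for RS above, the following shell fragment feeds two paragraphs separated by a blank line to awk. The sample input and the variable name para_out are illustrative assumptions, not part of the manual page.

```shell
# Two paragraphs separated by one blank line; RS = "" selects paragraph mode.
para_out=$(printf 'a b\nc\n\nd\ne f\n' |
    awk 'BEGIN { RS = "" }
         { out = out sep NF; sep = "," }   # collect the field count of each record
         END { print NR ":" out }')
echo "$para_out"    # record count, then NF for each record
```

Each paragraph is read as one record, and the embedded newlines split fields just as the blanks do.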
      517 +   /usr/bin/awk
      518 +       The following variable is supported for /usr/bin/awk only:
      519 +
      520 +       RT
      521 +                   The record terminator for the most recent record read. For
      522 +                   most records this will be the same value as RS. At the end
      523 +                   of a file with no trailing separator value, though, this
      524 +                   will be set to the empty string ("").
      525 +
      526 +
      527 +   Regular Expressions
      528 +       The awk utility makes use of the extended regular expression notation
      529 +       (see regex(5)) except that it allows the use of C-language conventions
      530 +       to escape special characters within the EREs, namely \\, \a, \b, \f,
      531 +       \n, \r, \t, \v, and those specified in the following table.  These
      532 +       escape sequences are recognized both inside and outside bracket
      533 +       expressions.  Note that records need not be separated by newline
      534 +       characters and string constants can contain newline characters, so even
      535 +       the \n sequence is valid in awk EREs.  Using a slash character within
      536 +       the regular expression requires escaping as shown in the table below:
      537 +
      538 +
      539 +
      540 +
      541 +       Escape Sequence   Description                Meaning
      542 +       ----------------------------------------------------------------------
      543 +       \"                Backslash quotation-mark   Quotation-mark character
      544 +       ----------------------------------------------------------------------
      545 +       \/                Backslash slash            Slash character
      546 +       ----------------------------------------------------------------------
      547 +       \ddd              A backslash character      The character encoded by
      548 +                         followed by the longest    the one-, two- or
      549 +                         sequence of one, two, or   three-digit octal
      550 +                         three octal-digit          integer. Multi-byte
      551 +                         characters (01234567).     characters require
      552 +                         If all of the digits are   multiple, concatenated
      553 +                         0, (that is,               escape sequences,
      554 +                         representation of the      including the leading \
      555 +                         NULL character), the       for each byte.
      556 +                         behavior is undefined.
      557 +       ----------------------------------------------------------------------
      558 +       \c                A backslash character      Undefined
      559 +                         followed by any
      560 +                         character not described
      561 +                         in this table or special
      562 +                         characters (\\, \a, \b,
      563 +                         \f, \n, \r, \t, \v).
      564 +
      565 +
      566 +
      567 +       A regular expression can be matched against a specific field or string
      568 +       by using one of the two regular expression matching operators, ~ and
      569 +       !~.  These operators interpret their right-hand operand as a regular
      570 +       expression and their left-hand operand as a string. If the regular
      571 +       expression matches the string, the ~ expression evaluates to the value
      572 +       1, and the !~ expression evaluates to the value 0. If the regular
      573 +       expression does not match the string, the ~ expression evaluates to the
      574 +       value 0, and the !~ expression evaluates to the value 1. If the right-
      575 +       hand operand is any expression other than the lexical token ERE, the
      576 +       string value of the expression is interpreted as an extended regular
      577 +       expression, including the escape conventions described above. Notice
      578 +       that these same escape conventions are also applied in determining
      579 +       the value of a string literal (the lexical token STRING), and are
      580 +       applied a second time when a string literal is used in this context.
      581 +
      582 +
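The double application of escape conventions to string literals can be illustrated with a short sketch. The test strings and the variable name match_out are assumptions chosen for illustration only.

```shell
# /a\.b/ is an ERE token: the dot is escaped once and matches only a literal ".".
# "a\\.b" is a string literal: lexical processing turns \\ into \, and the
# resulting string a\.b is then interpreted as an ERE.
match_out=$(awk 'BEGIN {
    print ("a.b" ~ /a\.b/), ("axb" ~ /a\.b/), ("a.b" ~ "a\\.b"), ("axb" ~ "a\\.b")
}')
echo "$match_out"
```

Both forms accept "a.b" and reject "axb"; the string form simply needs one extra level of backslashes.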
      583 +       When an ERE token appears as an expression in any context other than as
      584 +       the right-hand of the ~ or !~ operator or as one of the built-in
      585 +       function arguments described below, the value of the resulting
      586 +       expression is the equivalent of:
      587 +
      588 +         $0 ~ /ere/
      589 +
      590 +
      591 +
      592 +       The ere argument to the gsub, match, and sub functions, and the fs
      593 +       argument to the split function (see String Functions), are interpreted
      594 +       as extended regular expressions. These can be either ERE tokens or
      595 +       arbitrary expressions, and are interpreted in the same manner as the
      596 +       right-hand side of the ~ or !~ operator.
      597 +
      598 +
      599 +       An extended regular expression can be used to separate fields by using
      600 +       the -F ERE option or by assigning a string containing the expression to
      601 +       the built-in variable FS. The default value of the FS variable is a
      602 +       single space character. The following describes FS behavior:
      603 +
      604 +           1.     If FS is a single character:
      605 +
      606 +               o      If FS is the space character, skip leading and trailing
      607 +                      blank characters; fields are delimited by sets of one or
      608 +                      more blank characters.
      609 +
      610 +               o      Otherwise, if FS is any other character c, fields are
      611 +                      delimited by each single occurrence of c.
      612 +
      613 +           2.     Otherwise, the string value of FS is considered to be an
      614 +                  extended regular expression. Each occurrence of a sequence
      615 +                  matching the extended regular expression delimits fields.
      616 +
      617 +
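The three FS cases listed above can be sketched as follows. The input strings and the shell variable names are hypothetical and exist only for this illustration.

```shell
# Default FS (a single space): leading blanks are skipped, runs of blanks delimit.
default_fs=$(printf '  a   b  c\n' | awk '{ print NF ":" $1 }')
# Any other single character: each occurrence delimits, so empty fields appear.
colon_fs=$(printf 'a::b\n' | awk -F: '{ print NF }')
# A longer FS value is treated as an extended regular expression.
ere_fs=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print $1 "-" $2 "-" $3 }')
echo "$default_fs $colon_fs $ere_fs"
```

Note that the second case counts three fields because the empty string between the two colons is a field of its own.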
      618 +       Except in the gsub, match, split, and sub built-in functions, regular
      619 +       expression matching is based on input records. That is, record
      620 +       separator characters (the first character of the value of the variable
      621 +       RS, a newline character by default) cannot be embedded in the
      622 +       expression, and no expression matches the record separator character.
      623 +       If the record separator is not a newline character, newline characters
      624 +       embedded in the expression can be matched. In those four built-in
      625 +       functions, regular expression matching is based on text strings. So,
      626 +       any character (including the newline character and the record
      627 +       separator) can be embedded in the pattern and an appropriate pattern
      628 +       matches any character. However, in all awk regular expression matching,
      629 +       the use of one or more NULL characters in the pattern, input record or
      630 +       text string produces undefined results.
      631 +
      632 +
      633 +   Patterns
      634 +       A pattern is any valid expression, a range specified by two expressions
      635 +       separated by a comma, or one of the two special patterns BEGIN or END.
      636 +
      637 +
      638 +   Special Patterns
      639 +       The awk utility recognizes two special patterns, BEGIN and END. Each
      640 +       BEGIN pattern is matched once and its associated action executed before
      641 +       the first record of input is read (except possibly by use of the
      642 +       getline function in a prior BEGIN action) and before command line
      643 +       assignment is done. Each END pattern is matched once and its associated
      644 +       action executed after the last record of input has been read. These two
      645 +       patterns have associated actions.
      646 +
      647 +
      648 +       BEGIN and END do not combine with other patterns.  Multiple BEGIN and
      649 +       END patterns are allowed. The actions associated with the BEGIN
      650 +       patterns are executed in the order specified in the program, as are the
      651 +       END actions. An END pattern can precede a BEGIN pattern in a program.
      652 +
      653 +
      654 +       If an awk program consists of only actions with the pattern BEGIN, and
      655 +       the BEGIN action contains no getline function, awk exits without
      656 +       reading its input when the last statement in the last BEGIN action is
      657 +       executed. If an awk program consists of only actions with the pattern
      658 +       END or only actions with the patterns BEGIN and END, the input is read
      659 +       before the statements in the END actions are executed.
      660 +
      661 +
      662 +   Expression Patterns
      663 +       An expression pattern is evaluated as if it were an expression in a
      664 +       Boolean context. If the result is true, the pattern is considered to
      665 +       match, and the associated action (if any) is executed. If the result is
      666 +       false, the action is not executed.
      667 +
      668 +
      669 +   Pattern Ranges
      670 +       A pattern range consists of two expressions separated by a comma. In
      671 +       this case, the action is performed for all records between a match of
      672 +       the first expression and the following match of the second expression,
      673 +       inclusive. At this point, the pattern range can be repeated starting at
      674 +       input records subsequent to the end of the matched range.
      675 +
      676 +
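A small sketch of a pattern range that repeats, assuming hypothetical START and END marker lines in the input; the variable name range_out is likewise an assumption.

```shell
# Collect the record numbers selected by the range /START/,/END/.
range_out=$(printf 'a\nSTART\nb\nEND\nc\nSTART\nd\nEND\n' |
    awk '/START/,/END/ { out = out sep NR; sep = "," }
         END { print out }')
echo "$range_out"
```

Records 1 and 5 fall outside both ranges; the range is inclusive of its boundary records and starts again at the second START.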
      677 +   Actions
 147  678         An action is a sequence of statements. A statement can be one of the
 148  679         following:
 149  680  
 150  681           if ( expression ) statement [ else statement ]
 151  682           while ( expression ) statement
 152  683           do statement while ( expression )
 153  684           for ( expression ; expression ; expression ) statement
 154  685           for ( var in array ) statement
      686 +         delete array[subscript]  # delete an array element
      687 +         delete array      # delete all elements within an array
 155  688           break
 156  689           continue
 157  690           { [ statement ] ... }
 158      -         expression      # commonly variable = expression
      691 +         expression        # commonly variable = expression
 159  692           print [ expression-list ] [ >expression ]
 160  693           printf format [ ,expression-list ] [ >expression ]
 161      -         next            # skip remaining patterns on this input line
 162      -         exit [expr]     # skip the rest of the input; exit status is expr
      694 +         next              # skip remaining patterns on this input line
      695 +         nextfile          # skip remaining records in this input file
      696 +         exit [expr]       # skip the rest of the input; exit status is expr
      697 +         return [expr]
 163  698  
 164  699  
 165  700  
 166      -       Statements are terminated by semicolons, newlines, or right braces. An
 167      -       empty expression-list stands for the whole input line. Expressions take
 168      -       on string or numeric values as appropriate, and are built using the
 169      -       operators +, -, *, /, %, ^ and concatenation (indicated by a blank).
 170      -       The operators ++, --, +=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==, !=, and
 171      -       ?: are also available in expressions. Variables can be scalars, array
 172      -       elements (denoted x[i]), or fields. Variables are initialized to the
 173      -       null string or zero. Array subscripts can be any string, not
 174      -       necessarily numeric; this allows for a form of associative memory.
 175      -       String constants are quoted (""), with the usual C escapes recognized
 176      -       within.
      701 +       Any single statement can be replaced by a statement list enclosed in
      702 +       braces.  The statements are terminated by newline characters or
      703 +       semicolons, and are executed sequentially in the order that they
      704 +       appear.
 177  705  
 178  706  
 179      -       The print statement prints its arguments on the standard output, or on
 180      -       a file if >expression is present, or on a pipe if '|cmd' is present.
 181      -       The output resulted from the print statement is terminated by the
 182      -       output record separator with each argument separated by the current
 183      -       output field separator. The printf statement formats its expression
 184      -       list according to the format (see printf(3C)).
      707 +       The next statement causes all further processing of the current input
      708 +       record to be abandoned. The behavior is undefined if a next statement
      709 +       appears or is invoked in a BEGIN or END action.
 185  710  
 186      -   Built-in Functions
 187      -       The arithmetic functions are as follows:
 188  711  
      712 +       The nextfile statement is similar to next, but also skips all other
      713 +       records in the current file, and moves on to processing the next input
      714 +       file if available (or exits the program if there are none). (Note that
      715 +       this keyword is not supported by /usr/xpg4/bin/awk.)
      716 +
      717 +
      718 +       The exit statement invokes all END actions in the order in which they
      719 +       occur in the program source and then terminates the program without
      720 +       reading further input. An exit statement inside an END action
      721 +       terminates the program without further execution of END actions.  If an
      722 +       expression is specified in an exit statement, its numeric value is the
      723 +       exit status of awk, unless subsequent errors are encountered or a
      724 +       subsequent exit statement with an expression is executed.
      725 +
      726 +
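The interaction between exit and END actions might be sketched like this. The exit value 7 and the variable names are arbitrary choices for illustration.

```shell
# exit in a main action: remaining input is skipped, END still runs,
# and the expression becomes the exit status of awk.
exit_out=$(printf '1\n2\n3\n' |
    awk 'NR == 2 { exit 7 }
         { printf "%s,", $0 }
         END { print "done" }')
exit_status=$?    # status of the last command in the pipeline, that is, awk
echo "$exit_out (status $exit_status)"
```

Only the first record is printed before exit takes effect on the second one.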
      727 +   Output Statements
      728 +       Both print and printf statements write to standard output by default.
      729 +       The output is written to the location specified by output_redirection
      730 +       if one is supplied, as follows:
      731 +
      732 +         > expression    >> expression    | expression
      733 +
      734 +
      735 +
      736 +       In all cases, the expression is evaluated to produce a string that is
      737 +       used as a full pathname to write into (for > or >>) or as a command to
      738 +       be executed (for |). Using the first two forms, if the file of that
      739 +       name is not currently open, it is opened, creating it if necessary
      740 +       and, using the first form, truncating the file. The output then is
      741 +       appended to the file.  As long as the file remains open, subsequent
      742 +       calls in which expression evaluates to the same string value simply
      743 +       append output to the file. The file remains open until the close
      744 +       function is called with an expression that evaluates to the same
      745 +       string value.
      746 +
      747 +
      748 +       The third form writes output onto a stream piped to the input of a
      749 +       command. The stream is created if no stream is currently open with the
      750 +       value of expression as its command name.  The stream created is
      751 +       equivalent to one created by a call to the popen(3C) function with the
      752 +       value of expression as the command argument and a value of w as the
      753 +       mode argument.  As long as the stream remains open, subsequent calls in
      754 +       which expression evaluates to the same string value write output to
      755 +       the existing stream. The stream remains open until the close function
      756 +       is called with an expression that evaluates to the same string value.
      757 +       At that time, the stream is closed as if by a call to the pclose
      758 +       function.
      759 +
      760 +
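A hedged sketch of the pipe form of output redirection: both print statements use the same command string, so they share one stream until close is called. The choice of sort as the command and the use of paste to join the output lines are assumptions made for this example.

```shell
pipe_out=$(awk 'BEGIN {
    cmd = "sort"
    print "b" | cmd       # opens the stream to sort
    print "a" | cmd       # same string value, so the existing stream is reused
    close(cmd)            # sort sees end of input and writes its result
    print "after close"
}' | paste -s -d ',' -)
echo "$pipe_out"
```

Because close waits for the command to finish, the sorted lines appear before the final print in this sketch.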
      761 +       These output statements take a comma-separated list of expressions
      762 +       referred to in the grammar by the non-terminal symbols expr_list,
      763 +       print_expr_list or print_expr_list_opt. This list is referred to here
      764 +       as the expression list, and each member is referred to as an expression
      765 +       argument.
      766 +
      767 +
      768 +       The print statement writes the value of each expression argument onto
      769 +       the indicated output stream separated by the current output field
      770 +       separator (see variable OFS above), and terminated by the output record
      771 +       separator (see variable ORS above). All expression arguments are taken
      772 +       as strings, being converted if necessary, with the exception that the
      773 +       printf format in OFMT is used instead of the value in CONVFMT. An empty
      774 +       expression list stands for the whole input record ($0).
      775 +
      776 +
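The roles of OFS, ORS, OFMT and CONVFMT in the print statement can be illustrated with a brief sketch. The separator values, the formats "%.2f" and "%.3f", and the variable names are illustrative assumptions.

```shell
# OFS separates the expression arguments; ORS terminates the record.
sep_out=$(awk 'BEGIN { OFS = "-"; ORS = "!\n"; print "a", "b" }')
# A number printed directly is converted with OFMT; a number concatenated
# with a string is converted with CONVFMT before print ever sees it.
fmt_out=$(awk 'BEGIN { OFMT = "%.2f"; CONVFMT = "%.3f"; x = 3.14159
                       print x
                       print x ""
                     }' | paste -s -d ',' -)
echo "$sep_out $fmt_out"
```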
      777 +       The printf statement produces output based on a notation similar to the
      778 +       File Format Notation used to describe file formats in this document.
      779 +       Output is produced as specified with the first expression argument as
      780 +       the string format and subsequent expression arguments as the strings
      781 +       arg1 to argn, inclusive, with the following exceptions:
      782 +
      783 +           1.     The format is an actual character string rather than a
      784 +                  graphical representation. Therefore, it cannot contain empty
      785 +                  character positions. The space character in the format
      786 +                  string, in any context other than a flag of a conversion
      787 +                  specification, is treated as an ordinary character that is
      788 +                  copied to the output.
      789 +
      790 +           2.     If the character set contains a Delta character and that
      791 +                  character appears in the format string, it is treated as an
      792 +                  ordinary character that is copied to the output.
      793 +
      794 +           3.     Escape sequences beginning with a backslash character are
      795 +                  treated as sequences of ordinary characters that are copied
      796 +                  to the output. Note that the same sequences are interpreted
      797 +                  lexically by awk when they appear in literal strings, but
      798 +                  they are not treated specially by the printf statement.
      799 +
      800 +           4.     A field width or precision can be specified as the *
      801 +                  character instead of a digit string. In this case the next
      802 +                  argument from the expression list is fetched and its numeric
      803 +                  value taken as the field width or precision.
      804 +
      805 +           5.     The implementation does not precede or follow output from
      806 +                  the d or u conversion specifications with blank characters
      807 +                  not specified by the format string.
      808 +
      809 +           6.     The implementation does not precede output from the o
      810 +                  conversion specification with leading zeros not specified by
      811 +                  the format string.
      812 +
      813 +           7.     For the c conversion specification: if the argument has a
      814 +                  numeric value, the character whose encoding is that value is
      815 +                  output.  If the value is zero or is not the encoding of any
      816 +                  character in the character set, the behavior is undefined.
      817 +                  If the argument does not have a numeric value, the first
      818 +                  character of the string value is output; if the string does
      819 +                  not contain any characters the behavior is undefined.
      820 +
      821 +           8.     For each conversion specification that consumes an argument,
      822 +                  the next expression argument is evaluated. With the
      823 +                  exception of the c conversion, the value is converted to the
      824 +                  appropriate type for the conversion specification.
      825 +
      826 +           9.     If there are insufficient expression arguments to satisfy
      827 +                  all the conversion specifications in the format string, the
      828 +                  behavior is undefined.
      829 +
      830 +           10.    If any character sequence in the format string begins with a
      831 +                  % character, but does not form a valid conversion
      832 +                  specification, the behavior is unspecified.
      833 +
      834 +
      835 +       Both print and printf can output at least {LINE_MAX} bytes.
      836 +
      837 +
      838 +   Functions
      839 +       The awk language has a variety of built-in functions: arithmetic,
      840 +       string, input/output and general.
      841 +
      842 +
      843 +   Arithmetic Functions
      844 +       The arithmetic functions, except for int, are based on the ISO C
      845 +       standard. The behavior is undefined in cases where the ISO C standard
      846 +       specifies that an error be returned or that the behavior is undefined.
      847 +       Although the grammar permits built-in functions to appear with no
      848 +       arguments or parentheses, unless the argument or parentheses are
      849 +       indicated as optional in the following list (by displaying them within
      850 +       the [ ] brackets), such use is undefined.
      851 +
      852 +       atan2(y,x)
      853 +                        Return arctangent of y/x.
      854 +
      855 +
 189  856         cos(x)
 190      -                  Return cosine of x, where x is in radians. (In
 191      -                  /usr/xpg4/bin/awk only. See nawk(1).)
      857 +                        Return cosine of x, where x is in radians.
 192  858  
 193  859  
 194  860         sin(x)
 195      -                  Return sine of x, where x is in radians. (In
 196      -                  /usr/xpg4/bin/awk only. See nawk(1).)
      861 +                        Return sine of x, where x is in radians.
 197  862  
 198  863  
 199  864         exp(x)
 200      -                  Return the exponential function of x.
      865 +                        Return the exponential function of x.
 201  866  
 202  867  
 203  868         log(x)
 204      -                  Return the natural logarithm of x.
      869 +                        Return the natural logarithm of x.
 205  870  
 206  871  
 207  872         sqrt(x)
 208      -                  Return the square root of x.
      873 +                        Return the square root of x.
 209  874  
 210  875  
 211  876         int(x)
 212      -                  Truncate its argument to an integer. It is truncated toward
 213      -                  0 when x > 0.
      877 +                        Truncate its argument to an integer. It is truncated
      878 +                        toward 0 when x > 0.
 214  879  
 215  880  
      881 +       rand()
      882 +                        Return a random number n, such that 0 <= n < 1.
 216  883  
 217      -       The string functions are as follows:
 218  884  
 219      -       index(s, t)
      885 +       srand([expr])
      886 +                        Set the seed value for rand to expr or use the time of
      887 +                        day if expr is omitted. The previous seed value is
      888 +                        returned.
 220  889  
 221      -           Return the position in string s where string t first occurs, or 0
 222      -           if it does not occur at all.
 223  890  
      891 +   String Functions
      892 +       The string functions in the following list are supported. Although
      893 +       the grammar permits built-in functions to appear with no arguments or
      894 +       parentheses, unless the argument or parentheses are indicated as
      895 +       optional in the following list (by displaying them within the [ ]
      896 +       brackets), such use is undefined.
 224  897  
 225      -       int(s)
      898 +       gsub(ere,repl[,in])
 226  899  
 227      -           truncates s to an integer value. If s is not specified, $0 is used.
      900 +           Behave like sub (see below), except that it replaces all
      901 +           occurrences of the regular expression (like the ed utility global
      902 +           substitute) in $0 or in the in argument, when specified.
 228  903  
 229  904  
 230      -       length(s)
      905 +       index(s,t)
 231  906  
 232      -           Return the length of its argument taken as a string, or of the
 233      -           whole line if there is no argument.
      907 +           Return the position, in characters, numbering from 1, in string s
      908 +           where string t first occurs, or zero if it does not occur at all.
 234  909  
 235  910  
 236      -       split(s, a, fs)
      911 +       length[([v])]
 237  912  
 238      -           Split the string s into array elements a[1], a[2], ... a[n], and
 239      -           returns n. The separation is done with the regular expression fs or
 240      -           with the field separator FS if fs is not given.
      913 +           Given no argument, this function returns the length of the whole
      914 +           record, $0. If given an array as an argument (and using
      915 +           /usr/bin/awk), then this returns the number of elements it
      916 +           contains. Otherwise, this function interprets the argument as a
      917 +           string (performing any needed conversions) and returns its length
      918 +           in characters.
 241  919  
 242  920  
 243      -       sprintf(fmt, expr, expr,...)
      921 +       match(s,ere)
 244  922  
 245      -           Format the expressions according to the printf(3C) format given by
 246      -           fmt and returns the resulting string.
      923 +           Return the position, in characters, numbering from 1, in string s
      924 +           where the extended regular expression ere occurs, or zero if it
      925 +           does not occur at all. RSTART is set to the starting position
      926 +           (which is the same as the returned value), zero if no match is
      927 +           found; RLENGTH is set to the length of the matched string, -1 if no
      928 +           match is found.
 247  929  
 248  930  
 249      -       substr(s, m, n)
      931 +       split(s,a[,fs])
 250  932  
 251      -           returns the n-character substring of s that begins at position m.
      933 +           Split the string s into array elements a[1], a[2], ..., a[n], and
      934 +           return n. The separation is done with the extended regular
      935 +           expression fs or with the field separator FS if fs is not given.
      936 +           Each array element has a string value when created.  If the string
      937 +           assigned to any array element, with any occurrence of the decimal-
      938 +           point character from the current locale changed to a period
      939 +           character, would be considered a numeric string; the array element
      940 +           also has the numeric value of the numeric string. The effect of a
      941 +           null string as the value of fs is unspecified.
 252  942  
 253  943  
      944 +       sprintf(fmt,expr,expr,...)
 254  945  
 255      -       The input/output function is as follows:
      946 +           Format the expressions according to the printf format given by fmt
      947 +           and return the resulting string.
 256  948  
      949 +
      950 +       sub(ere,repl[,in])
      951 +
      952 +           Substitute the string repl in place of the first instance of the
      953 +           extended regular expression ere in string in and return the number
      954 +           of substitutions. An ampersand ( & ) appearing in the string repl
      955 +           is replaced by the string from in that matches the regular
      956 +           expression. An ampersand preceded with a backslash ( \ ) is
      957 +           interpreted as the literal ampersand character. An occurrence of
      958 +           two consecutive backslashes is interpreted as just a single literal
      959 +           backslash character.  Any other occurrence of a backslash (for
      960 +           example, preceding any other character) is treated as a literal
      961 +           backslash character. If repl is a string literal, the handling of
      962 +           the ampersand character occurs after any lexical processing,
      963 +           including any lexical backslash escape sequence processing. If in
      964 +           is specified and it is not an lvalue the behavior is undefined. If
      965 +           in is omitted, awk uses the current record ($0) in its place.
      966 +
      967 +
      968 +       substr(s,m[,n])
      969 +
      970 +           Return the at most n-character substring of s that begins at
      971 +           position m, numbering from 1. If n is missing, the length of the
      972 +           substring is limited by the length of the string s.
      973 +
      974 +
      975 +       tolower(s)
      976 +
      977 +           Return a string based on the string s. Each character in s that is
      978 +           an upper-case letter specified to have a tolower mapping by the
      979 +           LC_CTYPE category of the current locale is replaced in the returned
      980 +           string by the lower-case letter specified by the mapping. Other
      981 +           characters in s are unchanged in the returned string.
      982 +
      983 +
      984 +       toupper(s)
      985 +
      986 +           Return a string based on the string s. Each character in s that is
      987 +           a lower-case letter specified to have a toupper mapping by the
      988 +           LC_CTYPE category of the current locale is replaced in the returned
      989 +           string by the upper-case letter specified by the mapping. Other
      990 +           characters in s are unchanged in the returned string.
      991 +
      992 +
      993 +
      994 +       All of the preceding functions that take ERE as a parameter expect a
      995 +       pattern or a string valued expression that is a regular expression as
      996 +       defined below.
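The string functions described above can be combined as in the following minimal sketch. The input line and the names kv, num, and s are made up for illustration; only split, match, substr, and sub come from the text above.

```shell
# Illustrative only: split a record, locate a number with match(), and
# wrap a number with sub() using & (the matched text).
out=$(printf 'foo=42;bar=7\n' | awk '{
    n = split($0, kv, ";")              # kv[1] = "foo=42", kv[2] = "bar=7"
    if (match(kv[1], /[0-9]+/))         # sets RSTART and RLENGTH on success
        num = substr(kv[1], RSTART, RLENGTH)
    s = kv[2]
    sub(/[0-9]+/, "<&>", s)             # & is replaced by the matched text
    print n, num, s
}')
echo "$out"
```

Because sub is given the variable s as its third argument, $0 is left untouched; omitting that argument would make sub operate on the current record instead.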
      997 +
      998 +
      999 +   Input/Output and General Functions
     1000 +       The input/output and general functions are:
     1001 +
     1002 +       close(expression)
     1003 +                                  Close the file or pipe opened by a print or
     1004 +                                  printf statement or a call to getline with
     1005 +                                  the same string-valued expression. If the
     1006 +                                  close was successful, the function returns
     1007 +                                  0; otherwise, it returns non-zero.
     1008 +
     1009 +
     1010 +       fflush(expression)
     1011 +                                  Flush any buffered output for the file or
     1012 +                                  pipe opened by a print or printf statement
     1013 +                                  or a call to getline with the same string-
     1014 +                                  valued expression. If the flush was
     1015 +                                  successful, the function returns 0;
     1016 +                                  otherwise, it returns EOF. If no argument
     1017 +                                  or the empty string ("") is given, then all
     1018 +                                  open files will be flushed. (Note that
     1019 +                                  fflush is supported in /usr/bin/awk only.)
     1020 +
     1021 +
     1022 +       expression|getline[var]
     1023 +                                  Read a record of input from a stream piped
     1024 +                                  from the output of a command. The stream is
     1025 +                                  created if no stream is currently open with
     1026 +                                  the value of expression as its command name.
     1027 +                                  The stream created is equivalent to one
     1028 +                                  created by a call to the popen function with
     1029 +                                  the value of expression as the command
     1030 +                                  argument and a value of r as the mode
     1031 +                                  argument. As long as the stream remains
     1032 +                                  open, subsequent calls in which expression
     1033 +                                  evaluates to the same string value read
     1034 +                                  subsequent records from the stream. The stream
     1035 +                                  remains open until the close function is
     1036 +                                  called with an expression that evaluates to
     1037 +                                  the same string value. At that time, the
     1038 +                                  stream is closed as if by a call to the
     1039 +                                  pclose function. If var is missing, $0 and
     1040 +                                  NF are set. Otherwise, var is set.
     1041 +
     1042 +                                  The getline operator can form ambiguous
     1043 +                                  constructs when there are operators that are
     1044 +                                  not in parentheses (including concatenate)
     1045 +                                  to the left of the | (to the beginning of
     1046 +                                  the expression containing getline). In the
     1047 +                                  context of the $ operator, | behaves as if
     1048 +                                  it had a lower precedence than $. The result
     1049 +                                  of evaluating other operators is
     1050 +                                  unspecified, and portable applications
     1051 +                                  must parenthesize all such uses
     1052 +                                  properly.
     1053 +
     1054 +
 257 1055         getline
 258      -                  Set $0 to the next input record from the current input file.
 259      -                  getline returns 1 for successful input, 0 for end of file,
 260      -                  and -1 for an error.
     1056 +                                  Set $0 to the next input record from the
     1057 +                                  current input file. This form of getline
     1058 +                                  sets the NF, NR, and FNR variables.
 261 1059  
 262 1060  
 263      -   Large File Behavior
     1061 +       getline var
     1062 +                                  Set variable var to the next input record
     1063 +                                  from the current input file.  This form of
     1064 +                                  getline sets the FNR and NR variables.
     1065 +
     1066 +
     1067 +       getline [var] < expression
     1068 +                                  Read the next record of input from a named
     1069 +                                  file. The expression is evaluated to produce
     1070 +                                  a string that is used as a full pathname. If
     1071 +                                  the file of that name is not currently open,
     1072 +                                  it is opened. As long as the stream remains
     1073 +                                  open, subsequent calls in which expression
     1074 +                                  evaluates to the same string value read
     1075 +                                  subsequent records from the file. The file
     1076 +                                  remains open until the close function is
     1077 +                                  called with an expression that evaluates to
     1078 +                                  the same string value. If var is missing, $0
     1079 +                                  and NF are set. Otherwise, var is set.
     1080 +
     1081 +                                  The getline operator can form ambiguous
     1082 +                                  constructs when there are binary operators
     1083 +                                  that are not in parentheses (including
     1084 +                                  concatenate) to the right of the < (up to
     1085 +                                  the end of the expression containing the
     1086 +                                  getline). The result of evaluating such a
     1087 +                                  construct is unspecified, and portable
     1088 +                                  applications must parenthesize all such
     1089 +                                  uses properly.
     1090 +
     1091 +
     1092 +       system(expression)
     1093 +                                  Execute the command given by expression in a
     1094 +                                  manner equivalent to the system(3C) function
     1095 +                                  and return the exit status of the command.
     1096 +
     1097 +
     1098 +
     1099 +       All forms of getline return 1 for successful input, 0 for end of file,
     1100 +       and -1 for an error.
     1101 +
     1102 +
     1103 +       Where strings are used as the name of a file or pipeline, the strings
     1104 +       must be textually identical. The terminology ``same string value''
     1105 +       implies that ``equivalent strings'', even those that differ only by
     1106 +       space characters, represent different files.
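The pipe form of getline and the matching close call might be used as in this sketch. The command strings and the variable names cmd, line, n, and msg are hypothetical; the parentheses around the concatenation follow the advice given above for operators to the left of the | operator.

```shell
# Illustrative only: read every record from a command, close the stream,
# then read from a second command built by concatenation.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)    # > 0 stops on end of file and on error
        n++
    close(cmd)                          # same string value as used with getline
    msg = "three"
    ("echo " msg) | getline line        # parenthesized concatenation left of |
    print n, line
}')
echo "$out"
```

Testing the return value with > 0 rather than treating it as a plain truth value avoids an endless loop when getline returns -1.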
     1107 +
     1108 +
     1109 +   User-defined Functions
     1110 +       The awk language also provides user-defined functions. Such functions
     1111 +       can be defined as:
     1112 +
     1113 +         function name(args,...) { statements }
     1114 +
     1115 +
     1116 +
     1117 +       A function can be referred to anywhere in an awk program; in
     1118 +       particular, its use can precede its definition. The scope of a function
     1119 +       is global.
     1120 +
     1121 +
     1122 +       Function arguments can be either scalars or arrays; the behavior is
     1123 +       undefined if an array name is passed as an argument that the function
     1124 +       uses as a scalar, or if a scalar expression is passed as an argument
     1125 +       that the function uses as an array. Function arguments are passed by
     1126 +       value if scalar and by reference if array name. Argument names are
     1127 +       local to the function; all other variable names are global. The same
     1128 +       name must not be used as both an argument name and as the name of a function
     1129 +       or a special awk variable. The same name must not be used both as a
     1130 +       variable name with global scope and as the name of a function. The same
     1131 +       name must not be used within the same scope both as a scalar variable
     1132 +       and as an array.
     1133 +
     1134 +
     1135 +       The number of parameters in the function definition need not match the
     1136 +       number of parameters in the function call. Excess formal parameters can
     1137 +       be used as local variables. If fewer arguments are supplied in a
     1138 +       function call than are in the function definition, the extra parameters
     1139 +       that are used in the function body as scalars are initialized with a
     1140 +       string value of the null string and a numeric value of zero, and the
     1141 +       extra parameters that are used in the function body as arrays are
     1142 +       initialized as empty arrays. If more arguments are supplied in a
     1143 +       function call than are in the function definition, the behavior is
     1144 +       undefined.
     1145 +
     1146 +
     1147 +       When invoking a function, no white space can be placed between the
     1148 +       function name and the opening parenthesis. Function calls can be nested
     1149 +       and recursive calls can be made upon functions. Upon return from any
     1150 +       nested or recursive function call, the values of all of the calling
     1151 +       function's parameters are unchanged, except for array parameters passed
     1152 +       by reference. The return statement can be used to return a value. If a
     1153 +       return statement appears outside of a function definition, the behavior
     1154 +       is undefined.
     1155 +
     1156 +
     1157 +       In the function definition, newline characters are optional before the
     1158 +       opening brace and after the closing brace. Function definitions can
     1159 +       appear anywhere in the program where a pattern-action pair is allowed.
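The argument-passing rules above can be seen in a short sketch. The function name fill and the variables sq, count, and i are invented for this example; the extra spaces before i in the parameter list are only a common convention for marking a local variable.

```shell
# Illustrative only: arrays are passed by reference, scalars by value, and
# an excess formal parameter (i) serves as a local variable.
out=$(awk '
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
    n = 0                       # changes only the local copy of n
}
BEGIN {
    i = 99; count = 3
    fill(sq, count)
    print sq[3], count, i       # the global i and count are unchanged
}')
echo "$out"
```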
     1160 +
     1161 +
     1162 +USAGE
     1163 +       The index, length, match, and substr functions should not be confused
     1164 +       with similar functions in the ISO C standard; the awk versions deal
     1165 +       with characters, while the ISO C standard deals with bytes.
     1166 +
     1167 +
     1168 +       Because the concatenation operation is represented by adjacent
     1169 +       expressions rather than an explicit operator, it is often necessary to
     1170 +       use parentheses to enforce the proper evaluation precedence.
     1171 +
     1172 +
 264 1173         See largefile(5) for the description of the behavior of awk when
 265      -       encountering files greater than or equal to 2 Gbyte ( 2^31 bytes).
     1174 +       encountering files greater than or equal to 2 Gbyte (2^31 bytes).
 266 1175  
     1176 +
 267 1177  EXAMPLES
 268      -       Example 1 Printing Lines Longer Than 72 Characters
     1178 +       The awk program specified in the command line is most easily specified
     1179 +       within single-quotes (for example, 'program') for applications using
     1180 +       sh, because awk programs commonly contain characters that are special
     1181 +       to the shell, including double-quotes. In the cases where an awk program
     1182 +       contains single-quote characters, it is usually easiest to specify most
     1183 +       of the program as strings within single-quotes concatenated by the
     1184 +       shell with quoted single-quote characters. For example:
 269 1185  
     1186 +         awk '/'\''/ { print "quote:", $0 }'
 270 1187  
 271      -       The following example is an awk script that can be executed by an awk
 272      -       -f examplescript style command. It prints lines longer than seventy two
 273      -       characters:
 274 1188  
 275 1189  
 276      -         length > 72
     1190 +       prints all lines from the standard input containing a single-quote
     1191 +       character, prefixed with quote:.
 277 1192  
 278 1193  
     1194 +       The following are examples of simple awk programs:
 279 1195  
 280      -       Example 2 Printing Fields in Opposite Order
     1196 +       Example 1 Write to the standard output all input lines for which field
     1197 +       3 is greater than 5:
 281 1198  
     1199 +         $3 > 5
 282 1200  
 283      -       The following example is an awk script that can be executed by an awk
 284      -       -f examplescript style command. It prints the first two fields in
 285      -       opposite order:
 286 1201  
 287 1202  
 288      -         { print $2, $1 }
     1203 +       Example 2 Write every tenth line:
 289 1204  
     1205 +         (NR % 10) == 0
 290 1206  
 291 1207  
 292      -       Example 3 Printing Fields in Opposite Order with the Input Fields
 293      -       Separated
 294 1208  
     1209 +       Example 3 Write any line with a substring matching the regular
     1210 +       expression:
 295 1211  
 296      -       The following example is an awk script that can be executed by an awk
 297      -       -f examplescript style command. It prints the first two input fields in
 298      -       opposite order, separated by a comma, blanks or tabs:
     1212 +         /(G|D)(2[0-9][[:alpha:]]*)/
 299 1213  
 300 1214  
 301      -         BEGIN { FS = ",[ \t]*|[ \t]+" }
 302      -               { print $2, $1 }
 303 1215  
     1216 +       Example 4 Print any line with a substring containing a G or D, followed
     1217 +       by a sequence of digits and characters:
 304 1218  
 305 1219  
 306      -       Example 4 Adding Up the First Column, Printing the Sum and Average
     1220 +       This example uses character classes digit and alpha to match language-
     1221 +       independent digit and alphabetic characters, respectively.
 307 1222  
 308 1223  
 309      -       The following example is an awk script that can be executed by an awk
 310      -       -f examplescript style command.  It adds up the first column, and
 311      -       prints the sum and average:
     1224 +         /(G|D)([[:digit:][:alpha:]]*)/
 312 1225  
 313 1226  
 314      -         { s += $1 }
 315      -         END  { print "sum is", s, " average is", s/NR }
 316 1227  
     1228 +       Example 5 Write any line in which the second field matches the regular
     1229 +       expression and the fourth field does not:
 317 1230  
     1231 +         $2 ~ /xyz/ && $4 !~ /xyz/
 318 1232  
 319      -       Example 5 Printing Fields in Reverse Order
 320 1233  
 321 1234  
 322      -       The following example is an awk script that can be executed by an awk
 323      -       -f examplescript style command. It prints fields in reverse order:
     1235 +       Example 6 Write any line in which the second field contains a
     1236 +       backslash:
 324 1237  
     1238 +         $2 ~ /\\/
 325 1239  
 326      -         { for (i = NF; i > 0; --i) print $i }
 327 1240  
 328 1241  
     1242 +       Example 7 Write any line in which the second field contains a backslash
     1243 +       (alternate method):
 329 1244  
 330      -       Example 6 Printing All lines Between start/stop Pairs
 331 1245  
     1246 +       Notice that backslash escapes are interpreted twice, once in lexical
     1247 +       processing of the string and once in processing the regular expression.
 332 1248  
 333      -       The following example is an awk script that can be executed by an awk
 334      -       -f examplescript style command. It prints all lines between start/stop
 335      -       pairs.
 336 1249  
     1250 +         $2 ~ "\\\\"
 337 1251  
 338      -         /start/, /stop/
 339 1252  
 340 1253  
     1254 +       Example 8 Write the second to the last and the last field in each line,
     1255 +       separating the fields by a colon:
 341 1256  
 342      -       Example 7 Printing All Lines Whose First Field is Different from the
 343      -       Previous One
     1257 +         {OFS=":";print $(NF-1), $NF}
 344 1258  
 345 1259  
 346      -       The following example is an awk script that can be executed by an awk
 347      -       -f examplescript style command. It prints all lines whose first field
 348      -       is different from the previous one.
 349 1260  
     1261 +       Example 9 Write the line number and number of fields in each line:
 350 1262  
     1263 +
     1264 +       The three strings representing the line number, the colon and the
     1265 +       number of fields are concatenated and that string is written to
     1266 +       standard output.
     1267 +
     1268 +
     1269 +         {print NR ":" NF}
     1270 +
     1271 +
     1272 +
     1273 +       Example 10 Write lines longer than 72 characters:
     1274 +
     1275 +         length($0) > 72
     1276 +
     1277 +
     1278 +
     1279 +       Example 11 Write first two fields in opposite order separated by the
     1280 +       OFS:
     1281 +
     1282 +         { print $2, $1 }
     1283 +
     1284 +
     1285 +
     1286 +       Example 12 Same, with input fields separated by comma or space and tab
     1287 +       characters, or both:
     1288 +
     1289 +         BEGIN { FS = ",[ \t]*|[ \t]+" }
     1290 +               { print $2, $1 }
     1291 +
     1292 +
     1293 +
     1294 +       Example 13 Add up first column, print sum and average:
     1295 +
     1296 +         {s += $1 }
     1297 +         END {print "sum is ", s, " average is", s/NR}
     1298 +
     1299 +
     1300 +
     1301 +       Example 14 Write fields in reverse order, one per line (many lines out
     1302 +       for each line in):
     1303 +
     1304 +         { for (i = NF; i > 0; --i) print $i }
     1305 +
     1306 +
     1307 +
     1308 +       Example 15 Write all lines between occurrences of the strings "start"
     1309 +       and "stop":
     1310 +
     1311 +         /start/, /stop/
     1312 +
     1313 +
     1314 +
     1315 +       Example 16 Write all lines whose first field is different from the
     1316 +       previous one:
     1317 +
 351 1318           $1 != prev { print; prev = $1 }
 352 1319  
 353 1320  
 354 1321  
 355      -       Example 8 Printing a File and Filling in Page numbers
     1322 +       Example 17 Simulate the echo command:
 356 1323  
     1324 +         BEGIN  {
     1325 +                for (i = 1; i < ARGC; ++i)
     1326 +                      printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
     1327 +                }
 357 1328  
 358      -       The following example is an awk script that can be executed by an awk
 359      -       -f examplescript style command. It prints a file and fills in page
 360      -       numbers starting at 5:
 361 1329  
 362 1330  
 363      -         /Page/    { $2 = n++; }
 364      -                      { print }
     1331 +       Example 18 Write the path prefixes contained in the PATH environment
     1332 +       variable, one per line:
 365 1333  
     1334 +         BEGIN  {
     1335 +                n = split (ENVIRON["PATH"], path, ":")
     1336 +                for (i = 1; i <= n; ++i)
     1337 +                       print path[i]
     1338 +                }
 366 1339  
 367 1340  
 368      -       Example 9 Printing a File and Numbering Its Pages
 369 1341  
     1342 +       Example 19 Print the file "input", filling in page numbers starting at
     1343 +       5:
 370 1344  
 371      -       Assuming this program is in a file named prog, the following example
 372      -       prints the file input numbering its pages starting at 5:
 373 1345  
     1346 +       If there is a file named input containing page headers of the form
 374 1347  
 375      -         example% awk -f prog n=5 input
 376 1348  
     1349 +         Page #
 377 1350  
 378 1351  
     1352 +
     1353 +       and a file named program that contains
     1354 +
     1355 +
     1356 +         /Page/{ $2 = n++; }
     1357 +         { print }
     1358 +
     1359 +
     1360 +
     1361 +       then the command line
     1362 +
     1363 +
     1364 +         awk -f program n=5 input
     1365 +
     1366 +
     1367 +
     1368 +
     1369 +       prints the file input, filling in page numbers starting at 5.
     1370 +
     1371 +
 379 1372  ENVIRONMENT VARIABLES
 380 1373         See environ(5) for descriptions of the following environment variables
 381      -       that affect the execution of awk: LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
 382      -       LC_MESSAGES, NLSPATH, and PATH.
     1374 +       that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.
 383 1375  
 384 1376         LC_NUMERIC
 385 1377                       Determine the radix character used when interpreting
 386 1378                       numeric input, performing conversions between numeric and
 387 1379                       string values and formatting numeric output.  Regardless
 388 1380                       of locale, the period character (the decimal-point
 389 1381                       character of the POSIX locale) is the decimal-point
 390 1382                       character recognized in processing awk programs
 391 1383                       (including assignments in command-line arguments).
 392 1384  
 393 1385  
 394      -ATTRIBUTES
 395      -       See attributes(5) for descriptions of the following attributes:
     1386 +EXIT STATUS
     1387 +       The following exit values are returned:
 396 1388  
 397      -   /usr/bin/awk
     1389 +       0
     1390 +             All input files were processed successfully.
 398 1391  
 399 1392  
     1393 +       >0
     1394 +             An error occurred.
 400 1395  
 401      -       +---------------+-----------------+
 402      -       |ATTRIBUTE TYPE | ATTRIBUTE VALUE |
 403      -       +---------------+-----------------+
 404      -       |CSI            | Not Enabled     |
 405      -       +---------------+-----------------+
 406 1396  
 407      -   /usr/xpg4/bin/awk
 408 1397  
     1398 +       The exit status can be altered within the program by using an exit
     1399 +       expression.
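As a minimal sketch of altering the exit status with an exit expression, the following sets a non-zero status when any input line matches a pattern. The input text, the variable name bad, and the value 2 are arbitrary choices made for illustration.

```shell
# Illustrative only: exit with status 2 if any line contains ERROR.
status=0
printf 'ok\nERROR disk\n' |
    awk '/ERROR/ { bad = 1 } END { exit (bad ? 2 : 0) }' || status=$?
echo "$status"
```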
 409 1400  
 410 1401  
 411      -       +--------------------+-----------------+
 412      -       |  ATTRIBUTE TYPE    | ATTRIBUTE VALUE |
 413      -       +--------------------+-----------------+
 414      -       |CSI                 | Enabled         |
 415      -       +--------------------+-----------------+
 416      -       |Interface Stability | Standard        |
 417      -       +--------------------+-----------------+
 418      -
 419 1402  SEE ALSO
 420      -       egrep(1), grep(1), nawk(1), sed(1), printf(3C), attributes(5),
 421      -       environ(5), largefile(5), standards(5)
     1403 +       ed(1), egrep(1), grep(1), lex(1), oawk(1), sed(1), popen(3C),
     1404 +       printf(3C), system(3C), attributes(5), environ(5), largefile(5),
     1405 +       regex(5), XPG4(5)
 422 1406  
     1407 +
     1408 +       Aho, A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming
     1409 +       Language, Addison-Wesley, 1988.
     1410 +
     1411 +
     1412 +DIAGNOSTICS
     1413 +       If any file operand is specified and the named file cannot be accessed,
     1414 +       awk writes a diagnostic message to standard error and terminates without
     1415 +       any further action.
     1416 +
     1417 +
     1418 +       If the program specified by either the program operand or a progfile
     1419 +       operand is not a valid awk program (as specified in EXTENDED
     1420 +       DESCRIPTION), the behavior is undefined.
     1421 +
     1422 +
 423 1423  NOTES
 424 1424         Input white space is not preserved on output if fields are involved.
 425 1425  
 426 1426  
 427 1427         There are no explicit conversions between numbers and strings. To force
 428      -       an expression to be treated as a number, add 0 to it. To force an
 429      -       expression to be treated as a string, concatenate the null string ("")
 430      -       to it.
     1428 +       an expression to be treated as a number, add 0 to it; to force it to be
     1429 +       treated as a string, concatenate the null string ("") to it.
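Both notes can be illustrated with a small sketch. The input line and the variable names num and str are made up for the example. Fields that look like numbers compare numerically unless a null string is concatenated to force string comparison, and assigning to a field rebuilds $0 with single output field separators.

```shell
# Illustrative only: numeric versus forced-string comparison, and loss of
# input white space once a field is assigned.
out=$(printf '10    9\n' | awk '{
    num = ($1 < $2)                 # numeric comparison: 10 < 9 is false
    str = (($1 "") < ($2 ""))       # string comparison: "10" sorts before "9"
    print num, str
    $1 = $1                         # any field assignment rebuilds $0 using OFS
    print
}')
echo "$out"
```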
 431 1430  
 432 1431  
 433 1432  
 434      -                                 June 22, 2005                          AWK(1)
     1433 +                                April 20, 2020                          AWK(1)
    