12482 Have /usr/bin/awk point to /usr/bin/nawk
Reviewed by: Peter Tribble <peter.tribble@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>

*** 4,201 **** NAME awk - pattern scanning and processing language SYNOPSIS ! /usr/bin/awk [-f progfile] [-Fc] [' prog '] [parameters] ! [filename]... ! /usr/xpg4/bin/awk [-FcERE] [-v assignment]... 'program' -f progfile... [argument]... DESCRIPTION ! The /usr/xpg4/bin/awk utility is described on the nawk(1) manual page. - The /usr/bin/awk utility scans each input filename for lines that match - any of a set of patterns specified in prog. The prog string must be - enclosed in single quotes ( a') to protect it from the shell. For each - pattern in prog there can be an associated action performed when a line - of a filename matches the pattern. The set of pattern-action statements - can appear literally as prog or in a file specified with the -f - progfile option. Input files are read in order; if there are no files, - the standard input is read. The file name '-' means the standard input. OPTIONS The following options are supported: -f progfile ! awk uses the set of patterns it reads from progfile. ! -Fc ! Uses the character c as the field separator (FS) ! character. See the discussion of FS below. ! USAGE ! Input Lines ! Each input line is matched against the pattern portion of every ! pattern-action statement; the associated action is performed for each ! matched pattern. Any filename of the form var=value is treated as an ! assignment, not a filename, and is executed at the time it would have ! been opened if it were a filename. Variables assigned in this manner ! are not available inside a BEGIN rule, and are assigned after ! previously specified files have been read. ! An input line is normally made up of fields separated by white spaces. ! (This default can be changed by using the FS built-in variable or the ! -Fc option.) The default is to ignore leading blanks and to separate ! fields by blanks and/or tab characters. However, if FS is assigned a ! value that does not include any of the white spaces, then leading ! blanks are not ignored. 
The fields are denoted $1, $2, ...; $0 refers ! to the entire line. ! Pattern-action Statements ! A pattern-action statement has the form: pattern { action } - Either pattern or action can be omitted. If there is no action, the - matching line is printed. If there is no pattern, the action is - performed on every input line. Pattern-action statements are separated - by newlines or semicolons. - Patterns are arbitrary Boolean combinations ( !, ||, &&, and - parentheses) of relational expressions and regular expressions. A - relational expression is one of the following: ! expression relop expression ! expression matchop regular_expression - where a relop is any of the six relational operators in C, and a - matchop is either ~ (contains) or !~ (does not contain). An expression - is an arithmetic expression, a relational expression, the special - expression - var in array - or a Boolean combination of these. ! Regular expressions are as in egrep(1). In patterns they must be ! surrounded by slashes. Isolated regular expressions in a pattern apply ! to the entire line. Regular expressions can also occur in relational ! expressions. A pattern can consist of two patterns separated by a ! comma; in this case, the action is performed for all lines between the ! occurrence of the first pattern to the occurrence of the second ! pattern. - The special patterns BEGIN and END can be used to capture control - before the first input line has been read and after the last input line - has been read respectively. These keywords do not combine with any - other patterns. - Built-in Variables - Built-in variables include: FILENAME ! name of the current input file FS ! input field separator regular expression (default blank ! and tab) NF ! number of fields in the current record NR ! ordinal number of the current record OFMT ! output format for numbers (default %.6g) OFS ! output field separator (default blank) ORS ! output record separator (default new-line) RS ! 
input record separator (default new-line) An action is a sequence of statements. A statement can be one of the following: if ( expression ) statement [ else statement ] while ( expression ) statement do statement while ( expression ) for ( expression ; expression ; expression ) statement for ( var in array ) statement break continue { [ statement ] ... } expression # commonly variable = expression print [ expression-list ] [ >expression ] printf format [ ,expression-list ] [ >expression ] next # skip remaining patterns on this input line exit [expr] # skip the rest of the input; exit status is expr ! Statements are terminated by semicolons, newlines, or right braces. An ! empty expression-list stands for the whole input line. Expressions take ! on string or numeric values as appropriate, and are built using the ! operators +, -, *, /, %, ^ and concatenation (indicated by a blank). ! The operators ++, --, +=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==, !=, and ! ?: are also available in expressions. Variables can be scalars, array ! elements (denoted x[i]), or fields. Variables are initialized to the ! null string or zero. Array subscripts can be any string, not ! necessarily numeric; this allows for a form of associative memory. ! String constants are quoted (""), with the usual C escapes recognized ! within. ! The print statement prints its arguments on the standard output, or on ! a file if >expression is present, or on a pipe if '|cmd' is present. ! The output resulted from the print statement is terminated by the ! output record separator with each argument separated by the current ! output field separator. The printf statement formats its expression ! list according to the format (see printf(3C)). - Built-in Functions - The arithmetic functions are as follows: cos(x) ! Return cosine of x, where x is in radians. (In ! /usr/xpg4/bin/awk only. See nawk(1).) sin(x) ! Return sine of x, where x is in radians. (In ! /usr/xpg4/bin/awk only. See nawk(1).) 
exp(x) Return the exponential function of x. --- 4,866 ---- NAME awk - pattern scanning and processing language SYNOPSIS ! /usr/bin/awk [-F ERE] [-v assignment] 'program' | -f progfile... ! [argument]... ! /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile... [argument]... + /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile... + [argument]... + + DESCRIPTION ! NOTE: The nawk command is now the system default awk for illumos. + The /usr/bin/awk and /usr/xpg4/bin/awk utilities execute programs + written in the awk programming language, which is specialized for + textual data manipulation. A awk program is a sequence of patterns and + corresponding actions. The string specifying program must be enclosed + in single quotes (') to protect it from interpretation by the shell. + The sequence of pattern - action statements can be specified in the + command line as program or in one, or more, file(s) specified by the + -fprogfile option. When input is read that matches a pattern, the + action associated with the pattern is performed. + Input is interpreted as a sequence of records. By default, a record is + a line, but this can be changed by using the RS built-in variable. Each + record of input is matched to each pattern in the program. For each + pattern matched, the associated action is executed. + + + The awk utility interprets each input record as a sequence of fields + where, by default, a field is a string of non-blank characters. This + default white-space field delimiter (blanks and/or tabs) can be changed + by using the FS built-in variable or the -FERE option. The awk utility + denotes the first field in a record $1, the second $2, and so forth. + The symbol $0 refers to the entire record; setting any other field + causes the reevaluation of $0. Assigning to $0 resets the values of all + fields and the NF built-in variable. 
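The interaction between fields, $0 and NF described above can be sketched with two one-liners. This is an illustrative sketch only; the sample input and the shell variable names rebuilt and count are hypothetical and not part of the manual page.

```shell
# Assigning to a field causes $0 to be rebuilt, with fields joined by OFS
# (a single space by default), so the original runs of blanks collapse.
rebuilt=$(printf 'a  b   c\n' | awk '{ $2 = "X"; print }')
echo "$rebuilt"

# Assigning to $0 re-splits the record, which resets the fields and NF.
count=$(printf 'a b c\n' | awk '{ $0 = "one two"; print NF }')
echo "$count"
```

The first command prints the record with single spaces between fields; the second prints the field count of the newly assigned record rather than that of the input line.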
+ + OPTIONS The following options are supported: + -F ERE + Define the input field separator to be the extended + regular expression ERE, before any input is read (can + be a character). + + -f progfile ! Specifies the pathname of the file progfile containing ! a awk program. If multiple instances of this option ! are specified, the concatenation of the files ! specified as progfile in the order specified is the ! awk program. The awk program can alternatively be ! specified in the command line as a single argument. ! -v assignment ! The assignment argument must be in the same form as an ! assignment operand. The assignment is of the form ! var=value, where var is the name of one of the ! variables described below. The specified assignment ! occurs before executing the awk program, including the ! actions associated with BEGIN patterns (if any). ! Multiple occurrences of this option can be specified. ! -safe ! When passed to awk, this flag will prevent the program ! from opening new files or running child processes. The ! ENVIRON array will also not be initialized. ! OPERANDS ! The following operands are supported: ! program ! If no -f option is specified, the first operand to awk is ! the text of the awk program. The application supplies the ! program operand as a single argument to awk. If the text ! does not end in a newline character, awk interprets the ! text as if it did. + + argument + Either of the following two types of argument can be + intermixed: + + file + A pathname of a file that contains the input + to be read, which is matched against the set + of patterns in the program. If no file + operands are specified, or if a file operand + is -, the standard input is used. 
+ + + assignment + An operand that begins with an underscore or + alphabetic character from the portable + character set, followed by a sequence of + underscores, digits and alphabetics from the + portable character set, followed by the = + character specifies a variable assignment + rather than a pathname. The characters before + the = represent the name of a awk variable. + If that name is a awk reserved word, the + behavior is undefined. The characters + following the equal sign is interpreted as if + they appeared in the awk program preceded and + followed by a double-quote (") character, as + a STRING token , except that if the last + character is an unescaped backslash, it is + interpreted as a literal backslash rather + than as the first character of the sequence + \.. The variable is assigned the value of + that STRING token. If the value is considered + a numericstring, the variable is assigned its + numeric value. Each such variable assignment + is performed just before the processing of + the following file, if any. Thus, an + assignment before the first file argument is + executed after the BEGIN actions (if any), + while an assignment after the last file + argument is executed before the END actions + (if any). If there are no file arguments, + assignments are executed before processing + the standard input. + + + + INPUT FILES + Input files to the awk program from any of the following sources: + + o any file operands or their equivalents, achieved by + modifying the awk variables ARGV and ARGC + + o standard input in the absence of any file operands + + o arguments to the getline function + + + must be text files. Whether the variable RS is set to a value other + than a newline character or not, for these files, implementations + support records terminated with the specified separator up to + {LINE_MAX} bytes and can support longer records. 
+ + If -f progfile is specified, the files named by each of the progfile + option-arguments must be text files containing an awk program. + + + The standard input is used only if no file operands are specified, or + if a file operand is -. + + + EXTENDED DESCRIPTION + An awk program is composed of pairs of the form: + pattern { action } + Either the pattern or the action (including the enclosing brace + characters) can be omitted. Pattern-action statements are separated by + a semicolon or by a newline. + A missing pattern matches any record of input, and a missing action is + equivalent to an action that writes the matched record of input to + standard output. ! Execution of the awk program starts by first executing the actions ! associated with all BEGIN patterns in the order they occur in the ! program. Then each file operand (or standard input if no files were ! specified) is processed by reading data from the file until a record ! separator is seen (a newline character by default), splitting the ! current record into fields using the current value of FS, evaluating ! each pattern in the program in the order of occurrence, and executing ! the action associated with each pattern that matches the current ! record. The action for a matching pattern is executed before evaluating ! subsequent patterns. Last, the actions associated with all END patterns ! are executed in the order they occur in the program. + Expressions in awk + Expressions describe computations used in patterns and actions. In the + following table, valid expression operations are given in groups from + highest precedence first to lowest precedence last, with equal- + precedence operators grouped between horizontal lines. In expression + evaluation, where the grammar is formally ambiguous, higher precedence + operators are evaluated before lower precedence operators. 
In this + table expr, expr1, expr2, and expr3 represent any expression, while + lvalue represents any entity that can be assigned to (that is, on the + left side of an assignment operator). + Syntax Name Type of Result Associativity + ------------------------------------------------------------------------------- + ( expr ) Grouping type of expr n/a + ------------------------------------------------------------------------------- + $expr Field reference string n/a + ------------------------------------------------------------------------------- + ++ lvalue Pre-increment numeric n/a + -- lvalue Pre-decrement numeric n/a + lvalue ++ Post-increment numeric n/a + lvalue -- Post-decrement numeric n/a + ------------------------------------------------------------------------------- + expr ^ expr Exponentiation numeric right + ------------------------------------------------------------------------------- + ! expr Logical not numeric n/a + + expr Unary plus numeric n/a + - expr Unary minus numeric n/a + ------------------------------------------------------------------------------- + expr * expr Multiplication numeric left + expr / expr Division numeric left + expr % expr Modulus numeric left + ------------------------------------------------------------------------------- + expr + expr Addition numeric left + expr - expr Subtraction numeric left + ------------------------------------------------------------------------------- + expr expr String concatenation string left + ------------------------------------------------------------------------------- + expr < expr Less than numeric none + expr <= expr Less than or equal to numeric none + expr != expr Not equal to numeric none + expr == expr Equal to numeric none + expr > expr Greater than numeric none + expr >= expr Greater than or equal to numeric none + ------------------------------------------------------------------------------- + expr ~ expr ERE match numeric none + expr !~ expr ERE non-match numeric none + 
------------------------------------------------------------------------------- + expr in array Array membership numeric left + ( index ) in Multi-dimension array numeric left + array membership + ------------------------------------------------------------------------------- + expr && expr Logical AND numeric left + ------------------------------------------------------------------------------- + expr || expr Logical OR numeric left + ------------------------------------------------------------------------------- + expr1 ? expr2 Conditional expression type of selected right + : expr3 expr2 or expr3 + ------------------------------------------------------------------------------- + lvalue ^= expr Exponentiation numeric right + assignment + lvalue %= expr Modulus assignment numeric right + lvalue *= expr Multiplication numeric right + assignment + lvalue /= expr Division assignment numeric right + lvalue += expr Addition assignment numeric right + lvalue -= expr Subtraction assignment numeric right + lvalue = expr Assignment type of expr right ! Each expression has either a string value, a numeric value or both. ! Except as stated for specific contexts, the value of an expression is ! implicitly converted to the type needed for the context in which it is ! used. A string value is converted to a numeric value by the equivalent ! of the following calls: + setlocale(LC_NUMERIC, ""); + numeric_value = atof(string_value); + A numeric value that is exactly equal to the value of an integer is + converted to a string by the equivalent of a call to the sprintf + function with the string %d as the fmt argument and the numeric value + being converted as the first and only expr argument. Any other numeric + value is converted to a string by the equivalent of a call to the + sprintf function with the value of the variable CONVFMT as the fmt + argument and the numeric value being converted as the first and only + expr argument. 
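The number-to-string rule above can be checked with a short sketch: an integral value is converted as if by %d regardless of CONVFMT, while any other value goes through CONVFMT. The values and the shell variable name converted are hypothetical, chosen only for illustration.

```shell
# Concatenating with "" forces a numeric-to-string conversion.
# a is integral, so it is formatted with %d; b is not, so CONVFMT applies.
converted=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    a = 17
    b = 3.14159
    print (a "") " " (b "")
}')
echo "$converted"
```

Because the strings are built before print sees them, OFMT plays no part here; it would only apply if the numbers were passed to print directly.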
+ + + A string value is considered to be a numeric string in the following + case: + + 1. Any leading and trailing blank characters are ignored. + + 2. If the first unignored character is a + or -, it is ignored. + + 3. If the remaining unignored characters would be lexically + recognized as a NUMBER token, the string is considered a + numeric string. + + + If a - character is ignored in the above steps, the numeric value of + the numeric string is the negation of the numeric value of the + recognized NUMBER token. Otherwise the numeric value of the numeric + string is the numeric value of the recognized NUMBER token. Whether or + not a string is a numeric string is relevant only in contexts where + that term is used in this section. + + + When an expression is used in a Boolean context, if it has a numeric + value, a value of zero is treated as false and any other value is + treated as true. Otherwise, a string value of the null string is + treated as false and any other value is treated as true. A Boolean + context is one of the following: + + o the first subexpression of a conditional expression. + + o an expression operated on by logical NOT, logical AND, or + logical OR. + + o the second expression of a for statement. + + o the expression of an if statement. + + o the expression of the while clause in either a while or do + ... while statement. + + o an expression used as a pattern (as in Overall Program + Structure). + + + The awk language supplies arrays that are used for storing numbers or + strings. Arrays need not be declared. They are initially empty, and + their sizes change dynamically. The subscripts, or element + identifiers, are strings, providing a type of associative array + capability. An array name followed by a subscript within square + brackets can be used as an lvalue and as an expression, as described in + the grammar. 
Unsubscripted array names are used in only the following + contexts: + + o a parameter in a function definition or function call. + + o the NAME token following any use of the keyword in. + + + A valid array index consists of one or more comma-separated + expressions, similar to the way in which multi-dimensional arrays are + indexed in some programming languages. Because awk arrays are really + one-dimensional, such a comma-separated list is converted to a single + string by concatenating the string values of the separate expressions, + each separated from the other by the value of the SUBSEP variable. + + + Thus, the following two index operations are equivalent: + + var[expr1, expr2, ... exprn] + var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn] + + + + A multi-dimensioned index used with the in operator must be put in + parentheses. The in operator, which tests for the existence of a + particular array element, does not create the element if it does not + exist. Any other reference to a non-existent array element + automatically creates it. + + + Variables and Special Variables + Variables can be used in an awk program by referencing them. With the + exception of function parameters, they are not explicitly declared. + Uninitialized scalar variables and array elements have both a numeric + value of zero and a string value of the empty string. + + + Field variables are designated by a $ followed by a number or numerical + expression. The effect of the field number expression evaluating to + anything other than a non-negative integer is unspecified. + Uninitialized variables or string values need not be converted to + numeric values in this context. New field variables are created by + assigning a value to them. References to non-existent fields (that is, + fields after $NF) produce the null string. 
However, assigning to a non- + existent field (for example, $(NF+2) = 5) increases the value of NF, + creates any intervening fields with the null string as their values and + causes the value of $0 to be recomputed, with the fields being separated + by the value of OFS. Each field variable has a string value when + created. If the string, with any occurrence of the decimal-point + character from the current locale changed to a period character, is + considered a numeric string (see Expressions in awk above), the field + variable also has the numeric value of the numeric string. + + + /usr/bin/awk, /usr/xpg4/bin/awk + awk sets the following special variables that are supported by both + /usr/bin/awk and /usr/xpg4/bin/awk: + + ARGC + The number of elements in the ARGV array. + + + ARGV + An array of command line arguments, excluding options and + the program argument, numbered from zero to ARGC-1. + + The arguments in ARGV can be modified or added to; ARGC can + be altered. As each input file ends, awk treats the next + non-null element of ARGV, up to the current value of + ARGC-1, inclusive, as the name of the next input file. + Setting an element of ARGV to null means that it is not + treated as an input file. The name - indicates the standard + input. If an argument matches the format of an assignment + operand, this argument is treated as an assignment rather + than a file argument. + + + CONVFMT + The printf format for converting numbers to strings (except + for output statements, where OFMT is used). The default is + %.6g. + + + ENVIRON + The variable ENVIRON is an array representing the value of + the environment. The indices of the array are strings + consisting of the names of the environment variables, and + the value of each array element is a string consisting of + the value of that variable. If the value of an environment + variable is considered a numeric string, the array element + also has its numeric value. 
+ + In all cases where awk behavior is affected by environment + variables (including the environment of any commands that + awk executes via the system function or via pipeline + redirections with the print statement, the printf + statement, or the getline function), the environment used + is the environment at the time awk began executing. + + FILENAME ! A pathname of the current input file. Inside a BEGIN action ! the value is undefined. Inside an END action the value is ! the name of the last input file processed. + FNR + The ordinal number of the current record in the current + file. Inside a BEGIN action the value is zero. Inside an + END action the value is the number of the last record + processed in the last file processed. + + FS ! Input field separator regular expression; a space character ! by default. NF ! The number of fields in the current record. Inside a BEGIN ! action, the use of NF is undefined unless a getline ! function without a var argument is executed previously. ! Inside an END action, NF retains the value it had for the ! last record read, unless a subsequent, redirected, getline ! function without a var argument is performed prior to ! entering the END action. NR ! The ordinal number of the current record from the start of ! input. Inside a BEGIN action the value is zero. Inside an ! END action the value is the number of the last record ! processed. OFMT ! The printf format for converting numbers to strings in ! output statements "%.6g" by default. The result of the ! conversion is unspecified if the value of OFMT is not a ! floating-point format specification. OFS ! The print statement output field separator; a space ! character by default. ORS ! The print output record separator; a newline character by ! default. + RLENGTH + The length of the string matched by the match function. + + RS ! The first character of the string value of RS is the input ! record separator; a newline character by default. If RS ! 
contains more than one character, the results are ! unspecified. If RS is null, then records are separated by ! sequences of one or more blank lines. Leading or trailing ! blank lines do not produce empty records at the beginning ! or end of input, and the field separator is always newline, ! no matter what the value of FS. + RSTART + The starting position of the string matched by the match + function, numbering from 1. This is always equivalent to + the return value of the match function. + + SUBSEP + The subscript separator string for multi-dimensional + arrays. The default value is \034. + + + /usr/bin/awk + The following variable is supported for /usr/bin/awk only: + + RT + The record terminator for the most recent record read. For + most records this will be the same value as RS. At the end + of a file with no trailing separator value, though, this + will be set to the empty string (""). + + + Regular Expressions + The awk utility makes use of the extended regular expression notation + (see regex(5)) except that it allows the use of C-language conventions + to escape special characters within the EREs, namely \\, \a, \b, \f, + \n, \r, \t, \v, and those specified in the following table. These + escape sequences are recognized both inside and outside bracket + expressions. Note that records need not be separated by newline + characters and string constants can contain newline characters, so even + the \n sequence is valid in awk EREs. 
Using a slash character within + the regular expression requires escaping as shown in the table below: + + + + + Escape Sequence Description Meaning + ---------------------------------------------------------------------- + \" Backslash quotation-mark Quotation-mark character + ---------------------------------------------------------------------- + \/ Backslash slash Slash character + ---------------------------------------------------------------------- + \ddd A backslash character The character encoded by + followed by the longest the one-, two- or + sequence of one, two, or three-digit octal + three octal-digit integer. Multi-byte + characters (01234567). characters require + If all of the digits are multiple, concatenated + 0, (that is, escape sequences, + representation of the including the leading \ + NULL character), the for each byte. + behavior is undefined. + ---------------------------------------------------------------------- + \c A backslash character Undefined + followed by any + character not described + in this table or special + characters (\\, \a, \b, + \f, \n, \r, \t, \v). + + + + A regular expression can be matched against a specific field or string + by using one of the two regular expression matching operators, ~ and + !~. These operators interpret their right-hand operand as a regular + expression and their left-hand operand as a string. If the regular + expression matches the string, the ~ expression evaluates to the value + 1, and the !~ expression evaluates to the value 0. If the regular + expression does not match the string, the ~ expression evaluates to the + value 0, and the !~ expression evaluates to the value 1. If the right- + hand operand is any expression other than the lexical token ERE, the + string value of the expression is interpreted as an extended regular + expression, including the escape conventions described above. 
Notice + that these same escape conventions also are applied in the determining + the value of a string literal (the lexical token STRING), and is + applied a second time when a string literal is used in this context. + + + When an ERE token appears as an expression in any context other than as + the right-hand of the ~ or !~ operator or as one of the built-in + function arguments described below, the value of the resulting + expression is the equivalent of: + + $0 ~ /ere/ + + + + The ere argument to the gsub, match, sub functions, and the fs argument + to the split function (see String Functions) is interpreted as extended + regular expressions. These can be either ERE tokens or arbitrary + expressions, and are interpreted in the same manner as the right-hand + side of the ~ or !~ operator. + + + An extended regular expression can be used to separate fields by using + the -F ERE option or by assigning a string containing the expression to + the built-in variable FS. The default value of the FS variable is a + single space character. The following describes FS behavior: + + 1. If FS is a single character: + + o If FS is the space character, skip leading and trailing + blank characters; fields are delimited by sets of one or + more blank characters. + + o Otherwise, if FS is any other character c, fields are + delimited by each single occurrence of c. + + 2. Otherwise, the string value of FS is considered to be an + extended regular expression. Each occurrence of a sequence + matching the extended regular expression delimits fields. + + + Except in the gsub, match, split, and sub built-in functions, regular + expression matching is based on input records. That is, record + separator characters (the first character of the value of the variable + RS, a newline character by default) cannot be embedded in the + expression, and no expression matches the record separator character. 
+ If the record separator is not a newline character, newline characters + embedded in the expression can be matched. In those four built-in + functions, regular expression matching is based on text strings. So, + any character (including the newline character and the record + separator) can be embedded in the pattern and an appropriate pattern + matches any character. However, in all awk regular expression matching, + the use of one or more NULL characters in the pattern, input record or + text string produces undefined results. + + + Patterns + A pattern is any valid expression, a range specified by two expressions + separated by comma, or one of the two special patterns BEGIN or END. + + + Special Patterns + The awk utility recognizes two special patterns, BEGIN and END. Each + BEGIN pattern is matched once and its associated action executed before + the first record of input is read (except possibly by use of the + getline function in a prior BEGIN action) and before command line + assignment is done. Each END pattern is matched once and its associated + action executed after the last record of input has been read. These two + patterns have associated actions. + + + BEGIN and END do not combine with other patterns. Multiple BEGIN and + END patterns are allowed. The actions associated with the BEGIN + patterns are executed in the order specified in the program, as are the + END actions. An END pattern can precede a BEGIN pattern in a program. + + + If an awk program consists of only actions with the pattern BEGIN, and + the BEGIN action contains no getline function, awk exits without + reading its input when the last statement in the last BEGIN action is + executed. If an awk program consists of only actions with the pattern + END or only actions with the patterns BEGIN and END, the input is read + before the statements in the END actions are executed. 
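The ordering rules for BEGIN and END can be sketched as follows. The input lines and the shell variable names ordered and begin_only are hypothetical; the second command redirects from /dev/null only so that the sketch does not depend on whether input is consumed.

```shell
# END appears before BEGIN in the program text, yet the BEGIN action still
# runs first; the input is read before the END action, so NR is 2 there.
ordered=$(printf 'x\ny\n' | awk 'END { print "end " NR } BEGIN { print "begin" }')
echo "$ordered"

# A program made up only of BEGIN actions exits after the last BEGIN
# statement without reading its input.
begin_only=$(awk 'BEGIN { print "start" }' < /dev/null)
echo "$begin_only"
```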
+ + + Expression Patterns + An expression pattern is evaluated as if it were an expression in a + Boolean context. If the result is true, the pattern is considered to + match, and the associated action (if any) is executed. If the result is + false, the action is not executed. + + + Pattern Ranges + A pattern range consists of two expressions separated by a comma. In + this case, the action is performed for all records between a match of + the first expression and the following match of the second expression, + inclusive. At this point, the pattern range can be repeated starting at + input records subsequent to the end of the matched range. + + + Actions An action is a sequence of statements. A statement can be one of the following: if ( expression ) statement [ else statement ] while ( expression ) statement do statement while ( expression ) for ( expression ; expression ; expression ) statement for ( var in array ) statement + delete array[subscript] #delete an array element + delete array #delete all elements within an array break continue { [ statement ] ... } expression # commonly variable = expression print [ expression-list ] [ >expression ] printf format [ ,expression-list ] [ >expression ] next # skip remaining patterns on this input line + nextfile # skip remaining patterns on this input file exit [expr] # skip the rest of the input; exit status is expr + return [expr] ! Any single statement can be replaced by a statement list enclosed in ! braces. The statements are terminated by newline characters or ! semicolons, and are executed sequentially in the order that they ! appear. ! The next statement causes all further processing of the current input ! record to be abandoned. The behavior is undefined if a next statement ! appears or is invoked in a BEGIN or END action. 
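As an illustration of pattern ranges and the next statement described above, the following sketch prints the records from a line matching start through the following line matching stop, skipping comment lines. The sample input and the shell variable name ranged are assumptions made for this example only.

```shell
# The first rule abandons any record beginning with "#", so the range
# rule never sees it. The range rule prints every other record from a
# match of /start/ through the following match of /stop/, inclusive.
ranged=$(printf 'skip\nstart\n# note\nb\nstop\nc\n' | awk '
/^#/            { next }
/start/, /stop/ { print NR ": " $0 }
')
printf '%s\n' "$ranged"
```

Records 1 and 6 fall outside the range and record 3 is dropped by next, so only records 2, 4 and 5 are printed.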
+ The nextfile statement is similar to next, but also skips all other + records in the current file, and moves on to processing the next input + file if available (or exits the program if there are none). (Note that + this keyword is not supported by /usr/xpg4/bin/awk.) + + + The exit statement invokes all END actions in the order in which they + occur in the program source and then terminates the program without + reading further input. An exit statement inside an END action + terminates the program without further execution of END actions. If an + expression is specified in an exit statement, its numeric value is the + exit status of awk, unless subsequent errors are encountered or a + subsequent exit statement with an expression is executed. + + + Output Statements + Both print and printf statements write to standard output by default. + The output is written to the location specified by output_redirection + if one is supplied, as follows: + + > expression + >> expression + | expression + + + + In all cases, the expression is evaluated to produce a string that is + used as a full pathname to write into (for > or >>) or as a command to + be executed (for |). Using the first two forms, if the file of that + name is not currently open, it is opened, creating it if necessary and, + using the first form, truncating the file. The output then is appended + to the file. As long as the file remains open, subsequent calls in + which expression evaluates to the same string value simply append + output to the file. The file remains open until the close function + is called with an expression that evaluates to the same string + value. + + + The third form writes output onto a stream piped to the input of a + command. The stream is created if no stream is currently open with the + value of expression as its command name. 
The stream created is + equivalent to one created by a call to the popen(3C) function with the + value of expression as the command argument and a value of w as the + mode argument. As long as the stream remains open, subsequent calls in + which expression evaluates to the same string value write output to + the existing stream. The stream remains open until the close function + is called with an expression that evaluates to the same string value. + At that time, the stream is closed as if by a call to the pclose + function. + + + These output statements take a comma-separated list of expressions + referred to in the grammar by the non-terminal symbols expr_list, + print_expr_list or print_expr_list_opt. This list is referred to here + as the expression list, and each member is referred to as an expression + argument. + + + The print statement writes the value of each expression argument onto + the indicated output stream separated by the current output field + separator (see variable OFS above), and terminated by the output record + separator (see variable ORS above). All expression arguments are taken + as strings, being converted if necessary, with the exception that the + printf format in OFMT is used instead of the value in CONVFMT. An empty + expression list stands for the whole input record ($0). + + + The printf statement produces output based on a notation similar to the + File Format Notation used to describe file formats in this document. + Output is produced as specified with the first expression argument as + the string format and subsequent expression arguments as the strings + arg1 to argn, inclusive, with the following exceptions: + + 1. The format is an actual character string rather than a + graphical representation. Therefore, it cannot contain empty + character positions. The space character in the format + string, in any context other than a flag of a conversion + specification, is treated as an ordinary character that is + copied to the output. 
+ + 2. If the character set contains a Delta character and that + character appears in the format string, it is treated as an + ordinary character that is copied to the output. + + 3. The escape sequences beginning with a backslash character are + treated as sequences of ordinary characters that are copied + to the output. Note that these same sequences are interpreted + lexically by awk when they appear in literal strings, but + they are not treated specially by the printf statement. + + 4. A field width or precision can be specified as the * + character instead of a digit string. In this case the next + argument from the expression list is fetched and its numeric + value taken as the field width or precision. + + 5. The implementation does not precede or follow output from + the d or u conversion specifications with blank characters + not specified by the format string. + + 6. The implementation does not precede output from the o + conversion specification with leading zeros not specified by + the format string. + + 7. For the c conversion specification: if the argument has a + numeric value, the character whose encoding is that value is + output. If the value is zero or is not the encoding of any + character in the character set, the behavior is undefined. + If the argument does not have a numeric value, the first + character of the string value is output; if the string does + not contain any characters, the behavior is undefined. + + 8. For each conversion specification that consumes an argument, + the next expression argument is evaluated. With the + exception of the c conversion, the value is converted to the + appropriate type for the conversion specification. + + 9. If there are insufficient expression arguments to satisfy + all the conversion specifications in the format string, the + behavior is undefined. + + 10. 
If any character sequence in the format string begins with a + % character, but does not form a valid conversion + specification, the behavior is unspecified. + + + Both print and printf can output at least {LINE_MAX} bytes. + + + Functions + The awk language has a variety of built-in functions: arithmetic, + string, input/output and general. + + + Arithmetic Functions + The arithmetic functions, except for int, are based on the ISO C + standard. The behavior is undefined in cases where the ISO C standard + specifies that an error be returned or that the behavior is undefined. + Although the grammar permits built-in functions to appear with no + arguments or parentheses, unless the argument or parentheses are + indicated as optional in the following list (by displaying them within + the [ ] brackets), such use is undefined. + + atan2(y,x) + Return arctangent of y/x. + + cos(x) ! Return cosine of x, where x is in radians. sin(x) ! Return sine of x, where x is in radians. exp(x) Return the exponential function of x.
*** 207,387 **** sqrt(x) Return the square root of x. int(x) ! Truncate its argument to an integer. It is truncated toward ! 0 when x > 0. - The string functions are as follows: ! index(s, t) - Return the position in string s where string t first occurs, or 0 - if it does not occur at all. ! int(s) ! truncates s to an integer value. If s is not specified, $0 is used. ! length(s) ! Return the length of its argument taken as a string, or of the ! whole line if there is no argument. ! split(s, a, fs) ! Split the string s into array elements a[1], a[2], ... a[n], and ! returns n. The separation is done with the regular expression fs or ! with the field separator FS if fs is not given. ! sprintf(fmt, expr, expr,...) ! Format the expressions according to the printf(3C) format given by ! fmt and returns the resulting string. ! substr(s, m, n) ! returns the n-character substring of s that begins at position m. ! The input/output function is as follows: getline ! Set $0 to the next input record from the current input file. ! getline returns 1 for successful input, 0 for end of file, and -1 for an error. ! Large File Behavior See largefile(5) for the description of the behavior of awk when ! encountering files greater than or equal to 2 Gbyte ( 2^31 bytes). EXAMPLES ! Example 1 Printing Lines Longer Than 72 Characters - The following example is an awk script that can be executed by an awk - -f examplescript style command. It prints lines longer than seventy two - characters: ! length > 72 ! Example 2 Printing Fields in Opposite Order - The following example is an awk script that can be executed by an awk - -f examplescript style command. It prints the first two fields in - opposite order: ! { print $2, $1 } - Example 3 Printing Fields in Opposite Order with the Input Fields - Separated ! The following example is an awk script that can be executed by an awk ! -f examplescript style command. It prints the first two input fields in ! 
opposite order, separated by a comma, blanks or tabs: - BEGIN { FS = ",[ \t]*|[ \t]+" } - { print $2, $1 } ! Example 4 Adding Up the First Column, Printing the Sum and Average ! The following example is an awk script that can be executed by an awk ! -f examplescript style command. It adds up the first column, and ! prints the sum and average: - { s += $1 } - END { print "sum is", s, " average is", s/NR } - Example 5 Printing Fields in Reverse Order ! The following example is an awk script that can be executed by an awk ! -f examplescript style command. It prints fields in reverse order: - { for (i = NF; i > 0; --i) print $i } - Example 6 Printing All lines Between start/stop Pairs - The following example is an awk script that can be executed by an awk - -f examplescript style command. It prints all lines between start/stop - pairs. - /start/, /stop/ ! Example 7 Printing All Lines Whose First Field is Different from the ! Previous One - The following example is an awk script that can be executed by an awk - -f examplescript style command. It prints all lines whose first field - is different from the previous one. $1 != prev { print; prev = $1 } ! Example 8 Printing a File and Filling in Page numbers - The following example is an awk script that can be executed by an awk - -f examplescript style command. It prints a file and fills in page - numbers starting at 5: ! /Page/ { $2 = n++; } { print } ! Example 9 Printing a File and Numbering Its Pages ! Assuming this program is in a file named prog, the following example ! prints the file input numbering its pages starting at 5: - example% awk -f prog n=5 input ENVIRONMENT VARIABLES See environ(5) for descriptions of the following environment variables ! that affect the execution of awk: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, ! LC_MESSAGES, NLSPATH, and PATH. LC_NUMERIC Determine the radix character used when interpreting numeric input, performing conversions between numeric and string values and formatting numeric output. 
Regardless --- 872,1379 ---- sqrt(x) Return the square root of x. int(x) ! Truncate its argument to an integer. It is truncated ! toward 0 when x > 0. + rand() + Return a random number n, such that 0 <= n < 1. ! srand([expr]) ! Set the seed value for rand to expr or use the time of ! day if expr is omitted. The previous seed value is ! returned. + String Functions + The string functions in the following list shall be supported. Although + the grammar permits built-in functions to appear with no arguments or + parentheses, unless the argument or parentheses are indicated as + optional in the following list (by displaying them within the [ ] + brackets), such use is undefined. ! gsub(ere,repl[,in]) ! Behave like sub (see below), except that it replaces all ! occurrences of the regular expression (like the ed utility global ! substitute) in $0 or in the in argument, when specified. ! index(s,t) ! Return the position, in characters, numbering from 1, in string s ! where string t first occurs, or zero if it does not occur at all. ! length[([v])] ! Given no argument, this function returns the length of the whole ! record, $0. If given an array as an argument (and using ! /usr/bin/awk), then this returns the number of elements it ! contains. Otherwise, this function interprets the argument as a ! string (performing any needed conversions) and returns its length ! in characters. ! match(s,ere) ! Return the position, in characters, numbering from 1, in string s ! where the extended regular expression ere occurs, or zero if it ! does not occur at all. RSTART is set to the starting position ! (which is the same as the returned value), zero if no match is ! found; RLENGTH is set to the length of the matched string, -1 if no ! match is found. ! split(s,a[,fs]) ! Split the string s into array elements a[1], a[2], ..., a[n], and ! return n. The separation is done with the extended regular ! expression fs or with the field separator FS if fs is not given. ! 
Each array element has a string value when created. If the string ! assigned to any array element, with any occurrence of the decimal- ! point character from the current locale changed to a period ! character, would be considered a numeric string; the array element ! also has the numeric value of the numeric string. The effect of a ! null string as the value of fs is unspecified. + sprintf(fmt,expr,expr,...) ! Format the expressions according to the printf format given by fmt ! and return the resulting string. + + sub(ere,repl[,in]) + + Substitute the string repl in place of the first instance of the + extended regular expression ERE in string in and return the number + of substitutions. An ampersand ( & ) appearing in the string repl + is replaced by the string from in that matches the regular + expression. An ampersand preceded with a backslash ( \ ) is + interpreted as the literal ampersand character. An occurrence of + two consecutive backslashes is interpreted as just a single literal + backslash character. Any other occurrence of a backslash (for + example, preceding any other character) is treated as a literal + backslash character. If repl is a string literal, the handling of + the ampersand character occurs after any lexical processing, + including any lexical backslash escape sequence processing. If in + is specified and it is not an lvalue the behavior is undefined. If + in is omitted, awk uses the current record ($0) in its place. + + + substr(s,m[,n]) + + Return the at most n-character substring of s that begins at + position m, numbering from 1. If n is missing, the length of the + substring is limited by the length of the string s. + + + tolower(s) + + Return a string based on the string s. Each character in s that is + an upper-case letter specified to have a tolower mapping by the + LC_CTYPE category of the current locale is replaced in the returned + string by the lower-case letter specified by the mapping. 
Other + characters in s are unchanged in the returned string. + + + toupper(s) + + Return a string based on the string s. Each character in s that is + a lower-case letter specified to have a toupper mapping by the + LC_CTYPE category of the current locale is replaced in the returned + string by the upper-case letter specified by the mapping. Other + characters in s are unchanged in the returned string. + + + + All of the preceding functions that take ERE as a parameter expect a + pattern or a string valued expression that is a regular expression as + defined below. + + + Input/Output and General Functions + The input/output and general functions are: + + close(expression) + Close the file or pipe opened by a print or + printf statement or a call to getline with + the same string-valued expression. If the + close was successful, the function returns + 0; otherwise, it returns non-zero. + + + fflush(expression) + Flush any buffered output for the file or + pipe opened by a print or printf statement + or a call to getline with the same string- + valued expression. If the flush was + successful, the function returns 0; + otherwise, it returns EOF. If no arguments + or the empty string ("") are given, then all + open files will be flushed. (Note that + fflush is supported in /usr/bin/awk only.) + + + expression|getline[var] + Read a record of input from a stream piped + from the output of a command. The stream is + created if no stream is currently open with + the value of expression as its command name. + The stream created is equivalent to one + created by a call to the popen function with + the value of expression as the command + argument and a value of r as the mode + argument. As long as the stream remains + open, subsequent calls in which expression + evaluates to the same string value reads + subsequent records from the file. The stream + remains open until the close function is + called with an expression that evaluates to + the same string value. 
At that time, the + stream is closed as if by a call to the + pclose function. If var is missing, $0 and + NF are set. Otherwise, var is set. + + The getline operator can form ambiguous + constructs when there are operators that are + not in parentheses (including concatenate) + to the left of the | (to the beginning of + the expression containing getline). In the + context of the $ operator, | behaves as if + it had a lower precedence than $. The result + of evaluating other operators is + unspecified, and portable applications + must parenthesize all such uses + properly. + + getline ! Set $0 to the next input record from the ! current input file. This form of getline ! sets the NF, NR, and FNR variables. ! ! ! getline var ! Set variable var to the next input record ! from the current input file. This form of ! getline sets the FNR and NR variables. ! ! ! getline [var] < expression ! Read the next record of input from a named ! file. The expression is evaluated to produce ! a string that is used as a full pathname. If ! the file of that name is not currently open, ! it is opened. As long as the stream remains ! open, subsequent calls in which expression ! evaluates to the same string value read ! subsequent records from the file. The file ! remains open until the close function is ! called with an expression that evaluates to ! the same string value. If var is missing, $0 ! and NF are set. Otherwise, var is set. ! ! The getline operator can form ambiguous ! constructs when there are binary operators ! that are not in parentheses (including ! concatenate) to the right of the < (up to ! the end of the expression containing the ! getline). The result of evaluating such a ! construct is unspecified, and portable ! applications must parenthesize all such ! uses properly. ! ! ! system(expression) ! Execute the command given by expression in a ! manner equivalent to the system(3C) function ! and return the exit status of the command. ! ! ! ! 
All forms of getline return 1 for successful input, 0 for end of file, and -1 for an error. ! Where strings are used as the name of a file or pipeline, the strings ! must be textually identical. The terminology ``same string value'' ! implies that ``equivalent strings'', even those that differ only by ! space characters, represent different files. ! ! ! User-defined Functions ! The awk language also provides user-defined functions. Such functions ! can be defined as: ! ! function name(args,...) { statements } ! ! ! ! A function can be referred to anywhere in an awk program; in ! particular, its use can precede its definition. The scope of a function ! is global. ! ! ! Function arguments can be either scalars or arrays; the behavior is ! undefined if an array name is passed as an argument that the function ! uses as a scalar, or if a scalar expression is passed as an argument ! that the function uses as an array. Function arguments are passed by ! value if scalar and by reference if an array name. Argument names are ! local to the function; all other variable names are global. The same ! name must not be used as both an argument name and as the name of a function ! or a special awk variable. The same name must not be used both as a ! variable name with global scope and as the name of a function. The same ! name must not be used within the same scope both as a scalar variable ! and as an array. ! ! ! The number of parameters in the function definition need not match the ! number of parameters in the function call. Excess formal parameters can ! be used as local variables. If fewer arguments are supplied in a ! function call than are in the function definition, the extra parameters ! that are used in the function body as scalars are initialized with a ! string value of the null string and a numeric value of zero, and the ! extra parameters that are used in the function body as arrays are ! initialized as empty arrays. If more arguments are supplied in a ! 
function call than are in the function definition, the behavior is ! undefined. ! ! ! When invoking a function, no white space can be placed between the ! function name and the opening parenthesis. Function calls can be nested ! and recursive calls can be made upon functions. Upon return from any ! nested or recursive function call, the values of all of the calling ! function's parameters are unchanged, except for array parameters passed ! by reference. The return statement can be used to return a value. If a ! return statement appears outside of a function definition, the behavior ! is undefined. ! ! ! In the function definition, newline characters are optional before the ! opening brace and after the closing brace. Function definitions can ! appear anywhere in the program where a pattern-action pair is allowed. ! ! ! USAGE ! The index, length, match, and substr functions should not be confused ! with similar functions in the ISO C standard; the awk versions deal ! with characters, while the ISO C standard deals with bytes. ! ! ! Because the concatenation operation is represented by adjacent ! expressions rather than an explicit operator, it is often necessary to ! use parentheses to enforce the proper evaluation precedence. ! ! See largefile(5) for the description of the behavior of awk when ! encountering files greater than or equal to 2 Gbyte (2^31 bytes). + EXAMPLES ! The awk program specified in the command line is most easily specified ! within single-quotes (for example, 'program') for applications using ! sh, because awk programs commonly contain characters that are special ! to the shell, including double-quotes. In cases where an awk program ! contains single-quote characters, it is usually easiest to specify most ! of the program as strings within single-quotes concatenated by the ! shell with quoted single-quote characters. For example: + awk '/'\''/ { print "quote:", $0 }' ! prints all lines from the standard input containing a single-quote ! 
character, prefixed with quote:. + The following are examples of simple awk programs: ! Example 1 Write to the standard output all input lines for which field ! 3 is greater than 5: + $3 > 5 ! Example 2 Write every tenth line: + (NR % 10) == 0 + Example 3 Write any line with a substring matching the regular + expression: ! /(G|D)(2[0-9][[:alpha:]]*)/ + Example 4 Print any line with a substring containing a G or D, followed + by a sequence of digits and characters: ! This example uses character classes digit and alpha to match language- ! independent digit and alphabetic characters, respectively. ! /(G|D)([[:digit:][:alpha:]]*)/ + Example 5 Write any line in which the second field matches the regular + expression and the fourth field does not: + $2 ~ /xyz/ && $4 !~ /xyz/ ! Example 6 Write any line in which the second field contains a ! backslash: + $2 ~ /\\/ + Example 7 Write any line in which the second field contains a backslash + (alternate method): + Notice that backslash escapes are interpreted twice, once in lexical + processing of the string and once in processing the regular expression. + $2 ~ "\\\\" + Example 8 Write the second to the last and the last field in each line, + separating the fields by a colon: ! {OFS=":";print $(NF-1), $NF} + Example 9 Write the line number and number of fields in each line: + + The three strings representing the line number, the colon and the + number of fields are concatenated and that string is written to + standard output. 
+ + + {print NR ":" NF} + + + + Example 10 Write lines longer than 72 characters: + + length($0) > 72 + + + + Example 11 Write first two fields in opposite order separated by the + OFS: + + { print $2, $1 } + + + + Example 12 Same, with input fields separated by comma or space and tab + characters, or both: + + BEGIN { FS = ",[ \t]*|[ \t]+" } + { print $2, $1 } + + + + Example 13 Add up first column, print sum and average: + + {s += $1 } + END {print "sum is ", s, " average is", s/NR} + + + + Example 14 Write fields in reverse order, one per line (many lines out + for each line in): + + { for (i = NF; i > 0; --i) print $i } + + + + Example 15 Write all lines between occurrences of the strings "start" + and "stop": + + /start/, /stop/ + + + + Example 16 Write all lines whose first field is different from the + previous one: + $1 != prev { print; prev = $1 } ! Example 17 Simulate the echo command: + BEGIN { + for (i = 1; i < ARGC; ++i) + printf "%s%s", ARGV[i], i==ARGC-1?"\n":" " + } ! Example 18 Write the path prefixes contained in the PATH environment ! variable, one per line: ! ! BEGIN { ! n = split (ENVIRON["PATH"], path, ":") ! for (i = 1; i <= n; ++i) ! print path[i] ! } ! ! ! ! Example 19 Print the file "input", filling in page numbers starting at ! 5: ! ! ! If there is a file named input containing page headers of the form ! ! ! Page# ! ! ! ! and a file named program that contains ! ! ! /Page/{ $2 = n++; } { print } ! then the command line ! awk -f program n=5 input + prints the file input, filling in page numbers starting at 5. + ENVIRONMENT VARIABLES See environ(5) for descriptions of the following environment variables ! that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH. LC_NUMERIC Determine the radix character used when interpreting numeric input, performing conversions between numeric and string values and formatting numeric output. Regardless
*** 389,434 **** character of the POSIX locale) is the decimal-point character recognized in processing awk programs (including assignments in command-line arguments). ! ATTRIBUTES ! See attributes(5) for descriptions of the following attributes: ! /usr/bin/awk - +---------------+-----------------+ - |ATTRIBUTE TYPE | ATTRIBUTE VALUE | - +---------------+-----------------+ - |CSI | Not Enabled | - +---------------+-----------------+ - /usr/xpg4/bin/awk - +--------------------+-----------------+ - | ATTRIBUTE TYPE | ATTRIBUTE VALUE | - +--------------------+-----------------+ - |CSI | Enabled | - +--------------------+-----------------+ - |Interface Stability | Standard | - +--------------------+-----------------+ - SEE ALSO ! egrep(1), grep(1), nawk(1), sed(1), printf(3C), attributes(5), ! environ(5), largefile(5), standards(5) NOTES Input white space is not preserved on output if fields are involved. There are no explicit conversions between numbers and strings. To force ! an expression to be treated as a number, add 0 to it. To force an ! expression to be treated as a string, concatenate the null string ("") ! to it. ! June 22, 2005 AWK(1) --- 1381,1433 ---- character of the POSIX locale) is the decimal-point character recognized in processing awk programs (including assignments in command-line arguments). ! EXIT STATUS ! The following exit values are returned: ! 0 ! All input files were processed successfully. + >0 + An error occurred. + The exit status can be altered within the program by using an exit + expression. SEE ALSO ! ed(1), egrep(1), grep(1), lex(1), oawk(1), sed(1), popen(3C), ! printf(3C), system(3C), attributes(5), environ(5), largefile(5), ! regex(5), XPG4(5) + + Aho, A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming + Language, Addison-Wesley, 1988. 
+ + + DIAGNOSTICS + If any file operand is specified and the named file cannot be accessed, + awk writes a diagnostic message to standard error and terminates without + any further action. + + + If the program specified by either the program operand or a progfile + operand is not a valid awk program (as specified in EXTENDED + DESCRIPTION), the behavior is undefined. + + NOTES Input white space is not preserved on output if fields are involved. There are no explicit conversions between numbers and strings. To force ! an expression to be treated as a number, add 0 to it; to force it to be ! treated as a string, concatenate the null string ("") to it. ! April 20, 2020 AWK(1)