12482 Have /usr/bin/awk point to /usr/bin/nawk
Reviewed by: Peter Tribble <peter.tribble@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
--- old/usr/src/man/man1/awk.1.man.txt
+++ new/usr/src/man/man1/awk.1.man.txt
1 1 AWK(1) User Commands AWK(1)
2 2
3 3
4 4
5 5 NAME
6 6 awk - pattern scanning and processing language
7 7
8 8 SYNOPSIS
9 - /usr/bin/awk [-f progfile] [-Fc] [' prog '] [parameters]
10 - [filename]...
9 + /usr/bin/awk [-F ERE] [-v assignment] 'program' | -f progfile...
10 + [argument]...
11 11
12 12
13 - /usr/xpg4/bin/awk [-FcERE] [-v assignment]... 'program' -f progfile...
13 + /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile...
14 14 [argument]...
15 15
16 16
17 + /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile...
18 + [argument]...
19 +
20 +
17 21 DESCRIPTION
18 - The /usr/xpg4/bin/awk utility is described on the nawk(1) manual page.
22 + NOTE: The nawk command is now the system default awk for illumos.
19 23
24 + The /usr/bin/awk and /usr/xpg4/bin/awk utilities execute programs
25 + written in the awk programming language, which is specialized for
26 + textual data manipulation. An awk program is a sequence of patterns and
27 + corresponding actions. The string specifying program must be enclosed
28 + in single quotes (') to protect it from interpretation by the shell.
29 + The sequence of pattern-action statements can be specified in the
30 + command line as program or in one or more files specified by the
31 + -f progfile option. When input is read that matches a pattern, the
32 + action associated with the pattern is performed.
20 33
21 - The /usr/bin/awk utility scans each input filename for lines that match
22 - any of a set of patterns specified in prog. The prog string must be
23 - enclosed in single quotes ( a') to protect it from the shell. For each
24 - pattern in prog there can be an associated action performed when a line
25 - of a filename matches the pattern. The set of pattern-action statements
26 - can appear literally as prog or in a file specified with the -f
27 - progfile option. Input files are read in order; if there are no files,
28 - the standard input is read. The file name '-' means the standard input.
29 34
35 + Input is interpreted as a sequence of records. By default, a record is
36 + a line, but this can be changed by using the RS built-in variable. Each
37 + record of input is matched against each pattern in the program. For each
38 + pattern matched, the associated action is executed.
39 +
40 +
41 + The awk utility interprets each input record as a sequence of fields
42 + where, by default, a field is a string of non-blank characters. This
43 + default white-space field delimiter (blanks and/or tabs) can be changed
44 + by using the FS built-in variable or the -F ERE option. The awk utility
45 + denotes the first field in a record $1, the second $2, and so forth.
46 + The symbol $0 refers to the entire record; setting any other field
47 + causes the reevaluation of $0. Assigning to $0 resets the values of all
48 + fields and the NF built-in variable.
49 +
50 +
30 51 OPTIONS
31 52 The following options are supported:
32 53
54 + -F ERE
55 + Defines the input field separator to be the extended
56 + regular expression ERE before any input is read; ERE
57 + can be a single character.
58 +
59 +
33 60 -f progfile
34 - awk uses the set of patterns it reads from progfile.
61 + Specifies the pathname of the file progfile containing
62 + an awk program. If multiple instances of this option
63 + are specified, the concatenation of the files
64 + specified as progfile in the order specified is the
65 + awk program. The awk program can alternatively be
66 + specified in the command line as a single argument.
35 67
36 68
37 - -Fc
38 - Uses the character c as the field separator (FS)
39 - character. See the discussion of FS below.
69 + -v assignment
70 + The assignment argument must be in the same form as an
71 + assignment operand. The assignment is of the form
72 + var=value, where var is the name of one of the
73 + variables described below. The specified assignment
74 + occurs before executing the awk program, including the
75 + actions associated with BEGIN patterns (if any).
76 + Multiple occurrences of this option can be specified.
40 77
41 78
42 -USAGE
43 - Input Lines
44 - Each input line is matched against the pattern portion of every
45 - pattern-action statement; the associated action is performed for each
46 - matched pattern. Any filename of the form var=value is treated as an
47 - assignment, not a filename, and is executed at the time it would have
48 - been opened if it were a filename. Variables assigned in this manner
49 - are not available inside a BEGIN rule, and are assigned after
50 - previously specified files have been read.
79 + -safe
80 + When this flag is specified, awk prevents the program
81 + from opening new files or running child processes. The
82 + ENVIRON array is also not initialized.
51 83
52 84
53 - An input line is normally made up of fields separated by white spaces.
54 - (This default can be changed by using the FS built-in variable or the
55 - -Fc option.) The default is to ignore leading blanks and to separate
56 - fields by blanks and/or tab characters. However, if FS is assigned a
57 - value that does not include any of the white spaces, then leading
58 - blanks are not ignored. The fields are denoted $1, $2, ...; $0 refers
59 - to the entire line.
85 +OPERANDS
86 + The following operands are supported:
60 87
61 - Pattern-action Statements
62 - A pattern-action statement has the form:
88 + program
89 + If no -f option is specified, the first operand to awk is
90 + the text of the awk program. The application supplies the
91 + program operand as a single argument to awk. If the text
92 + does not end in a newline character, awk interprets the
93 + text as if it did.
63 94
95 +
96 + argument
97 + Either of the following two types of argument can be
98 + intermixed:
99 +
100 + file
101 + A pathname of a file that contains the input
102 + to be read, which is matched against the set
103 + of patterns in the program. If no file
104 + operands are specified, or if a file operand
105 + is -, the standard input is used.
106 +
107 +
108 + assignment
109 + An operand that begins with an underscore or
110 + alphabetic character from the portable
111 + character set, followed by a sequence of
112 + underscores, digits and alphabetics from the
113 + portable character set, followed by the =
114 + character specifies a variable assignment
115 + rather than a pathname. The characters before
116 + the = represent the name of an awk variable.
117 + If that name is an awk reserved word, the
118 + behavior is undefined. The characters
119 + following the equal sign are interpreted as if
120 + they appeared in the awk program preceded and
121 + followed by a double-quote (") character, as
122 + a STRING token, except that if the last
123 + character is an unescaped backslash, it is
124 + interpreted as a literal backslash rather
125 + than as the first character of the sequence
126 + \". The variable is assigned the value of
127 + that STRING token. If the value is considered
128 + a numeric string, the variable is assigned its
129 + numeric value. Each such variable assignment
130 + is performed just before the processing of
131 + the following file, if any. Thus, an
132 + assignment before the first file argument is
133 + executed after the BEGIN actions (if any),
134 + while an assignment after the last file
135 + argument is executed before the END actions
136 + (if any). If there are no file arguments,
137 + assignments are executed before processing
138 + the standard input.
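As a rough illustration of the assignment-operand rule, the following Python sketch classifies a command-line operand as either an assignment or a file pathname. The names classify_operand and unescape are hypothetical, only a small subset of the STRING escape sequences is handled, and the numeric-string conversion of the assigned value is left out.

```python
import re

# Identifier from the portable character set followed by '=' (ASCII assumed).
_ASSIGN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")

# Small subset of awk STRING escapes, for illustration only.
_ESCAPES = {"\\": "\\", '"': '"', "/": "/", "n": "\n", "t": "\t"}


def unescape(text):
    """Process backslash escapes as in an awk STRING token (subset)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Escapes outside the subset are kept verbatim in this sketch.
            out.append(_ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            # Includes a trailing unescaped backslash, which stays literal.
            out.append(ch)
            i += 1
    return "".join(out)


def classify_operand(arg):
    """Return ('assign', name, value) or ('file', path) for one operand."""
    m = _ASSIGN_RE.match(arg)
    if m is None:
        return ("file", arg)
    name = arg[: m.end() - 1]
    return ("assign", name, unescape(arg[m.end():]))
```

Only the text up to the first = is the variable name, so count=a=b assigns the string a=b to count, while ./x=1 is treated as a pathname.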
139 +
140 +
141 +
142 +INPUT FILES
143 + Input files to the awk program from any of the following sources must be text files:
144 +
145 + o any file operands or their equivalents, achieved by
146 + modifying the awk variables ARGV and ARGC
147 +
148 + o standard input in the absence of any file operands
149 +
150 + o arguments to the getline function
151 +
152 +
153 + For these files, whether or not the variable RS is set to a value other
154 + than a newline character, implementations
155 + support records terminated with the specified separator up to
156 + {LINE_MAX} bytes and can support longer records.
157 +
158 +
159 + If -f progfile is specified, the files named by each of the progfile
160 + option-arguments must be text files containing an awk program.
161 +
162 +
163 + The standard input is used only if no file operands are specified, or
164 + if a file operand is -.
165 +
166 +
167 +EXTENDED DESCRIPTION
168 + An awk program is composed of pairs of the form:
169 +
64 170 pattern { action }
65 171
66 172
67 173
174 + Either the pattern or the action (including the enclosing brace
175 + characters) can be omitted. Pattern-action statements are separated by
176 + a semicolon or by a newline.
68 177
69 - Either pattern or action can be omitted. If there is no action, the
70 - matching line is printed. If there is no pattern, the action is
71 - performed on every input line. Pattern-action statements are separated
72 - by newlines or semicolons.
73 178
179 + A missing pattern matches any record of input, and a missing action is
180 + equivalent to an action that writes the matched record of input to
181 + standard output.
74 182
75 - Patterns are arbitrary Boolean combinations ( !, ||, &&, and
76 - parentheses) of relational expressions and regular expressions. A
77 - relational expression is one of the following:
78 183
79 - expression relop expression
80 - expression matchop regular_expression
184 + Execution of the awk program starts by first executing the actions
185 + associated with all BEGIN patterns in the order they occur in the
186 + program. Then each file operand (or standard input if no files were
187 + specified) is processed by reading data from the file until a record
188 + separator is seen (a newline character by default), splitting the
189 + current record into fields using the current value of FS, evaluating
190 + each pattern in the program in the order of occurrence, and executing
191 + the action associated with each pattern that matches the current
192 + record. The action for a matching pattern is executed before evaluating
193 + subsequent patterns. Last, the actions associated with all END patterns
194 + are executed in the order they occur in the program.
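The processing order described above can be sketched in Python as follows. This is a simplified model, not the nawk implementation: patterns are represented (by assumption) as the strings "BEGIN" and "END", None for a missing pattern, or a predicate on the record; actions are callables; and the early return for BEGIN-only programs assumes the BEGIN actions do not call getline.

```python
def run(program, records):
    """Execute awk-style (pattern, action) pairs over an iterable of records.

    pattern: "BEGIN", "END", None (matches every record) or a predicate.
    action:  a callable taking the record, or None (print the record).
    """
    # 1. Every BEGIN action, in the order it appears in the program.
    for pattern, action in program:
        if pattern == "BEGIN":
            action(None)

    # A program made only of BEGIN actions exits without reading input
    # (assuming no getline call in those actions).
    if all(pattern == "BEGIN" for pattern, _ in program):
        return

    # 2. Each record is tested against every pattern in program order;
    #    a matching action runs before later patterns are evaluated.
    for record in records:
        for pattern, action in program:
            if pattern in ("BEGIN", "END"):
                continue
            if pattern is None or pattern(record):
                # A missing action writes the record to standard output.
                (action or print)(record)

    # 3. Every END action, in program order, after all input is consumed.
    for pattern, action in program:
        if pattern == "END":
            action(None)
```

Because END actions run only after the loop over records, an END pattern may appear before a BEGIN pattern in the program without changing the order of execution.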
81 195
82 196
197 + Expressions in awk
198 + Expressions describe computations used in patterns and actions. In the
199 + following table, valid expression operations are given in groups from
200 + highest precedence first to lowest precedence last, with equal-
201 + precedence operators grouped between horizontal lines. In expression
202 + evaluation, where the grammar is formally ambiguous, higher precedence
203 + operators are evaluated before lower precedence operators. In this
204 + table expr, expr1, expr2, and expr3 represent any expression, while
205 + lvalue represents any entity that can be assigned to (that is, on the
206 + left side of an assignment operator).
83 207
84 - where a relop is any of the six relational operators in C, and a
85 - matchop is either ~ (contains) or !~ (does not contain). An expression
86 - is an arithmetic expression, a relational expression, the special
87 - expression
88 208
89 - var in array
90 209
91 210
211 + Syntax Name Type of Result Associativity
212 + -------------------------------------------------------------------------------
213 + ( expr ) Grouping type of expr n/a
214 + -------------------------------------------------------------------------------
215 + $expr Field reference string n/a
216 + -------------------------------------------------------------------------------
217 + ++ lvalue Pre-increment numeric n/a
218 + -- lvalue Pre-decrement numeric n/a
219 + lvalue ++ Post-increment numeric n/a
220 + lvalue -- Post-decrement numeric n/a
221 + -------------------------------------------------------------------------------
222 + expr ^ expr Exponentiation numeric right
223 + -------------------------------------------------------------------------------
224 + ! expr Logical not numeric n/a
225 + + expr Unary plus numeric n/a
226 + - expr Unary minus numeric n/a
227 + -------------------------------------------------------------------------------
228 + expr * expr Multiplication numeric left
229 + expr / expr Division numeric left
230 + expr % expr Modulus numeric left
231 + -------------------------------------------------------------------------------
232 + expr + expr Addition numeric left
233 + expr - expr Subtraction numeric left
234 + -------------------------------------------------------------------------------
235 + expr expr String concatenation string left
236 + -------------------------------------------------------------------------------
237 + expr < expr Less than numeric none
238 + expr <= expr Less than or equal to numeric none
239 + expr != expr Not equal to numeric none
240 + expr == expr Equal to numeric none
241 + expr > expr Greater than numeric none
242 + expr >= expr Greater than or equal to numeric none
243 + -------------------------------------------------------------------------------
244 + expr ~ expr ERE match numeric none
245 + expr !~ expr ERE non-match numeric none
246 + -------------------------------------------------------------------------------
247 + expr in array Array membership numeric left
248 + ( index ) in Multi-dimension array numeric left
249 + array membership
250 + -------------------------------------------------------------------------------
251 + expr && expr Logical AND numeric left
252 + -------------------------------------------------------------------------------
253 + expr || expr Logical OR numeric left
254 + -------------------------------------------------------------------------------
255 + expr1 ? expr2 Conditional expression type of selected right
256 + : expr3 expr2 or expr3
257 + -------------------------------------------------------------------------------
258 + lvalue ^= expr Exponentiation numeric right
259 + assignment
260 + lvalue %= expr Modulus assignment numeric right
261 + lvalue *= expr Multiplication numeric right
262 + assignment
263 + lvalue /= expr Division assignment numeric right
264 + lvalue += expr Addition assignment numeric right
265 + lvalue -= expr Subtraction assignment numeric right
266 + lvalue = expr Assignment type of expr right
92 267
93 - or a Boolean combination of these.
94 268
95 269
96 - Regular expressions are as in egrep(1). In patterns they must be
97 - surrounded by slashes. Isolated regular expressions in a pattern apply
98 - to the entire line. Regular expressions can also occur in relational
99 - expressions. A pattern can consist of two patterns separated by a
100 - comma; in this case, the action is performed for all lines between the
101 - occurrence of the first pattern to the occurrence of the second
102 - pattern.
270 + Each expression has either a string value, a numeric value or both.
271 + Except as stated for specific contexts, the value of an expression is
272 + implicitly converted to the type needed for the context in which it is
273 + used. A string value is converted to a numeric value by the equivalent
274 + of the following calls:
103 275
276 + setlocale(LC_NUMERIC, "");
277 + numeric_value = atof(string_value);
104 278
105 - The special patterns BEGIN and END can be used to capture control
106 - before the first input line has been read and after the last input line
107 - has been read respectively. These keywords do not combine with any
108 - other patterns.
109 279
110 - Built-in Variables
111 - Built-in variables include:
112 280
281 + A numeric value that is exactly equal to the value of an integer is
282 + converted to a string by the equivalent of a call to the sprintf
283 + function with the string %d as the fmt argument and the numeric value
284 + being converted as the first and only expr argument. Any other numeric
285 + value is converted to a string by the equivalent of a call to the
286 + sprintf function with the value of the variable CONVFMT as the fmt
287 + argument and the numeric value being converted as the first and only
288 + expr argument.
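A Python sketch of the number-to-string rule, assuming Python's % operator as a stand-in for sprintf (the two agree for common formats such as %d and %.6g). The function name num_to_str is hypothetical.

```python
def num_to_str(value, convfmt="%.6g"):
    """String value of an awk number (sketch; CONVFMT defaults to %.6g)."""
    # A value exactly equal to an integer is formatted with %d ...
    if float(value).is_integer():
        return "%d" % value
    # ... anything else goes through CONVFMT.
    return convfmt % value
```

For example, 3.0 becomes "3" regardless of CONVFMT, while 3.14159265 becomes "3.14159" under the default CONVFMT.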
289 +
290 +
291 + A string value is considered to be a numeric string in the following
292 + case:
293 +
294 + 1. Any leading and trailing blank characters are ignored.
295 +
296 + 2. If the first unignored character is a + or -, it is ignored.
297 +
298 + 3. If the remaining unignored characters would be lexically
299 + recognized as a NUMBER token, the string is considered a
300 + numeric string.
301 +
302 +
303 + If a - character is ignored in the above steps, the numeric value of
304 + the numeric string is the negation of the numeric value of the
305 + recognized NUMBER token. Otherwise the numeric value of the numeric
306 + string is the numeric value of the recognized NUMBER token. Whether or
307 + not a string is a numeric string is relevant only in contexts where
308 + that term is used in this section.
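The three steps above can be modelled with a short Python function. It is a sketch under stated assumptions: only decimal NUMBER tokens are recognized (no hexadecimal or other extensions), blank characters are taken to be space and tab, and the locale decimal-point handling mentioned elsewhere on this page is ignored. The name numeric_string_value is hypothetical.

```python
import re

# Decimal NUMBER token: digits with optional fraction and exponent.
_NUMBER = re.compile(r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def numeric_string_value(s):
    """Return the numeric value if s is a numeric string, else None."""
    body = s.strip(" \t")              # step 1: ignore leading/trailing blanks
    negate = False
    if body[:1] in ("+", "-"):         # step 2: ignore one leading sign
        negate = body[0] == "-"
        body = body[1:]
    if not _NUMBER.fullmatch(body):    # step 3: remainder must be a NUMBER
        return None
    value = float(body)
    return -value if negate else value
```

A string such as " -3 " is therefore numeric with value -3, while "1e" and "--3" are not numeric strings.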
309 +
310 +
311 + When an expression is used in a Boolean context, if it has a numeric
312 + value, a value of zero is treated as false and any other value is
313 + treated as true. Otherwise, a string value of the null string is
314 + treated as false and any other value is treated as true. A Boolean
315 + context is one of the following:
316 +
317 + o the first subexpression of a conditional expression.
318 +
319 + o an expression operated on by logical NOT, logical AND, or
320 + logical OR.
321 +
322 + o the second expression of a for statement.
323 +
324 + o the expression of an if statement.
325 +
326 + o the expression of the while clause in either a while or do
327 + ... while statement.
328 +
329 + o an expression used as a pattern (see Expression Patterns
330 + below).
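A minimal Python sketch of the Boolean-context rule. Numbers are modelled as int or float and strings as str; the optional numeric argument (a hypothetical parameter of the hypothetical awk_truth function) carries the numeric value of a string that also has one, such as an input field that is a numeric string.

```python
def awk_truth(value, numeric=None):
    """Truth value of an awk expression in a Boolean context (sketch).

    value:   the expression's number (int/float) or string (str).
    numeric: the numeric value of a string that also has one, else None.
    """
    if isinstance(value, (int, float)):
        return value != 0          # numeric value: zero is false
    if numeric is not None:
        return numeric != 0        # string with a numeric value
    return value != ""             # pure string: only "" is false
```

This is why an input field containing 0 tests false, while the string constant "0" tests true.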
331 +
332 +
333 + The awk language supplies arrays that are used for storing numbers or
334 + strings. Arrays need not be declared. They are initially empty, and
335 + their sizes change dynamically. The subscripts, or element
336 + identifiers, are strings, providing a type of associative array
337 + capability. An array name followed by a subscript within square
338 + brackets can be used as an lvalue and as an expression, as described in
339 + the grammar. Unsubscripted array names are used in only the following
340 + contexts:
341 +
342 + o a parameter in a function definition or function call.
343 +
344 + o the NAME token following any use of the keyword in.
345 +
346 +
347 + A valid array index consists of one or more comma-separated
348 + expressions, similar to the way in which multi-dimensional arrays are
349 + indexed in some programming languages. Because awk arrays are really
350 + one-dimensional, such a comma-separated list is converted to a single
351 + string by concatenating the string values of the separate expressions,
352 + each separated from the other by the value of the SUBSEP variable.
353 +
354 +
355 + Thus, the following two index operations are equivalent:
356 +
357 + var[expr1, expr2, ... exprn]
358 + var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
359 +
360 +
361 +
362 + A multi-dimensioned index used with the in operator must be put in
363 + parentheses. The in operator, which tests for the existence of a
364 + particular array element, does not create the element if it does not
365 + exist. Any other reference to a non-existent array element
366 + automatically creates it.
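The SUBSEP encoding and the non-creating behavior of in can be sketched with a small Python class. AwkArray is a hypothetical name, and str() is used in place of awk's number-to-string conversion, which only matches it for integers and strings.

```python
SUBSEP = "\034"  # default value of SUBSEP


def _key(index):
    # var[e1, e2, ...] is stored as var[e1 SUBSEP e2 SUBSEP ...].
    if not isinstance(index, tuple):
        index = (index,)
    return SUBSEP.join(str(e) for e in index)


class AwkArray:
    """One-dimensional associative array with awk-style element creation."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, index):
        # Any ordinary reference creates a missing element (value "").
        return self._data.setdefault(_key(index), "")

    def __setitem__(self, index, value):
        self._data[_key(index)] = value

    def __contains__(self, index):
        # The in operator only tests for existence; it never creates.
        return _key(index) in self._data

    def __len__(self):
        return len(self._data)
```

With this model, (1, 2) in a mirrors the awk test (1, 2) in a, and a[1, 2] reads the same element as a["1" SUBSEP "2"].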
367 +
368 +
369 + Variables and Special Variables
370 + Variables can be used in an awk program by referencing them. With the
371 + exception of function parameters, they are not explicitly declared.
372 + Uninitialized scalar variables and array elements have both a numeric
373 + value of zero and a string value of the empty string.
374 +
375 +
376 + Field variables are designated by a $ followed by a number or numerical
377 + expression. The effect of the field number expression evaluating to
378 + anything other than a non-negative integer is unspecified.
379 + Uninitialized variables or string values need not be converted to
380 + numeric values in this context. New field variables are created by
381 + assigning a value to them. References to non-existent fields (that is,
382 + fields after $NF) produce the null string. However, assigning to a non-
383 + existent field (for example, $(NF+2) = 5) increases the value of NF,
384 + creates any intervening fields with the null string as their values and
385 + causes the value of $0 to be recomputed, with the fields being separated
386 + by the value of OFS. Each field variable has a string value when
387 + created. If the string, with any occurrence of the decimal-point
388 + character from the current locale changed to a period character, is
389 + considered a numeric string (see Expressions in awk above), the field
390 + variable also has the numeric value of the numeric string.
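The field-assignment rules above can be sketched in Python as follows. The Record class is hypothetical, and Python's str.split() stands in for splitting with the default FS.

```python
class Record:
    """Fields of one input record with awk's $0 recomputation (sketch)."""

    def __init__(self, line, ofs=" "):
        self.ofs = ofs
        self.set0(line)

    def set0(self, line):
        # Assigning $0 re-splits it and resets NF (default FS assumed).
        self.line = line
        self.fields = line.split()

    @property
    def nf(self):
        return len(self.fields)

    def get(self, n):
        if n == 0:
            return self.line
        # Fields after $NF read as the null string and are not created.
        return self.fields[n - 1] if n <= self.nf else ""

    def set(self, n, value):
        if n == 0:
            self.set0(value)
            return
        # Assigning past $NF creates empty intervening fields and raises NF.
        while self.nf < n:
            self.fields.append("")
        self.fields[n - 1] = value
        # Setting any field rebuilds $0 with OFS between the fields.
        self.line = self.ofs.join(self.fields)
```

For instance, with OFS set to "-", assigning $(NF+2) = 5 on the record "a b c" leaves NF at 5 and $0 as "a-b-c--5".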
391 +
392 +
393 + /usr/bin/awk, /usr/xpg4/bin/awk
394 + awk sets the following special variables that are supported by both
395 + /usr/bin/awk and /usr/xpg4/bin/awk:
396 +
397 + ARGC
398 + The number of elements in the ARGV array.
399 +
400 +
401 + ARGV
402 + An array of command line arguments, excluding options and
403 + the program argument, numbered from zero to ARGC-1.
404 +
405 + The arguments in ARGV can be modified or added to; ARGC can
406 + be altered. As each input file ends, awk treats the next
407 + non-null element of ARGV, up to the current value of
408 + ARGC-1, inclusive, as the name of the next input file.
409 + Setting an element of ARGV to null means that it is not
410 + treated as an input file. The name - indicates the standard
411 + input. If an argument matches the format of an assignment
412 + operand, this argument is treated as an assignment rather
413 + than a file argument.
414 +
415 +
416 + CONVFMT
417 + The printf format for converting numbers to strings (except
418 + for output statements, where OFMT is used). The default is
419 + %.6g.
420 +
421 +
422 + ENVIRON
423 + The variable ENVIRON is an array representing the value of
424 + the environment. The indices of the array are strings
425 + consisting of the names of the environment variables, and
426 + the value of each array element is a string consisting of
427 + the value of that variable. If the value of an environment
428 + variable is considered a numeric string, the array element
429 + also has its numeric value.
430 +
431 + In all cases where awk behavior is affected by environment
432 + variables (including the environment of any commands that
433 + awk executes via the system function or via pipeline
434 + redirections with the print statement, the printf
435 + statement, or the getline function), the environment used
436 + is the environment at the time awk began executing.
437 +
438 +
113 439 FILENAME
114 - name of the current input file
440 + A pathname of the current input file. Inside a BEGIN action
441 + the value is undefined. Inside an END action the value is
442 + the name of the last input file processed.
115 443
116 444
445 + FNR
446 + The ordinal number of the current record in the current
447 + file. Inside a BEGIN action the value is zero. Inside an
448 + END action the value is the number of the last record
449 + processed in the last file processed.
450 +
451 +
117 452 FS
118 - input field separator regular expression (default blank
119 - and tab)
453 + Input field separator regular expression; a space character
454 + by default.
120 455
121 456
122 457 NF
123 - number of fields in the current record
458 + The number of fields in the current record. Inside a BEGIN
459 + action, the use of NF is undefined unless a getline
460 + function without a var argument is executed previously.
461 + Inside an END action, NF retains the value it had for the
462 + last record read, unless a subsequent, redirected, getline
463 + function without a var argument is performed prior to
464 + entering the END action.
124 465
125 466
126 467 NR
127 - ordinal number of the current record
468 + The ordinal number of the current record from the start of
469 + input. Inside a BEGIN action the value is zero. Inside an
470 + END action the value is the number of the last record
471 + processed.
128 472
129 473
130 474 OFMT
131 - output format for numbers (default %.6g)
475 + The printf format for converting numbers to strings in
476 + output statements; "%.6g" by default. The result of the
477 + conversion is unspecified if the value of OFMT is not a
478 + floating-point format specification.
132 479
133 480
134 481 OFS
135 - output field separator (default blank)
482 + The print statement output field separator; a space
483 + character by default.
136 484
137 485
138 486 ORS
139 - output record separator (default new-line)
487 + The print output record separator; a newline character by
488 + default.
140 489
141 490
491 + RLENGTH
492 + The length of the string matched by the match function.
493 +
494 +
142 495 RS
143 - input record separator (default new-line)
496 + The first character of the string value of RS is the input
497 + record separator; a newline character by default. If RS
498 + contains more than one character, the results are
499 + unspecified. If RS is null, then records are separated by
500 + sequences of one or more blank lines. Leading or trailing
501 + blank lines do not produce empty records at the beginning
502 + or end of input, and the field separator is always newline,
503 + no matter what the value of FS.
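For RS set to the null string, the record splitting can be sketched in Python as below. The function name paragraph_records is hypothetical, and the sketch assumes that a "blank line" means an empty line; lines containing only spaces or tabs are treated as ordinary text here, which may not match every awk implementation. Field splitting within each record is not modelled.

```python
import re


def paragraph_records(text):
    """Split input into records as awk does when RS is null (sketch)."""
    # Leading and trailing blank lines never produce empty records.
    body = text.strip("\n")
    if not body:
        return []
    # A newline followed by one or more empty lines ends a record.
    return re.split(r"\n\n+", body)
```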
144 504
145 505
506 + RSTART
507 + The starting position of the string matched by the match
508 + function, numbering from 1. This is always equivalent to
509 + the return value of the match function.
146 510
511 +
512 + SUBSEP
513 + The subscript separator string for multi-dimensional
514 + arrays. The default value is \034.
515 +
516 +
517 + /usr/bin/awk
518 + The following variable is supported for /usr/bin/awk only:
519 +
520 + RT
521 + The record terminator for the most recent record read. For
522 + most records this will be the same value as RS. At the end
523 + of a file with no trailing separator value, though, this
524 + will be set to the empty string ("").
525 +
526 +
527 + Regular Expressions
528 + The awk utility makes use of the extended regular expression notation
529 + (see regex(5)) except that it allows the use of C-language conventions
530 + to escape special characters within the EREs, namely \\, \a, \b, \f,
531 + \n, \r, \t, \v, and those specified in the following table. These
532 + escape sequences are recognized both inside and outside bracket
533 + expressions. Note that records need not be separated by newline
534 + characters and string constants can contain newline characters, so even
535 + the \n sequence is valid in awk EREs. Using a slash character within
536 + the regular expression requires escaping as shown in the table below:
537 +
538 +
539 +
540 +
541 + Escape Sequence Description Meaning
542 + ----------------------------------------------------------------------
543 + \" Backslash quotation-mark Quotation-mark character
544 + ----------------------------------------------------------------------
545 + \/ Backslash slash Slash character
546 + ----------------------------------------------------------------------
547 + \ddd A backslash character The character encoded by
548 + followed by the longest the one-, two- or
549 + sequence of one, two, or three-digit octal
550 + three octal-digit integer. Multi-byte
551 + characters (01234567). characters require
552 + If all of the digits are multiple, concatenated
553 + 0, (that is, escape sequences,
554 + representation of the including the leading \
555 + NULL character), the for each byte.
556 + behavior is undefined.
557 + ----------------------------------------------------------------------
558 + \c A backslash character Undefined
559 + followed by any
560 + character not described
561 + in this table or special
562 + characters (\\, \a, \b,
563 + \f, \n, \r, \t, \v).
564 +
565 +
566 +
567 + A regular expression can be matched against a specific field or string
568 + by using one of the two regular expression matching operators, ~ and
569 + !~. These operators interpret their right-hand operand as a regular
570 + expression and their left-hand operand as a string. If the regular
571 + expression matches the string, the ~ expression evaluates to the value
572 + 1, and the !~ expression evaluates to the value 0. If the regular
573 + expression does not match the string, the ~ expression evaluates to the
574 + value 0, and the !~ expression evaluates to the value 1. If the right-
575 + hand operand is any expression other than the lexical token ERE, the
576 + string value of the expression is interpreted as an extended regular
577 + expression, including the escape conventions described above. Notice
578 + that these same escape conventions also are applied in the determining
579 + the value of a string literal (the lexical token STRING), and is
580 + applied a second time when a string literal is used in this context.
581 +
582 +
583 + When an ERE token appears as an expression in any context other than as
584 + the right-hand side of the ~ or !~ operator or as one of the built-in
585 + function arguments described below, the value of the resulting
586 + expression is the equivalent of:
587 +
588 + $0 ~ /ere/
589 +
590 +
591 +
592 + The ere argument to the gsub, match, and sub functions and the fs argument
593 + to the split function (see String Functions) are interpreted as extended
594 + regular expressions. These can be either ERE tokens or arbitrary
595 + expressions, and are interpreted in the same manner as the right-hand
596 + side of the ~ or !~ operator.
597 +
598 +
599 + An extended regular expression can be used to separate fields by using
600 + the -F ERE option or by assigning a string containing the expression to
601 + the built-in variable FS. The default value of the FS variable is a
602 + single space character. The following describes FS behavior:
603 +
604 + 1. If FS is a single character:
605 +
606 + o If FS is the space character, skip leading and trailing
607 + blank characters; fields are delimited by sets of one or
608 + more blank characters.
609 +
610 + o Otherwise, if FS is any other character c, fields are
611 + delimited by each single occurrence of c.
612 +
613 + 2. Otherwise, the string value of FS is considered to be an
614 + extended regular expression. Each occurrence of a sequence
615 + matching the extended regular expression delimits fields.
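The FS rules above can be sketched in Python. The function name split_fields is hypothetical, and Python's re syntax is assumed as a stand-in for POSIX EREs; the two agree for simple expressions such as :+ or [,;] but not in general (for example, capturing groups make re.split return the captured text as well).

```python
import re


def split_fields(record, fs=" "):
    """Split one record into fields following the FS rules (sketch)."""
    if record == "":
        return []  # an empty record has no fields (NF == 0)
    if fs == " ":
        # Default: skip leading/trailing blanks; runs of blanks delimit.
        trimmed = record.strip(" \t")
        return re.split(r"[ \t]+", trimmed) if trimmed else []
    if len(fs) == 1:
        # Any other single character: each occurrence is a delimiter.
        return record.split(fs)
    # Otherwise FS is an extended regular expression.
    return re.split(fs, record)
```

Note the difference between FS=":" and FS=":+": on a::b the first yields an empty middle field and the second does not.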
616 +
617 +
618 + Except in the gsub, match, split, and sub built-in functions, regular
619 + expression matching is based on input records. That is, record
620 + separator characters (the first character of the value of the variable
621 + RS, a newline character by default) cannot be embedded in the
622 + expression, and no expression matches the record separator character.
623 + If the record separator is not a newline character, newline characters
624 + embedded in the expression can be matched. In those four built-in
625 + functions, regular expression matching is based on text strings. So,
626 + any character (including the newline character and the record
627 + separator) can be embedded in the pattern and an appropriate pattern
628 + matches any character. However, in all awk regular expression matching,
629 + the use of one or more NULL characters in the pattern, input record or
630 + text string produces undefined results.
631 +
632 +
633 + Patterns
634 + A pattern is any valid expression, a range specified by two expressions
635 + separated by comma, or one of the two special patterns BEGIN or END.
636 +
637 +
638 + Special Patterns
639 + The awk utility recognizes two special patterns, BEGIN and END. Each
640 + BEGIN pattern is matched once and its associated action executed before
641 + the first record of input is read (except possibly by use of the
642 + getline function in a prior BEGIN action) and before command line
643 + assignment is done. Each END pattern is matched once and its associated
644 + action executed after the last record of input has been read. These two
645 + patterns have associated actions.
646 +
647 +
648 + BEGIN and END do not combine with other patterns. Multiple BEGIN and
649 + END patterns are allowed. The actions associated with the BEGIN
650 + patterns are executed in the order specified in the program, as are the
651 + END actions. An END pattern can precede a BEGIN pattern in a program.
652 +
653 +
654 + If an awk program consists of only actions with the pattern BEGIN, and
655 + the BEGIN action contains no getline function, awk exits without
656 + reading its input when the last statement in the last BEGIN action is
657 + executed. If an awk program consists of only actions with the pattern
658 + END or only actions with the patterns BEGIN and END, the input is read
659 + before the statements in the END actions are executed.
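The following sketch illustrates the BEGIN/END ordering rules. It assumes a POSIX awk and sh, and the variable names are hypothetical. A BEGIN-only program exits without reading input. A program that also has END reads all records first, and in the second command END is placed before BEGIN, which the rules above allow.

```shell
#!/bin/sh
# Two BEGIN actions run in program order. There is no other pattern and no
# getline, so awk exits after them without reading standard input.
begin_out=$(awk 'BEGIN { printf "first " } BEGIN { print "second" }')

# END is written before BEGIN here. BEGIN still runs first, then all input
# is read, and NR in the END action holds the total record count.
end_out=$(printf 'x\ny\nz\n' | awk 'END { print "records:", NR } BEGIN { printf "start " }')

echo "$begin_out"   # prints "first second"
echo "$end_out"     # prints "start records: 3"
```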
660 +
661 +
662 + Expression Patterns
663 + An expression pattern is evaluated as if it were an expression in a
664 + Boolean context. If the result is true, the pattern is considered to
665 + match, and the associated action (if any) is executed. If the result is
666 + false, the action is not executed.
667 +
668 +
669 + Pattern Ranges
670 + A pattern range consists of two expressions separated by a comma. In
671 + this case, the action is performed for all records between a match of
672 + the first expression and the following match of the second expression,
673 + inclusive. At this point, the pattern range can be repeated starting at
674 + input records subsequent to the end of the matched range.
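Here is a small sketch of range behavior, using made-up input and assuming a POSIX awk and sh. Both endpoints of a range are printed. A range that never finds its closing match stays open through the last record.

```shell
#!/bin/sh
# First range: "start" through "stop", inclusive. "c" lies outside any range.
# Second range: opened by the second "start" and never closed, so "d" is
# printed as well.
range_out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' | awk '/start/, /stop/')

echo "$range_out"   # prints start, b, stop, start, d on separate lines
```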
675 +
676 +
677 + Actions
147 678 An action is a sequence of statements. A statement can be one of the
148 679 following:
149 680
150 681 if ( expression ) statement [ else statement ]
151 682 while ( expression ) statement
152 683 do statement while ( expression )
153 684 for ( expression ; expression ; expression ) statement
154 685 for ( var in array ) statement
686 + delete array[subscript] #delete an array element
687 + delete array #delete all elements within an array
155 688 break
156 689 continue
157 690 { [ statement ] ... }
158 - expression # commonly variable = expression
691 + expression # commonly variable = expression
159 692 print [ expression-list ] [ >expression ]
160 693 printf format [ ,expression-list ] [ >expression ]
161 - next # skip remaining patterns on this input line
162 - exit [expr] # skip the rest of the input; exit status is expr
694 + next # skip remaining patterns on this input line
695 + nextfile # skip remaining patterns on this input file
696 + exit [expr] # skip the rest of the input; exit status is expr
697 + return [expr] # return from a user-defined function
163 698
164 699
165 700
166 - Statements are terminated by semicolons, newlines, or right braces. An
167 - empty expression-list stands for the whole input line. Expressions take
168 - on string or numeric values as appropriate, and are built using the
169 - operators +, -, *, /, %, ^ and concatenation (indicated by a blank).
170 - The operators ++, --, +=, -=, *=, /=, %=, ^=, >, >=, <, <=, ==, !=, and
171 - ?: are also available in expressions. Variables can be scalars, array
172 - elements (denoted x[i]), or fields. Variables are initialized to the
173 - null string or zero. Array subscripts can be any string, not
174 - necessarily numeric; this allows for a form of associative memory.
175 - String constants are quoted (""), with the usual C escapes recognized
176 - within.
701 + Any single statement can be replaced by a statement list enclosed in
702 + braces. The statements are terminated by newline characters or
703 + semicolons, and are executed sequentially in the order that they
704 + appear.
177 705
178 706
179 - The print statement prints its arguments on the standard output, or on
180 - a file if >expression is present, or on a pipe if '|cmd' is present.
181 - The output resulted from the print statement is terminated by the
182 - output record separator with each argument separated by the current
183 - output field separator. The printf statement formats its expression
184 - list according to the format (see printf(3C)).
707 + The next statement causes all further processing of the current input
708 + record to be abandoned. The behavior is undefined if a next statement
709 + appears or is invoked in a BEGIN or END action.
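The effect of next can be seen in a minimal sketch, assuming a POSIX awk and sh (the variable name is hypothetical). For the second record, next abandons processing before the second pattern-action pair runs.

```shell
#!/bin/sh
# Record "2" hits next, so its { print } action is never reached.
next_out=$(printf '1\n2\n3\n' | awk '$1 == 2 { next } { print }')

echo "$next_out"   # prints 1 and 3 on separate lines
```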
185 710
186 - Built-in Functions
187 - The arithmetic functions are as follows:
188 711
712 + The nextfile statement is similar to next, but also skips all other
713 + records in the current file, and moves on to processing the next input
714 + file if available (or exits the program if there are none). (Note that
715 + this keyword is not supported by /usr/xpg4/bin/awk.)
716 +
717 +
718 + The exit statement invokes all END actions in the order in which they
719 + occur in the program source and then terminates the program without
720 + reading further input. An exit statement inside an END action
721 + terminates the program without further execution of END actions. If an
722 + expression is specified in an exit statement, its numeric value is the
723 + exit status of awk, unless subsequent errors are encountered or a
724 + subsequent exit statement with an expression is executed.
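The exit rules above can be checked with a sketch that assumes a POSIX awk and sh (variable names are hypothetical). exit still runs the END action. Because END contains no exit of its own, the expression from the first exit becomes awk's exit status. The `|| exit_status=$?` form captures the status without aborting a script that runs under `set -e`.

```shell
#!/bin/sh
exit_status=0
# Record 1 is printed. Record 2 triggers "exit 7", so record 3 is never
# processed, but the END action still runs.
exit_out=$(printf '1\n2\n3\n' | awk '$1 == 2 { exit 7 } { print } END { print "end" }') || exit_status=$?

echo "$exit_out"      # prints 1 and end on separate lines
echo "$exit_status"   # prints 7
```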
725 +
726 +
727 + Output Statements
728 + Both print and printf statements write to standard output by default.
729 + The output is written to the location specified by output_redirection
730 + if one is supplied, as follows:
731 +
732 + > expression
733 + >> expression
734 + | expression
735 +
736 + In all cases, the expression is evaluated to produce a string that is
737 + used as a full pathname to write into (for > or >>) or as a command to
738 + be executed (for |). With either of the first two forms, if the file
739 + of that name is not currently open, it is opened (and created if
740 + necessary); with the first form, it is also truncated. The output is then
741 + appended to the file. As long as the file remains open, subsequent calls
742 + in which expression evaluates to the same string value simply append
743 + output to the file. The file remains open until the close function is
744 + called with an expression that evaluates to the same string
745 + value.
746 +
747 +
748 + The third form writes output onto a stream piped to the input of a
749 + command. The stream is created if no stream is currently open with the
750 + value of expression as its command name. The stream created is
751 + equivalent to one created by a call to the popen(3C) function with the
752 + value of expression as the command argument and a value of w as the
753 + mode argument. As long as the stream remains open, subsequent calls in
754 + which expression evaluates to the same string value write output to
755 + the existing stream. The stream remains open until the close function
756 + is called with an expression that evaluates to the same string value.
757 + At that time, the stream is closed as if by a call to the pclose
758 + function.
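The following sketch shows the open-once behavior of > and the | form. It assumes a POSIX awk, sh, and the common mktemp and sort utilities. The temporary file names and shell variables are hypothetical. Both print statements name the same file, so the file is truncated only when it is first opened, and the second line is appended. Closing the pipe waits for sort to finish, and sort's output goes to awk's standard output.

```shell
#!/bin/sh
tmp=$(mktemp)
awk -v f="$tmp" 'BEGIN {
    print "one" > f        # first use: opens and truncates the file
    print "two" > f        # same string value: appended to the open file
    close(f)
    print "b" | "sort"     # opens a pipe to the sort command
    print "a" | "sort"     # same command string: same pipe
    close("sort")          # closes the pipe and waits for sort to exit
}' > "$tmp.sorted"

file_out=$(cat "$tmp")
sorted_out=$(cat "$tmp.sorted")
rm -f "$tmp" "$tmp.sorted"

echo "$file_out"     # prints one and two on separate lines
echo "$sorted_out"   # prints a and b on separate lines
```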
759 +
760 +
761 + These output statements take a comma-separated list of expressions,
762 + referred to in the grammar by the non-terminal symbols expr_list,
763 + print_expr_list or print_expr_list_opt. This list is referred to here
764 + as the expression list, and each member is referred to as an expression
765 + argument.
766 +
767 +
768 + The print statement writes the value of each expression argument onto
769 + the indicated output stream separated by the current output field
770 + separator (see variable OFS above), and terminated by the output record
771 + separator (see variable ORS above). All expression arguments are taken
772 + as strings, being converted if necessary, except that numbers are
773 + converted using the printf format in OFMT rather than CONVFMT. An empty
774 + expression list stands for the whole input record ($0).
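This sketch shows how OFS, ORS, and OFMT affect print. It assumes a POSIX awk and sh, and the separator choices are arbitrary. The second print has an empty expression list, so it writes the unmodified record.

```shell
#!/bin/sh
print_out=$(printf 'x y\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n"; OFMT = "%.2f" }
{
    print $2, $1, 3.14159   # joined by OFS; the non-integer uses OFMT
    print                   # empty list: the whole record ($0)
}')

echo "$print_out"   # prints "y-x-3.14;" and "x y;" on separate lines
```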
775 +
776 +
777 + The printf statement produces output based on a notation similar to the
778 + format strings of printf(3C).
779 + Output is produced as specified with the first expression argument as
780 + the format string and subsequent expression arguments as the
781 + arguments arg1 to argn, inclusive, with the following exceptions:
782 +
783 + 1. The format is an actual character string rather than a
784 + graphical representation. Therefore, it cannot contain empty
785 + character positions. The space character in the format
786 + string, in any context other than a flag of a conversion
787 + specification, is treated as an ordinary character that is
788 + copied to the output.
789 +
790 + 2. If the character set contains a Delta character and that
791 + character appears in the format string, it is treated as an
792 + ordinary character that is copied to the output.
793 +
794 + 3. The escape sequences beginning with a backslash character are
795 + treated as sequences of ordinary characters that are copied
796 + to the output. Note that these same sequences are interpreted
797 + lexically by awk when they appear in literal strings, but
798 + they are not treated specially by the printf statement.
799 +
800 + 4. A field width or precision can be specified as the *
801 + character instead of a digit string. In this case the next
802 + argument from the expression list is fetched and its numeric
803 + value taken as the field width or precision.
804 +
805 + 5. The implementation does not precede or follow output from
806 + the d or u conversion specifications with blank characters
807 + not specified by the format string.
808 +
809 + 6. The implementation does not precede output from the o
810 + conversion specification with leading zeros not specified by
811 + the format string.
812 +
813 + 7. For the c conversion specification: if the argument has a
814 + numeric value, the character whose encoding is that value is
815 + output. If the value is zero or is not the encoding of any
816 + character in the character set, the behavior is undefined.
817 + If the argument does not have a numeric value, the first
818 + character of the string value is output; if the string does
819 + not contain any characters the behavior is undefined.
820 +
821 + 8. For each conversion specification that consumes an argument,
822 + the next expression argument is evaluated. With the
823 + exception of the c conversion, the value is converted to the
824 + appropriate type for the conversion specification.
825 +
826 + 9. If there are insufficient expression arguments to satisfy
827 + all the conversion specifications in the format string, the
828 + behavior is undefined.
829 +
830 + 10. If any character sequence in the format string begins with a
831 + % character, but does not form a valid conversion
832 + specification, the behavior is unspecified.
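Exceptions 4 and 7 can be illustrated with a sketch that assumes a POSIX awk and sh and an ASCII-compatible character set. The field width for %*d is taken from the next argument (5). %c prints the character whose encoding is 65 when given a number, and the first character of a string otherwise.

```shell
#!/bin/sh
printf_out=$(awk 'BEGIN { printf "[%*d][%c][%c]\n", 5, 42, 65, "hello" }')

echo "$printf_out"   # prints "[   42][A][h]"
```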
833 +
834 +
835 + Both print and printf can output at least {LINE_MAX} bytes.
836 +
837 +
838 + Functions
839 + The awk language has a variety of built-in functions: arithmetic,
840 + string, input/output and general.
841 +
842 +
843 + Arithmetic Functions
844 + The arithmetic functions, except for int, are based on the ISO C
845 + standard. The behavior is undefined in cases where the ISO C standard
846 + specifies that an error be returned or that the behavior is undefined.
847 + Although the grammar permits built-in functions to appear with no
848 + arguments or parentheses, unless the argument or parentheses are
849 + indicated as optional in the following list (by displaying them within
850 + the [ ] brackets), such use is undefined.
851 +
852 + atan2(y,x)
853 + Return arctangent of y/x.
854 +
855 +
189 856 cos(x)
190 - Return cosine of x, where x is in radians. (In
191 - /usr/xpg4/bin/awk only. See nawk(1).)
857 + Return cosine of x, where x is in radians.
192 858
193 859
194 860 sin(x)
195 - Return sine of x, where x is in radians. (In
196 - /usr/xpg4/bin/awk only. See nawk(1).)
861 + Return sine of x, where x is in radians.
197 862
198 863
199 864 exp(x)
200 - Return the exponential function of x.
865 + Return the exponential function of x.
201 866
202 867
203 868 log(x)
204 - Return the natural logarithm of x.
869 + Return the natural logarithm of x.
205 870
206 871
207 872 sqrt(x)
208 - Return the square root of x.
873 + Return the square root of x.
209 874
210 875
211 876 int(x)
212 - Truncate its argument to an integer. It is truncated toward
213 - 0 when x > 0.
877 + Truncate its argument to an integer. It is truncated
878 + toward 0 when x > 0.
214 879
215 880
881 + rand()
882 + Return a random number n, such that 0 <= n < 1.
216 883
217 - The string functions are as follows:
218 884
219 - index(s, t)
885 + srand([expr])
886 + Set the seed value for rand to expr or use the time of
887 + day if expr is omitted. The previous seed value is
888 + returned.
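A few of the arithmetic functions are exercised in the sketch below. It assumes a POSIX awk and sh, and the variable names are hypothetical. Results are printed through the default OFMT (%.6g), so atan2(0, -1), which is pi, shows as 3.14159. Because rand() values are implementation specific, the sketch checks only the documented range 0 <= n < 1.

```shell
#!/bin/sh
# int() truncates toward zero for both signs; exp(log(10)) prints as 10.
arith_out=$(awk 'BEGIN { print int(3.7), int(-3.7), atan2(0, -1), sqrt(16), exp(log(10)) }')

# Seed explicitly, then confirm the first rand() value is within [0, 1).
rand_ok=$(awk 'BEGIN { srand(1); r = rand(); print (r >= 0 && r < 1) }')

echo "$arith_out"   # prints "3 -3 3.14159 4 10"
echo "$rand_ok"     # prints "1"
```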
220 889
221 - Return the position in string s where string t first occurs, or 0
222 - if it does not occur at all.
223 890
891 + String Functions
892 + The following string functions are supported. Although
893 + the grammar permits built-in functions to appear with no arguments or
894 + parentheses, unless the argument or parentheses are indicated as
895 + optional in the following list (by displaying them within the [ ]
896 + brackets), such use is undefined.
224 897
225 - int(s)
898 + gsub(ere,repl[,in])
226 899
227 - truncates s to an integer value. If s is not specified, $0 is used.
900 + Behave like sub (see below), except that it replaces all
901 + occurrences of the regular expression (like the ed utility global
902 + substitute) in $0 or in the in argument, when specified.
228 903
229 904
230 - length(s)
905 + index(s,t)
231 906
232 - Return the length of its argument taken as a string, or of the
233 - whole line if there is no argument.
907 + Return the position, in characters, numbering from 1, in string s
908 + where string t first occurs, or zero if it does not occur at all.
234 909
235 910
236 - split(s, a, fs)
911 + length[([v])]
237 912
238 - Split the string s into array elements a[1], a[2], ... a[n], and
239 - returns n. The separation is done with the regular expression fs or
240 - with the field separator FS if fs is not given.
913 + Given no argument, this function returns the length of the whole
914 + record, $0. If given an array as an argument (and using
915 + /usr/bin/awk), then this returns the number of elements it
916 + contains. Otherwise, this function interprets the argument as a
917 + string (performing any needed conversions) and returns its length
918 + in characters.
241 919
242 920
243 - sprintf(fmt, expr, expr,...)
921 + match(s,ere)
244 922
245 - Format the expressions according to the printf(3C) format given by
246 - fmt and returns the resulting string.
923 + Return the position, in characters, numbering from 1, in string s
924 + where the extended regular expression ere occurs, or zero if it
925 + does not occur at all. RSTART is set to the starting position
926 + (which is the same as the returned value), zero if no match is
927 + found; RLENGTH is set to the length of the matched string, -1 if no
928 + match is found.
247 929
248 930
249 - substr(s, m, n)
931 + split(s,a[,fs])
250 932
251 - returns the n-character substring of s that begins at position m.
933 + Split the string s into array elements a[1], a[2], ..., a[n], and
934 + return n. The separation is done with the extended regular
935 + expression fs or with the field separator FS if fs is not given.
936 + Each array element has a string value when created. If the string
937 + assigned to any array element, with any occurrence of the decimal-
938 + point character from the current locale changed to a period
939 + character, would be considered a numeric string, the array element
940 + also has the numeric value of the numeric string. The effect of a
941 + null string as the value of fs is unspecified.
252 942
253 943
944 + sprintf(fmt,expr,expr,...)
254 945
255 - The input/output function is as follows:
946 + Format the expressions according to the printf format given by fmt
947 + and return the resulting string.
256 948
949 +
950 + sub(ere,repl[,in])
951 +
952 + Substitute the string repl in place of the first instance of the
953 + extended regular expression ERE in string in and return the number
954 + of substitutions. An ampersand ( & ) appearing in the string repl
955 + is replaced by the string from in that matches the regular
956 + expression. An ampersand preceded with a backslash ( \ ) is
957 + interpreted as the literal ampersand character. An occurrence of
958 + two consecutive backslashes is interpreted as just a single literal
959 + backslash character. Any other occurrence of a backslash (for
960 + example, preceding any other character) is treated as a literal
961 + backslash character. If repl is a string literal, the handling of
962 + the ampersand character occurs after any lexical processing,
963 + including any lexical backslash escape sequence processing. If in
964 + is specified and it is not an lvalue, the behavior is undefined. If
965 + in is omitted, awk uses the current record ($0) in its place.
966 +
967 +
968 + substr(s,m[,n])
969 +
970 + Return the at most n-character substring of s that begins at
971 + position m, numbering from 1. If n is missing, the length of the
972 + substring is limited by the length of the string s.
973 +
974 +
975 + tolower(s)
976 +
977 + Return a string based on the string s. Each character in s that is
978 + an upper-case letter specified to have a tolower mapping by the
979 + LC_CTYPE category of the current locale is replaced in the returned
980 + string by the lower-case letter specified by the mapping. Other
981 + characters in s are unchanged in the returned string.
982 +
983 +
984 + toupper(s)
985 +
986 + Return a string based on the string s. Each character in s that is
987 + a lower-case letter specified to have a toupper mapping by the
988 + LC_CTYPE category of the current locale is replaced in the returned
989 + string by the upper-case letter specified by the mapping. Other
990 + characters in s are unchanged in the returned string.
991 +
992 +
993 +
994 + All of the preceding functions that take ere as a parameter expect a
995 + pattern or a string-valued expression that is a regular expression, as
996 + described earlier in this page.
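The sketch below exercises several of the string functions on made-up strings, assuming a POSIX awk and sh. The result of match() is stored before RSTART and RLENGTH are printed, so the output does not depend on the order in which print evaluates its arguments.

```shell
#!/bin/sh
str_out=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)       # & stands for the matched text
    print n, s                      # 2 [foo] bar [foo]
    m = match("xxabcx", /abc/)
    print m, RSTART, RLENGTH        # 3 3 3
    k = split("a:b:c", parts, ":")
    print k, parts[1], parts[3]     # 3 a c
    print substr("hello", 2, 3), index("hello", "ll")   # ell 3
    print toupper("abc"), tolower("XyZ")                # ABC xyz
}')

echo "$str_out"
```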
997 +
998 +
999 + Input/Output and General Functions
1000 + The input/output and general functions are:
1001 +
1002 + close(expression)
1003 + Close the file or pipe opened by a print or
1004 + printf statement or a call to getline with
1005 + the same string-valued expression. If the
1006 + close was successful, the function returns
1007 + 0; otherwise, it returns non-zero.
1008 +
1009 +
1010 + fflush(expression)
1011 + Flush any buffered output for the file or
1012 + pipe opened by a print or printf statement
1013 + or a call to getline with the same string-
1014 + valued expression. If the flush was
1015 + successful, the function returns 0;
1016 + otherwise, it returns EOF. If no arguments
1017 + or the empty string ("") are given, then all
1018 + open files will be flushed. (Note that
1019 + fflush is supported in /usr/bin/awk only.)
1020 +
1021 +
1022 + expression|getline[var]
1023 + Read a record of input from a stream piped
1024 + from the output of a command. The stream is
1025 + created if no stream is currently open with
1026 + the value of expression as its command name.
1027 + The stream created is equivalent to one
1028 + created by a call to the popen function with
1029 + the value of expression as the command
1030 + argument and a value of r as the mode
1031 + argument. As long as the stream remains
1032 + open, subsequent calls in which expression
1033 + evaluates to the same string value read
1034 + subsequent records from the stream. The stream
1035 + remains open until the close function is
1036 + called with an expression that evaluates to
1037 + the same string value. At that time, the
1038 + stream is closed as if by a call to the
1039 + pclose function. If var is missing, $0, NF,
1040 + and NR are set. Otherwise, var and NR are set.
1041 +
1042 + The getline operator can form ambiguous
1043 + constructs when there are operators that are
1044 + not in parentheses (including concatenate)
1045 + to the left of the | (to the beginning of
1046 + the expression containing getline). In the
1047 + context of the $ operator, | behaves as if
1048 + it had a lower precedence than $. The result
1049 + of evaluating other operators is
1050 + unspecified, and in portable applications
1051 + all such uses must be put in
1052 + parentheses.
1053 +
1054 +
257 1055 getline
258 - Set $0 to the next input record from the current input file.
259 - getline returns 1 for successful input, 0 for end of file,
260 - and -1 for an error.
1056 + Set $0 to the next input record from the
1057 + current input file. This form of getline
1058 + sets the NF, NR, and FNR variables.
261 1059
262 1060
263 - Large File Behavior
1061 + getline var
1062 + Set variable var to the next input record
1063 + from the current input file. This form of
1064 + getline sets the FNR and NR variables.
1065 +
1066 +
1067 + getline [var] < expression
1068 + Read the next record of input from a named
1069 + file. The expression is evaluated to produce
1070 + a string that is used as a full pathname. If
1071 + the file of that name is not currently open,
1072 + it is opened. As long as the file remains
1073 + open, subsequent calls in which expression
1074 + evaluates to the same string value read
1075 + subsequent records from the file. The file
1076 + remains open until the close function is
1077 + called with an expression that evaluates to
1078 + the same string value. If var is missing, $0
1079 + and NF are set. Otherwise, var is set.
1080 +
1081 + The getline operator can form ambiguous
1082 + constructs when there are binary operators
1083 + that are not in parentheses (including
1084 + concatenate) to the right of the < (up to
1085 + the end of the expression containing the
1086 + getline). The result of evaluating such a
1087 + construct is unspecified, and in portable
1088 + applications all such uses must be put in
1089 + parentheses.
1090 +
1091 +
1092 + system(expression)
1093 + Execute the command given by expression in a
1094 + manner equivalent to the system(3C) function
1095 + and return the exit status of the command.
1096 +
1097 +
1098 +
1099 + All forms of getline return 1 for successful input, 0 for end of file,
1100 + and -1 for an error.
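Here is a minimal sketch of the getline return values, assuming a POSIX awk and sh. The command string and the path /nonexistent/hypothetical-file are invented for the example. The loop compares the result with 0 so that an error (-1) cannot loop forever. close() is called with the identical command string. On end of file, getline leaves var unchanged, so line still holds the last record read.

```shell
#!/bin/sh
getline_out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per record, 0 at end of file
        n++
    close(cmd)                          # same string value as above
    print n, line
    st = (getline x < "/nonexistent/hypothetical-file")
    print st                            # the file cannot be opened: -1
}')

echo "$getline_out"   # prints "2 two" and "-1" on separate lines
```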
1101 +
1102 +
1103 + Where strings are used as the name of a file or pipeline, the strings
1104 + must be textually identical. The terminology ``same string value''
1105 + implies that ``equivalent strings'', even those that differ only by
1106 + space characters, represent different files.
1107 +
1108 +
1109 + User-defined Functions
1110 + The awk language also provides user-defined functions. Such functions
1111 + can be defined as:
1112 +
1113 + function name(args,...) { statements }
1114 +
1115 +
1116 +
1117 + A function can be referred to anywhere in an awk program; in
1118 + particular, its use can precede its definition. The scope of a function
1119 + is global.
1120 +
1121 +
1122 + Function arguments can be either scalars or arrays; the behavior is
1123 + undefined if an array name is passed as an argument that the function
1124 + uses as a scalar, or if a scalar expression is passed as an argument
1125 + that the function uses as an array. Function arguments are passed by
1126 + value if scalar and by reference if array name. Argument names are
1127 + local to the function; all other variable names are global. The same
1128 + name must not be used as both an argument name and as the name of a function
1129 + or a special awk variable. The same name must not be used both as a
1130 + variable name with global scope and as the name of a function. The same
1131 + name must not be used within the same scope both as a scalar variable
1132 + and as an array.
1133 +
1134 +
1135 + The number of parameters in the function definition need not match the
1136 + number of parameters in the function call. Excess formal parameters can
1137 + be used as local variables. If fewer arguments are supplied in a
1138 + function call than are in the function definition, the extra parameters
1139 + that are used in the function body as scalars are initialized with a
1140 + string value of the null string and a numeric value of zero, and the
1141 + extra parameters that are used in the function body as arrays are
1142 + initialized as empty arrays. If more arguments are supplied in a
1143 + function call than are in the function definition, the behavior is
1144 + undefined.
1145 +
1146 +
1147 + When invoking a function, no white space can be placed between the
1148 + function name and the opening parenthesis. Function calls can be nested
1149 + and recursive calls can be made upon functions. Upon return from any
1150 + nested or recursive function call, the values of all of the calling
1151 + function's parameters are unchanged, except for array parameters passed
1152 + by reference. The return statement can be used to return a value. If a
1153 + return statement appears outside of a function definition, the behavior
1154 + is undefined.
1155 +
1156 +
1157 + In the function definition, newline characters are optional before the
1158 + opening brace and after the closing brace. Function definitions can
1159 + appear anywhere in the program where a pattern-action pair is allowed.
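The following sketch covers several of the function rules above. It assumes a POSIX awk and sh, and fact and fill are hypothetical function names. fact is called before it is defined and recurses. fill receives an array by reference. Its extra formal parameter i acts as a local variable, so the global i is not changed.

```shell
#!/bin/sh
func_out=$(awk '
BEGIN {
    i = "global"
    fill(sq, 3)                  # sq is passed by reference as an array
    print fact(5), sq[3], i
}
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
function fill(a, count,    i) { for (i = 1; i <= count; i++) a[i] = i * i }
')

echo "$func_out"   # prints "120 9 global"
```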
1160 +
1161 +
1162 +USAGE
1163 + The index, length, match, and substr functions should not be confused
1164 + with similar functions in the ISO C standard; the awk versions deal
1165 + with characters, while the ISO C standard deals with bytes.
1166 +
1167 +
1168 + Because the concatenation operation is represented by adjacent
1169 + expressions rather than an explicit operator, it is often necessary to
1170 + use parentheses to enforce the proper evaluation precedence.
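As an illustration of the precedence issue, the sketch below assumes a POSIX awk and sh. Addition binds more tightly than concatenation, so parentheses change the result. The string "2 3" has the numeric value 2 in arithmetic.

```shell
#!/bin/sh
concat_out=$(awk 'BEGIN {
    a = 2 " " 3 + 4        # 2, " ", and (3 + 4) concatenated: "2 7"
    b = (2 " " 3) + 4      # "2 3" + 4, where "2 3" counts as 2: 6
    print a "|" b
}')

echo "$concat_out"   # prints "2 7|6"
```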
1171 +
1172 +
264 1173 See largefile(5) for the description of the behavior of awk when
265 - encountering files greater than or equal to 2 Gbyte ( 2^31 bytes).
1174 + encountering files greater than or equal to 2 Gbyte (2^31 bytes).
266 1175
1176 +
267 1177 EXAMPLES
268 - Example 1 Printing Lines Longer Than 72 Characters
1178 + The awk program specified in the command line is most easily specified
1179 + within single-quotes (for example, 'program') for applications using
1180 + sh, because awk programs commonly contain characters that are special
1181 + to the shell, including double-quotes. In the cases where an awk program
1182 + contains single-quote characters, it is usually easiest to specify most
1183 + of the program as strings within single-quotes concatenated by the
1184 + shell with quoted single-quote characters. For example:
269 1185
1186 + awk '/'\''/ { print "quote:", $0 }'
270 1187
271 - The following example is an awk script that can be executed by an awk
272 - -f examplescript style command. It prints lines longer than seventy two
273 - characters:
274 1188
275 1189
276 - length > 72
1190 + prints all lines from the standard input containing a single-quote
1191 + character, prefixed with quote:.
277 1192
278 1193
1194 + The following are examples of simple awk programs:
279 1195
280 - Example 2 Printing Fields in Opposite Order
1196 + Example 1 Write to the standard output all input lines for which field
1197 + 3 is greater than 5:
281 1198
1199 + $3 > 5
282 1200
283 - The following example is an awk script that can be executed by an awk
284 - -f examplescript style command. It prints the first two fields in
285 - opposite order:
286 1201
287 1202
288 - { print $2, $1 }
1203 + Example 2 Write every tenth line:
289 1204
1205 + (NR % 10) == 0
290 1206
291 1207
292 - Example 3 Printing Fields in Opposite Order with the Input Fields
293 - Separated
294 1208
1209 + Example 3 Write any line with a substring matching the regular
1210 + expression:
295 1211
296 - The following example is an awk script that can be executed by an awk
297 - -f examplescript style command. It prints the first two input fields in
298 - opposite order, separated by a comma, blanks or tabs:
1212 + /(G|D)(2[0-9][[:alpha:]]*)/
299 1213
300 1214
301 - BEGIN { FS = ",[ \t]*|[ \t]+" }
302 - { print $2, $1 }
303 1215
1216 + Example 4 Print any line with a substring containing a G or D, followed
1217 + by a sequence of digits and characters:
304 1218
305 1219
306 - Example 4 Adding Up the First Column, Printing the Sum and Average
1220 + This example uses character classes digit and alpha to match language-
1221 + independent digit and alphabetic characters, respectively.
307 1222
308 1223
309 - The following example is an awk script that can be executed by an awk
310 - -f examplescript style command. It adds up the first column, and
311 - prints the sum and average:
1224 + /(G|D)([[:digit:][:alpha:]]*)/
312 1225
313 1226
314 - { s += $1 }
315 - END { print "sum is", s, " average is", s/NR }
316 1227
1228 + Example 5 Write any line in which the second field matches the regular
1229 + expression and the fourth field does not:
317 1230
1231 + $2 ~ /xyz/ && $4 !~ /xyz/
318 1232
319 - Example 5 Printing Fields in Reverse Order
320 1233
321 1234
322 - The following example is an awk script that can be executed by an awk
323 - -f examplescript style command. It prints fields in reverse order:
1235 + Example 6 Write any line in which the second field contains a
1236 + backslash:
324 1237
1238 + $2 ~ /\\/
325 1239
326 - { for (i = NF; i > 0; --i) print $i }
327 1240
328 1241
1242 + Example 7 Write any line in which the second field contains a backslash
1243 + (alternate method):
329 1244
330 - Example 6 Printing All lines Between start/stop Pairs
331 1245
1246 + Notice that backslash escapes are interpreted twice, once in lexical
1247 + processing of the string and once in processing the regular expression.
332 1248
333 - The following example is an awk script that can be executed by an awk
334 - -f examplescript style command. It prints all lines between start/stop
335 - pairs.
336 1249
1250 + $2 ~ "\\\\"
337 1251
338 - /start/, /stop/
339 1252
340 1253
1254 + Example 8 Write the second to the last and the last field in each line,
1255 + separating the fields by a colon:
341 1256
342 - Example 7 Printing All Lines Whose First Field is Different from the
343 - Previous One
1257 + {OFS=":";print $(NF-1), $NF}
344 1258
345 1259
346 - The following example is an awk script that can be executed by an awk
347 - -f examplescript style command. It prints all lines whose first field
348 - is different from the previous one.
349 1260
1261 + Example 9 Write the line number and number of fields in each line:
350 1262
1263 +
1264 + The three strings representing the line number, the colon and the
1265 + number of fields are concatenated and that string is written to
1266 + standard output.
1267 +
1268 +
1269 + {print NR ":" NF}
1270 +
1271 +
1272 +
1273 + Example 10 Write lines longer than 72 characters:
1274 +
1275 + length($0) > 72
1276 +
1277 +
1278 +
1279 + Example 11 Write first two fields in opposite order separated by the
1280 + OFS:
1281 +
1282 + { print $2, $1 }
1283 +
1284 +
1285 +
1286 + Example 12 Same, with input fields separated by comma or space and tab
1287 + characters, or both:
1288 +
1289 + BEGIN { FS = ",[ \t]*|[ \t]+" }
1290 + { print $2, $1 }
1291 +
1292 +
1293 +
1294 + Example 13 Add up first column, print sum and average:
1295 +
1296 + {s += $1 }
1297 + END {print "sum is ", s, " average is", s/NR}
1298 +
1299 +
1300 +
1301 + Example 14 Write fields in reverse order, one per line (many lines out
1302 + for each line in):
1303 +
1304 + { for (i = NF; i > 0; --i) print $i }
1305 +
1306 +
1307 +
1308 + Example 15 Write all lines between occurrences of the strings "start"
1309 + and "stop":
1310 +
1311 + /start/, /stop/
1312 +
1313 +
1314 +
1315 + Example 16 Write all lines whose first field is different from the
1316 + previous one:
1317 +
351 1318 $1 != prev { print; prev = $1 }
352 1319
353 1320
354 1321
355 - Example 8 Printing a File and Filling in Page numbers
1322 + Example 17 Simulate the echo command:
356 1323
1324 + BEGIN {
1325 + for (i = 1; i < ARGC; ++i)
1326 + printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
1327 + }
357 1328
358 - The following example is an awk script that can be executed by an awk
359 - -f examplescript style command. It prints a file and fills in page
360 - numbers starting at 5:
361 1329
362 1330
363 - /Page/ { $2 = n++; }
364 - { print }
1331 + Example 18 Write the path prefixes contained in the PATH environment
1332 + variable, one per line:
365 1333
1334 + BEGIN {
1335 + n = split (ENVIRON["PATH"], path, ":")
1336 + for (i = 1; i <= n; ++i)
1337 + print path[i]
1338 + }
366 1339
367 1340
368 - Example 9 Printing a File and Numbering Its Pages
369 1341
1342 + Example 19 Print the file "input", filling in page numbers starting at
1343 + 5:
370 1344
371 - Assuming this program is in a file named prog, the following example
372 - prints the file input numbering its pages starting at 5:
373 1345
1346 + If there is a file named input containing page headers of the form
374 1347
375 - example% awk -f prog n=5 input
376 1348
1349 + Page #
377 1350
378 1351
1352 +
1353 + and a file named program that contains
1354 +
1355 +
1356 + /Page/{ $2 = n++; }
1357 + { print }
1358 +
1359 +
1360 +
1361 + then the command line
1362 +
1363 +
1364 + awk -f program n=5 input
1365 +
1366 +
1367 +
1368 +
1369 + prints the file input, filling in page numbers starting at 5.
1370 +
1371 +
379 1372 ENVIRONMENT VARIABLES
380 1373 See environ(5) for descriptions of the following environment variables
381 - that affect the execution of awk: LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
382 - LC_MESSAGES, NLSPATH, and PATH.
1374 + that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.
383 1375
384 1376 LC_NUMERIC
385 1377 Determine the radix character used when interpreting
386 1378 numeric input, performing conversions between numeric and
387 1379 string values and formatting numeric output. Regardless
388 1380 of locale, the period character (the decimal-point
389 1381 character of the POSIX locale) is the decimal-point
390 1382 character recognized in processing awk programs
391 1383 (including assignments in command-line arguments).
392 1384
393 1385
394 -ATTRIBUTES
395 - See attributes(5) for descriptions of the following attributes:
1386 +EXIT STATUS
1387 + The following exit values are returned:
396 1388
397 - /usr/bin/awk
1389 + 0
1390 + All input files were processed successfully.
398 1391
399 1392
1393 + >0
1394 + An error occurred.
400 1395
401 - +---------------+-----------------+
402 - |ATTRIBUTE TYPE | ATTRIBUTE VALUE |
403 - +---------------+-----------------+
404 - |CSI | Not Enabled |
405 - +---------------+-----------------+
406 1396
407 - /usr/xpg4/bin/awk
408 1397
1398 + The exit status can be altered within the program by using an exit
1399 + expression.
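As an illustration, assuming a POSIX shell, the following sketch stops at the second input line and returns 3 as the exit status of awk:

```shell
# The first line is printed; on the second line "exit 3" stops
# processing, and the shell reports awk's status in $?.
printf 'one\ntwo\nthree\n' | awk 'NR == 2 { exit 3 } { print }'
echo "exit status: $?"
```

The output is the line one followed by exit status: 3.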
409 1400
410 1401
411 - +--------------------+-----------------+
412 - | ATTRIBUTE TYPE | ATTRIBUTE VALUE |
413 - +--------------------+-----------------+
414 - |CSI | Enabled |
415 - +--------------------+-----------------+
416 - |Interface Stability | Standard |
417 - +--------------------+-----------------+
418 -
419 1402 SEE ALSO
420 - egrep(1), grep(1), nawk(1), sed(1), printf(3C), attributes(5),
421 - environ(5), largefile(5), standards(5)
1403 + ed(1), egrep(1), grep(1), lex(1), oawk(1), sed(1), popen(3C),
1404 + printf(3C), system(3C), attributes(5), environ(5), largefile(5),
1405 + regex(5), XPG4(5)
422 1406
1407 +
1408 + Aho, A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming
1409 + Language, Addison-Wesley, 1988.
1410 +
1411 +
1412 +DIAGNOSTICS
1413 + If any file operand is specified and the named file cannot be accessed,
1414 + awk writes a diagnostic message to standard error and terminates without
1415 + any further action.
1416 +
1417 +
1418 + If the program specified by either the program operand or a progfile
1419 + operand is not a valid awk program (as specified in EXTENDED
1420 + DESCRIPTION), the behavior is undefined.
1421 +
1422 +
423 1423 NOTES
424 1424 Input white space is not preserved on output if fields are involved.
425 1425
426 1426
427 1427 There are no explicit conversions between numbers and strings. To force
428 - an expression to be treated as a number, add 0 to it. To force an
429 - expression to be treated as a string, concatenate the null string ("")
430 - to it.
1428 + an expression to be treated as a number, add 0 to it; to force it to be
1429 + treated as a string, concatenate the null string ("") to it.
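The following sketch, assuming a POSIX shell, shows both conversions; the variable names x, y, a and b are arbitrary:

```shell
# x and y hold string constants, so x < y is a string comparison
# ("10" sorts before "9"); adding 0 forces a numeric comparison.
awk 'BEGIN { x = "10"; y = "9"; print (x < y), (x + 0 < y + 0) }'
# a and b hold numbers; concatenating "" forces a string comparison.
awk 'BEGIN { a = 10; b = 9; print (a < b), (a "" < b "") }'
```

The first command prints 1 0 and the second prints 0 1.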
431 1430
432 1431
433 1432
434 - June 22, 2005 AWK(1)
1433 + April 20, 2020 AWK(1)