1 AWK(1) User Commands AWK(1)
2
3
4
5 NAME
6 awk - pattern scanning and processing language
7
8 SYNOPSIS
9 /usr/bin/awk [-F ERE] [-v assignment] 'program' | -f progfile...
10 [argument]...
11
12
13 /usr/bin/nawk [-F ERE] [-v assignment] 'program' | -f progfile...
14 [argument]...
15
16
17 /usr/xpg4/bin/awk [-F ERE] [-v assignment]... 'program' | -f progfile...
18 [argument]...
19
20
21 DESCRIPTION
22 NOTE: The nawk command is now the system default awk for illumos.
23
The /usr/bin/awk and /usr/xpg4/bin/awk utilities execute programs
written in the awk programming language, which is specialized for
textual data manipulation. An awk program is a sequence of patterns and
corresponding actions. The string specifying program must be enclosed
in single quotes (') to protect it from interpretation by the shell.
The sequence of pattern-action statements can be specified on the
command line as program, or in one or more files specified by the
-f progfile option. When input is read that matches a pattern, the
action associated with the pattern is performed.
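The following is a minimal sketch of both ways of supplying a program, assuming a POSIX shell and an awk that behaves as described on this page. The sample records, the temporary program file, and the shell variable names are illustrative only.

```shell
# Pattern: $2 > 1. Action: print the first field of each matching record.
out=$(printf 'a 1\nb 2\nc 3\n' | awk '$2 > 1 { print $1 }')
echo "$out"        # b and c, one per line

# The same program kept in a file and supplied with -f progfile.
prog=$(mktemp)
printf '%s\n' '$2 > 1 { print $1 }' > "$prog"
out_f=$(printf 'a 1\nb 2\nc 3\n' | awk -f "$prog")
rm -f "$prog"
echo "$out_f"      # same output as above
```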
33
34
35 Input is interpreted as a sequence of records. By default, a record is
36 a line, but this can be changed by using the RS built-in variable. Each
record of input is matched against each pattern in the program. For each
38 pattern matched, the associated action is executed.
39
40
The awk utility interprets each input record as a sequence of fields
where, by default, a field is a string of non-blank characters. This
default white-space field delimiter (blanks and/or tabs) can be changed
by using the FS built-in variable or the -F ERE option. The awk utility
refers to the first field in a record as $1, the second as $2, and so
forth. The symbol $0 refers to the entire record; setting any other
field causes $0 to be reevaluated. Assigning to $0 resets the values of
all fields and the NF built-in variable.
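A short sketch of the field behavior described above, assuming a POSIX shell and a conforming awk; the input strings are made up for illustration.

```shell
# Setting a field rebuilds $0, joining the fields with OFS (here "-").
rebuilt=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $2 = "X"; print }')
echo "$rebuilt"    # a-X-c

# Assigning to $0 re-splits the record and resets NF.
nf=$(echo 'a b c' | awk '{ $0 = "x y"; print NF, $2 }')
echo "$nf"         # 2 y

# -F changes the field separator; here a single colon.
colon=$(echo 'a:b:c' | awk -F: '{ print $2 }')
echo "$colon"      # b
```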
49
50
51 OPTIONS
52 The following options are supported:
53
54 -F ERE
Define the input field separator to be the extended
regular expression ERE before any input is read. ERE
can be a single character.
58
59
60 -f progfile
Specifies the pathname of the file progfile containing
an awk program. If multiple instances of this option
63 are specified, the concatenation of the files
64 specified as progfile in the order specified is the
65 awk program. The awk program can alternatively be
66 specified in the command line as a single argument.
67
68
69 -v assignment
The assignment argument must be in the same form as an
assignment operand. The assignment is of the form
var=value, where var is the name of an awk variable.
The specified assignment occurs before the awk program
is executed, including the actions associated with
BEGIN patterns (if any). Multiple occurrences of this
option can be specified.
77
78
79 -safe
When passed to awk, this flag prevents the program
from opening new files or running child processes. The
ENVIRON array is also not initialized.
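As a sketch of the -v timing described under OPTIONS, assuming a POSIX shell and a conforming awk (the variable names greeting and n are arbitrary): because -v assignments happen before BEGIN runs, a BEGIN action can already use them.

```shell
# Both -v assignments are visible inside BEGIN; n is a numeric string.
v_out=$(awk -v greeting=hello -v n=3 'BEGIN { print greeting, n + 1 }')
echo "$v_out"    # hello 4
```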
83
84
85 OPERANDS
86 The following operands are supported:
87
88 program
89 If no -f option is specified, the first operand to awk is
90 the text of the awk program. The application supplies the
91 program operand as a single argument to awk. If the text
92 does not end in a newline character, awk interprets the
93 text as if it did.
94
95
96 argument
97 Either of the following two types of argument can be
98 intermixed:
99
100 file
101 A pathname of a file that contains the input
102 to be read, which is matched against the set
103 of patterns in the program. If no file
104 operands are specified, or if a file operand
105 is -, the standard input is used.
106
107
108 assignment
109 An operand that begins with an underscore or
110 alphabetic character from the portable
111 character set, followed by a sequence of
112 underscores, digits and alphabetics from the
113 portable character set, followed by the =
114 character specifies a variable assignment
rather than a pathname. The characters before
the = represent the name of an awk variable.
If that name is an awk reserved word, the
behavior is undefined. The characters
following the equal sign are interpreted as if
they appeared in the awk program preceded and
followed by a double-quote (") character, as
a STRING token, except that if the last
character is an unescaped backslash, it is
interpreted as a literal backslash rather
than as the first character of the sequence
\". The variable is assigned the value of
that STRING token. If the value is considered
a numeric string, the variable is assigned its
numeric value. Each such variable assignment
130 is performed just before the processing of
131 the following file, if any. Thus, an
132 assignment before the first file argument is
133 executed after the BEGIN actions (if any),
134 while an assignment after the last file
135 argument is executed before the END actions
136 (if any). If there are no file arguments,
137 assignments are executed before processing
138 the standard input.
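The timing of assignment operands can be seen with two small files. This is a sketch assuming a POSIX shell, mktemp, and a conforming awk; the file names f1 and f2 and the variable v are illustrative.

```shell
dir=$(mktemp -d)
printf 'x\n' > "$dir/f1"
printf 'y\n' > "$dir/f2"

# v=1 takes effect just before f1 is read, v=2 just before f2.
per_file=$(awk '{ print v, $0 }' v=1 "$dir/f1" v=2 "$dir/f2")
echo "$per_file"     # "1 x" then "2 y"

# An operand assignment is not yet in effect while BEGIN runs.
in_begin=$(awk 'BEGIN { print "[" v "]" } { }' v=1 "$dir/f1")
echo "$in_begin"     # []

rm -rf "$dir"
```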
139
140
141
142 INPUT FILES
Input files to the awk program from any of the following sources must
be text files:

o any file operands or their equivalents, achieved by
modifying the awk variables ARGV and ARGC

o standard input in the absence of any file operands

o arguments to the getline function


Whether or not the variable RS is set to a value other than a newline
character, implementations support records in these files terminated
with the specified separator up to {LINE_MAX} bytes, and can support
longer records.
157
158
159 If -f progfile is specified, the files named by each of the progfile
160 option-arguments must be text files containing an awk program.
161
162
The standard input is used only if no file operands are specified, or
if a file operand is -.
165
166
167 EXTENDED DESCRIPTION
An awk program is composed of pairs of the form:
169
170 pattern { action }
171
172
173
174 Either the pattern or the action (including the enclosing brace
175 characters) can be omitted. Pattern-action statements are separated by
176 a semicolon or by a newline.
177
178
179 A missing pattern matches any record of input, and a missing action is
180 equivalent to an action that writes the matched record of input to
181 standard output.
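A minimal sketch of an omitted action and an omitted pattern, assuming a POSIX shell and a conforming awk; the input lines are illustrative.

```shell
# No action: records for which NR % 2 is non-zero are written unchanged.
odd=$(printf 'l1\nl2\nl3\n' | awk 'NR % 2')
echo "$odd"         # l1 and l3

# No pattern: the action runs for every record.
numbered=$(printf 'l1\nl2\n' | awk '{ print NR ": " $0 }')
echo "$numbered"    # "1: l1" then "2: l2"
```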
182
183
184 Execution of the awk program starts by first executing the actions
185 associated with all BEGIN patterns in the order they occur in the
186 program. Then each file operand (or standard input if no files were
187 specified) is processed by reading data from the file until a record
188 separator is seen (a newline character by default), splitting the
189 current record into fields using the current value of FS, evaluating
190 each pattern in the program in the order of occurrence, and executing
191 the action associated with each pattern that matches the current
record. The action for a matching pattern is executed before evaluating
subsequent patterns. Finally, the actions associated with all END
patterns are executed in the order they occur in the program.
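The execution order can be observed with a program whose BEGIN and END actions are deliberately written out of order. This sketch assumes a POSIX shell and a conforming awk; the labels printed are arbitrary.

```shell
order=$(printf 'r1\nr2\n' | awk '
END   { print "end" }
BEGIN { print "begin 1" }
      { print "record", NR }
BEGIN { print "begin 2" }')
echo "$order"   # begin 1, begin 2, record 1, record 2, end
```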
195
196
197 Expressions in awk
198 Expressions describe computations used in patterns and actions. In the
199 following table, valid expression operations are given in groups from
200 highest precedence first to lowest precedence last, with equal-
201 precedence operators grouped between horizontal lines. In expression
202 evaluation, where the grammar is formally ambiguous, higher precedence
203 operators are evaluated before lower precedence operators. In this
204 table expr, expr1, expr2, and expr3 represent any expression, while
205 lvalue represents any entity that can be assigned to (that is, on the
206 left side of an assignment operator).
207
208
209
210
Syntax                  Name                       Type of Result   Associativity
---------------------------------------------------------------------------------
( expr )                Grouping                   type of expr     n/a
---------------------------------------------------------------------------------
$expr                   Field reference            string           n/a
---------------------------------------------------------------------------------
++ lvalue               Pre-increment              numeric          n/a
-- lvalue               Pre-decrement              numeric          n/a
lvalue ++               Post-increment             numeric          n/a
lvalue --               Post-decrement             numeric          n/a
---------------------------------------------------------------------------------
expr ^ expr             Exponentiation             numeric          right
---------------------------------------------------------------------------------
! expr                  Logical not                numeric          n/a
+ expr                  Unary plus                 numeric          n/a
- expr                  Unary minus                numeric          n/a
---------------------------------------------------------------------------------
expr * expr             Multiplication             numeric          left
expr / expr             Division                   numeric          left
expr % expr             Modulus                    numeric          left
---------------------------------------------------------------------------------
expr + expr             Addition                   numeric          left
expr - expr             Subtraction                numeric          left
---------------------------------------------------------------------------------
expr expr               String concatenation       string           left
---------------------------------------------------------------------------------
expr < expr             Less than                  numeric          none
expr <= expr            Less than or equal to      numeric          none
expr != expr            Not equal to               numeric          none
expr == expr            Equal to                   numeric          none
expr > expr             Greater than               numeric          none
expr >= expr            Greater than or equal to   numeric          none
---------------------------------------------------------------------------------
expr ~ expr             ERE match                  numeric          none
expr !~ expr            ERE non-match              numeric          none
---------------------------------------------------------------------------------
expr in array           Array membership           numeric          left
( index ) in array      Multi-dimension array      numeric          left
                        membership
---------------------------------------------------------------------------------
expr && expr            Logical AND                numeric          left
---------------------------------------------------------------------------------
expr || expr            Logical OR                 numeric          left
---------------------------------------------------------------------------------
expr1 ? expr2 : expr3   Conditional expression     type of selected right
                                                   expr2 or expr3
---------------------------------------------------------------------------------
lvalue ^= expr          Exponentiation assignment  numeric          right
lvalue %= expr          Modulus assignment         numeric          right
lvalue *= expr          Multiplication assignment  numeric          right
lvalue /= expr          Division assignment        numeric          right
lvalue += expr          Addition assignment        numeric          right
lvalue -= expr          Subtraction assignment     numeric          right
lvalue = expr           Assignment                 type of expr     right
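A few expressions that exercise the precedence and associativity rules in the table, sketched under the assumption of a POSIX shell and a conforming awk: ^ is right-associative, binds tighter than unary minus, and concatenation binds more loosely than addition.

```shell
# 2 ^ 3 ^ 2 is 2 ^ (3 ^ 2); -2 ^ 2 is -(2 ^ 2); 1 " " 2 + 3 is 1 " " (2 + 3).
prec=$(awk 'BEGIN { print 2 ^ 3 ^ 2, -2 ^ 2, 1 " " 2 + 3 }')
echo "$prec"    # 512 -4 1 5
```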
267
268
269
270 Each expression has either a string value, a numeric value or both.
271 Except as stated for specific contexts, the value of an expression is
272 implicitly converted to the type needed for the context in which it is
273 used. A string value is converted to a numeric value by the equivalent
274 of the following calls:
275
276 setlocale(LC_NUMERIC, "");
277 numeric_value = atof(string_value);
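The atof-style conversion means a string contributes its longest numeric prefix and any leading blanks are skipped. A sketch, assuming a POSIX shell and a conforming awk running in a locale whose decimal point is a period:

```shell
# "3.5abc" converts to 3.5, "abc" to 0, and " 12 " to 12.
conv=$(awk 'BEGIN { print "3.5abc" + 1, "abc" + 0, " 12 " * 2 }')
echo "$conv"    # 4.5 0 24
```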
278
279
280
281 A numeric value that is exactly equal to the value of an integer is
282 converted to a string by the equivalent of a call to the sprintf
283 function with the string %d as the fmt argument and the numeric value
284 being converted as the first and only expr argument. Any other numeric
285 value is converted to a string by the equivalent of a call to the
286 sprintf function with the value of the variable CONVFMT as the fmt
287 argument and the numeric value being converted as the first and only
288 expr argument.
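A sketch of the two number-to-string rules, assuming a POSIX shell and a conforming awk: integral values use %d, other values use CONVFMT, and print uses OFMT (default %.6g) for numeric output.

```shell
tostr=$(awk 'BEGIN {
  CONVFMT = "%.2f"
  x = 3.14159
  s = x ""          # non-integral value: converted with CONVFMT
  i = 17.0 ""       # integral value: converted as if by %d
  print s, i, x     # x is output with OFMT, not CONVFMT
}')
echo "$tostr"       # 3.14 17 3.14159
```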
289
290
A string value is considered to be a numeric string according to the
following steps:

1. Any leading and trailing blank characters are ignored.
295
296 2. If the first unignored character is a + or -, it is ignored.
297
298 3. If the remaining unignored characters would be lexically
299 recognized as a NUMBER token, the string is considered a
300 numeric string.
301
302
303 If a - character is ignored in the above steps, the numeric value of
304 the numeric string is the negation of the numeric value of the
305 recognized NUMBER token. Otherwise the numeric value of the numeric
306 string is the numeric value of the recognized NUMBER token. Whether or
307 not a string is a numeric string is relevant only in contexts where
308 that term is used in this section.
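The distinction matters most in comparisons. In this sketch (assuming a POSIX shell and a conforming awk), fields that look like numbers compare numerically, while string constants always compare as strings.

```shell
# Both fields are numeric strings, so 10 > 9 is true.
fields=$(echo '10 9' | awk '{ print ($1 > $2) }')
# String constants are compared as strings: "10" sorts before "9".
consts=$(awk 'BEGIN { print ("10" > "9") }')
# Surrounding blanks and a leading sign are allowed in a numeric string.
signed=$(echo ' -5 ' | awk -F, '{ print ($1 < 0) }')
echo "$fields $consts $signed"    # 1 0 1
```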
309
310
311 When an expression is used in a Boolean context, if it has a numeric
312 value, a value of zero is treated as false and any other value is
313 treated as true. Otherwise, a string value of the null string is
314 treated as false and any other value is treated as true. A Boolean
315 context is one of the following:
316
317 o the first subexpression of a conditional expression.
318
319 o an expression operated on by logical NOT, logical AND, or
320 logical OR.
321
322 o the second expression of a for statement.
323
324 o the expression of an if statement.
325
326 o the expression of the while clause in either a while or do
327 ... while statement.
328
o an expression used as a pattern (see Expression Patterns below).
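A sketch of the Boolean rules, assuming a POSIX shell and a conforming awk: a string constant is judged by its string value, while an input field that is a numeric string is judged by its numeric value.

```shell
# "0" as a string constant is a non-null string: true.
str_const=$(awk 'BEGIN { if ("0") print "true"; else print "false" }')
# The field 0 is a numeric string with value zero: false.
field_zero=$(echo 0 | awk '{ if ($1) print "true"; else print "false" }')
# An uninitialized variable is both 0 and "": false.
unset_var=$(awk 'BEGIN { if (nosuchvar) print "true"; else print "false" }')
echo "$str_const $field_zero $unset_var"    # true false false
```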
331
332
333 The awk language supplies arrays that are used for storing numbers or
strings. Arrays need not be declared. They are initially empty, and
their sizes change dynamically. The subscripts, or element
336 identifiers, are strings, providing a type of associative array
337 capability. An array name followed by a subscript within square
338 brackets can be used as an lvalue and as an expression, as described in
339 the grammar. Unsubscripted array names are used in only the following
340 contexts:
341
342 o a parameter in a function definition or function call.
343
344 o the NAME token following any use of the keyword in.
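A common use of associative arrays is counting occurrences. This is a sketch assuming a POSIX shell and a conforming awk; because the traversal order of for (var in array) is unspecified, the output is piped through sort.

```shell
# Count each word; array elements spring into existence on first use.
counts=$(printf 'b a\nb c\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }' | sort)
echo "$counts"    # a 1, b 2, c 1
```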
345
346
347 A valid array index consists of one or more comma-separated
348 expressions, similar to the way in which multi-dimensional arrays are
349 indexed in some programming languages. Because awk arrays are really
350 one-dimensional, such a comma-separated list is converted to a single
351 string by concatenating the string values of the separate expressions,
352 each separated from the other by the value of the SUBSEP variable.
353
354
355 Thus, the following two index operations are equivalent:
356
357 var[expr1, expr2, ... exprn]
358 var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
359
360
361
362 A multi-dimensioned index used with the in operator must be put in
363 parentheses. The in operator, which tests for the existence of a
364 particular array element, does not create the element if it does not
365 exist. Any other reference to a non-existent array element
366 automatically creates it.
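The following sketch, assuming a POSIX shell and a conforming awk, shows that a comma-separated index is the same key as its SUBSEP-joined form, that in does not create elements, and that a plain reference does. The array name grid is arbitrary.

```shell
arr=$(awk 'BEGIN {
  grid[1, 2] = "x"
  if ((1, 2) in grid) print "found"
  if (("1" SUBSEP "2") in grid) print "same key"
  n = 0; for (k in grid) n++; print n
  if (!((3, 4) in grid)) print "absent"
  n = 0; for (k in grid) n++; print n    # still 1: in created nothing
  v = grid[5, 6]                         # a plain reference creates it
  n = 0; for (k in grid) n++; print n
}')
echo "$arr"
```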
367
368
369 Variables and Special Variables
370 Variables can be used in an awk program by referencing them. With the
371 exception of function parameters, they are not explicitly declared.
372 Uninitialized scalar variables and array elements have both a numeric
373 value of zero and a string value of the empty string.
374
375
376 Field variables are designated by a $ followed by a number or numerical
377 expression. The effect of the field number expression evaluating to
378 anything other than a non-negative integer is unspecified.
379 Uninitialized variables or string values need not be converted to
380 numeric values in this context. New field variables are created by
assigning a value to them. References to non-existent fields (that is,
fields after $NF) produce the null string. However, assigning to a
non-existent field (for example, $(NF+2) = 5) increases the value of NF,
creates any intervening fields with the null string as their values, and
causes the value of $0 to be recomputed, with the fields being separated
by the value of OFS. Each field variable has a string value when
387 created. If the string, with any occurrence of the decimal-point
388 character from the current locale changed to a period character, is
389 considered a numeric string (see Expressions in awk above), the field
390 variable also has the numeric value of the numeric string.
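A sketch of growing a record by assigning past NF, assuming a POSIX shell and a conforming awk; OFS is set to a colon only so that the empty intervening field is visible.

```shell
grow=$(echo 'a b' | awk 'BEGIN { OFS = ":" } {
  $(NF + 2) = 5     # creates empty $3 and sets $4
  print             # $0 rebuilt with OFS
  print NF
  print "[" $9 "]"  # a field beyond NF is the null string
}')
echo "$grow"        # a:b::5, 4, []
```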
391
392
393 /usr/bin/awk, /usr/xpg4/bin/awk
394 awk sets the following special variables that are supported by both
395 /usr/bin/awk and /usr/xpg4/bin/awk:
396
397 ARGC
398 The number of elements in the ARGV array.
399
400
401 ARGV
402 An array of command line arguments, excluding options and
403 the program argument, numbered from zero to ARGC-1.
404
405 The arguments in ARGV can be modified or added to; ARGC can
406 be altered. As each input file ends, awk treats the next
407 non-null element of ARGV, up to the current value of
408 ARGC-1, inclusive, as the name of the next input file.
409 Setting an element of ARGV to null means that it is not
410 treated as an input file. The name - indicates the standard
411 input. If an argument matches the format of an assignment
412 operand, this argument is treated as an assignment rather
413 than a file argument.
414
415
416 CONVFMT
417 The printf format for converting numbers to strings (except
418 for output statements, where OFMT is used). The default is
419 %.6g.
420
421
422 ENVIRON
423 The variable ENVIRON is an array representing the value of
424 the environment. The indices of the array are strings
425 consisting of the names of the environment variables, and
426 the value of each array element is a string consisting of
427 the value of that variable. If the value of an environment
428 variable is considered a numeric string, the array element
429 also has its numeric value.
430
431 In all cases where awk behavior is affected by environment
432 variables (including the environment of any commands that
433 awk executes via the system function or via pipeline
434 redirections with the print statement, the printf
435 statement, or the getline function), the environment used
436 is the environment at the time awk began executing.
437
438
439 FILENAME
440 A pathname of the current input file. Inside a BEGIN action
441 the value is undefined. Inside an END action the value is
442 the name of the last input file processed.
443
444
445 FNR
446 The ordinal number of the current record in the current
447 file. Inside a BEGIN action the value is zero. Inside an
448 END action the value is the number of the last record
449 processed in the last file processed.
450
451
452 FS
453 Input field separator regular expression; a space character
454 by default.
455
456
457 NF
458 The number of fields in the current record. Inside a BEGIN
459 action, the use of NF is undefined unless a getline
460 function without a var argument is executed previously.
461 Inside an END action, NF retains the value it had for the
462 last record read, unless a subsequent, redirected, getline
463 function without a var argument is performed prior to
464 entering the END action.
465
466
467 NR
468 The ordinal number of the current record from the start of
469 input. Inside a BEGIN action the value is zero. Inside an
470 END action the value is the number of the last record
471 processed.
472
473
474 OFMT
The printf format for converting numbers to strings in
output statements; "%.6g" by default. The result of the
477 conversion is unspecified if the value of OFMT is not a
478 floating-point format specification.
479
480
481 OFS
482 The print statement output field separator; a space
483 character by default.
484
485
486 ORS
487 The print output record separator; a newline character by
488 default.
489
490
491 RLENGTH
492 The length of the string matched by the match function.
493
494
495 RS
496 The first character of the string value of RS is the input
497 record separator; a newline character by default. If RS
498 contains more than one character, the results are
499 unspecified. If RS is null, then records are separated by
500 sequences of one or more blank lines. Leading or trailing
501 blank lines do not produce empty records at the beginning
or end of input, and a newline character is always a field
separator, no matter what the value of FS.
504
505
506 RSTART
507 The starting position of the string matched by the match
508 function, numbering from 1. This is always equivalent to
509 the return value of the match function.
510
511
512 SUBSEP
513 The subscript separator string for multi-dimensional
514 arrays. The default value is \034.
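Several of the variables above can be exercised together. This is a sketch assuming a POSIX shell, mktemp, and a conforming awk; the file names, the GREETING environment variable, and the sample strings are illustrative only.

```shell
d=$(mktemp -d)
printf 'x\ny\n' > "$d/f1"
printf 'z\n' > "$d/f2"

# FILENAME, FNR, and NR across two files, and their values inside END.
counters=$(cd "$d" && awk '{ print FILENAME, FNR, NR }
  END { print "end", FILENAME, FNR, NR }' f1 f2)
echo "$counters"

# Setting ARGV[1] to null in BEGIN removes f1 from the input files.
skipped=$(cd "$d" && awk 'BEGIN { ARGV[1] = "" } { print FILENAME ": " $0 }' f1 f2)
echo "$skipped"     # f2: z

# ENVIRON holds the environment awk started with.
env_val=$(GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$env_val"     # hi

# RS = "" selects blank-line-separated records; newline also splits fields.
paras=$(printf '\na b\nc\n\n\nd e\n\n' | awk 'BEGIN { RS = "" } { print NR ": " $1, $NF, NF }')
echo "$paras"

# match() sets RSTART and RLENGTH; on failure they are 0 and -1.
m=$(awk 'BEGIN {
  s = "foobar123baz"
  if (match(s, /[0-9]+/)) print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
  r = match(s, /q/)
  print r, RSTART, RLENGTH
}')
echo "$m"

rm -rf "$d"
```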
515
516
517 /usr/bin/awk
518 The following variable is supported for /usr/bin/awk only:
519
520 RT
The record terminator for the most recent record read. For
most records this is the same value as RS. For the last
record of a file that has no trailing separator, however,
RT is set to the empty string ("").
525
526
527 Regular Expressions
528 The awk utility makes use of the extended regular expression notation
529 (see regex(5)) except that it allows the use of C-language conventions
530 to escape special characters within the EREs, namely \\, \a, \b, \f,
531 \n, \r, \t, \v, and those specified in the following table. These
532 escape sequences are recognized both inside and outside bracket
533 expressions. Note that records need not be separated by newline
534 characters and string constants can contain newline characters, so even
535 the \n sequence is valid in awk EREs. Using a slash character within
536 the regular expression requires escaping as shown in the table below:
537
538
539
540
Escape Sequence   Description                  Meaning
-------------------------------------------------------------------------
\"                Backslash quotation-mark     Quotation-mark character
-------------------------------------------------------------------------
\/                Backslash slash              Slash character
-------------------------------------------------------------------------
\ddd              A backslash character        The character encoded by
                  followed by the longest      the one-, two-, or
                  sequence of one, two, or     three-digit octal integer.
                  three octal-digit            Multi-byte characters
                  characters (01234567). If    require multiple,
                  all of the digits are 0      concatenated escape
                  (that is, representation     sequences, including the
                  of the NULL character),      leading \ for each byte.
                  the behavior is undefined.
-------------------------------------------------------------------------
\c                A backslash character        Undefined
                  followed by any character
                  not described in this
                  table or special
                  characters (\\, \a, \b,
                  \f, \n, \r, \t, \v).
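A sketch of a few of these escapes, assuming a POSIX shell and a conforming awk; the sample input lines are illustrative.

```shell
# \/ matches a literal slash inside a /.../ ERE token.
slash=$(printf 'a/b\nab\n' | awk '/a\/b/')
echo "$slash"    # a/b

# \101 and \102 are the octal codes for A and B.
oct=$(awk 'BEGIN { print "\101\102" }')
echo "$oct"      # AB

# \t is recognized inside a bracket expression too.
tab=$(printf 'x\ty\n' | awk '/[\t]/ { print "has tab" }')
echo "$tab"      # has tab
```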
564
565
566
567 A regular expression can be matched against a specific field or string
568 by using one of the two regular expression matching operators, ~ and
569 !~. These operators interpret their right-hand operand as a regular
570 expression and their left-hand operand as a string. If the regular
571 expression matches the string, the ~ expression evaluates to the value
572 1, and the !~ expression evaluates to the value 0. If the regular
573 expression does not match the string, the ~ expression evaluates to the
574 value 0, and the !~ expression evaluates to the value 1. If the right-
575 hand operand is any expression other than the lexical token ERE, the
576 string value of the expression is interpreted as an extended regular
expression, including the escape conventions described above. Notice
that these same escape conventions are also applied in determining
the value of a string literal (the lexical token STRING), and are thus
applied a second time when a string literal is used in this context.
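The double application of escapes means a backslash must be doubled in a string used as a regular expression. A sketch, assuming a POSIX shell and a conforming awk; the variable name re is arbitrary.

```shell
dyn=$(awk 'BEGIN {
  re = "a\\.b"      # string value is a\.b, an ERE for a literal dot
  print ("a.b" ~ re), ("axb" ~ re), ("axb" ~ /a\.b/), ("axb" ~ "a.b")
}')
echo "$dyn"         # 1 0 0 1
```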
581
582
When an ERE token appears as an expression in any context other than as
the right-hand side of the ~ or !~ operator or as one of the built-in
585 function arguments described below, the value of the resulting
586 expression is the equivalent of:
587
588 $0 ~ /ere/
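Because a bare ERE token evaluates to 0 or 1 against $0, it can be used directly in arithmetic. A sketch, assuming a POSIX shell and a conforming awk, that counts matching records:

```shell
# /error/ alone means $0 ~ /error/, which is 1 or 0.
matched=$(printf 'error a\nok\nerror b\n' | awk '{ n += /error/ } END { print n }')
echo "$matched"    # 2
```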
589
590
591
The ere argument to the gsub, match, and sub functions, and the fs
argument to the split function (see String Functions) are interpreted as
extended regular expressions. These can be either ERE tokens or arbitrary
595 expressions, and are interpreted in the same manner as the right-hand
596 side of the ~ or !~ operator.
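A sketch of ERE arguments to gsub and split, assuming a POSIX shell and a conforming awk; the strings and the array name parts are illustrative.

```shell
fn=$(awk 'BEGIN {
  s = "a1b22c"
  n = gsub(/[0-9]+/, "-", s)          # each run of digits becomes "-"
  print n, s
  k = split("x, y,z", parts, / *, */) # comma with optional spaces
  print k, parts[2]
}')
echo "$fn"    # "2 a-b-c" then "3 y"
```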
597
598
599 An extended regular expression can be used to separate fields by using
600 the -F ERE option or by assigning a string containing the expression to
601 the built-in variable FS. The default value of the FS variable is a
602 single space character. The following describes FS behavior:
603
604 1. If FS is a single character:
605
606 o If FS is the space character, skip leading and trailing
607 blank characters; fields are delimited by sets of one or
608 more blank characters.
609
610 o Otherwise, if FS is any other character c, fields are
611 delimited by each single occurrence of c.
612
613 2. Otherwise, the string value of FS is considered to be an
614 extended regular expression. Each occurrence of a sequence
615 matching the extended regular expression delimits fields.
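The three FS cases above can be compared by counting fields. This sketch assumes a POSIX shell and a conforming awk; the input strings are made up for illustration.

```shell
# FS is a space: leading/trailing blanks skipped, runs of blanks delimit.
blank=$(echo '  a   b  ' | awk '{ print NF }')
# FS is a single other character: each occurrence delimits.
single=$(echo 'a,,b' | awk -F, '{ print NF }')
# FS longer than one character: treated as an ERE.
ere=$(echo 'a,;b' | awk -F '[,;]+' '{ print NF }')
echo "$blank $single $ere"    # 2 3 2
```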
616
617
618 Except in the gsub, match, split, and sub built-in functions, regular
619 expression matching is based on input records. That is, record
620 separator characters (the first character of the value of the variable
621 RS, a newline character by default) cannot be embedded in the
622 expression, and no expression matches the record separator character.
623 If the record separator is not a newline character, newline characters
embedded in the expression can be matched. In those four built-in
functions, regular expression matching is based on text strings. So,
626 any character (including the newline character and the record
627 separator) can be embedded in the pattern and an appropriate pattern
628 matches any character. However, in all awk regular expression matching,
629 the use of one or more NULL characters in the pattern, input record or
630 text string produces undefined results.
631
632
633 Patterns
634 A pattern is any valid expression, a range specified by two expressions
635 separated by comma, or one of the two special patterns BEGIN or END.
636
637
638 Special Patterns
639 The awk utility recognizes two special patterns, BEGIN and END. Each
640 BEGIN pattern is matched once and its associated action executed before
641 the first record of input is read (except possibly by use of the
642 getline function in a prior BEGIN action) and before command line
643 assignment is done. Each END pattern is matched once and its associated
action executed after the last record of input has been read. These two
patterns must have associated actions.
646
647
648 BEGIN and END do not combine with other patterns. Multiple BEGIN and
649 END patterns are allowed. The actions associated with the BEGIN
650 patterns are executed in the order specified in the program, as are the
651 END actions. An END pattern can precede a BEGIN pattern in a program.
652
653
654 If an awk program consists of only actions with the pattern BEGIN, and
655 the BEGIN action contains no getline function, awk exits without
656 reading its input when the last statement in the last BEGIN action is
657 executed. If an awk program consists of only actions with the pattern
658 END or only actions with the patterns BEGIN and END, the input is read
659 before the statements in the END actions are executed.
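A sketch of the two cases, assuming a POSIX shell and a conforming awk: a BEGIN-only program never reads its input, while an END-only program reads all of it first.

```shell
# BEGIN only: the piped line is never read.
b_only=$(echo ignored | awk 'BEGIN { print "no input read" }')
# END only: all input is consumed, so NR is the record count.
e_only=$(printf 'a\nb\nc\n' | awk 'END { print NR }')
echo "$b_only / $e_only"    # no input read / 3
```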
660
661
662 Expression Patterns
663 An expression pattern is evaluated as if it were an expression in a
664 Boolean context. If the result is true, the pattern is considered to
665 match, and the associated action (if any) is executed. If the result is
666 false, the action is not executed.
667
668
669 Pattern Ranges
670 A pattern range consists of two expressions separated by a comma. In
671 this case, the action is performed for all records between a match of
672 the first expression and the following match of the second expression,
673 inclusive. At this point, the pattern range can be repeated starting at
674 input records subsequent to the end of the matched range.
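A sketch of a range pattern, assuming a POSIX shell and a conforming awk. The second range has no closing match, so it extends to the end of input.

```shell
ranges=$(printf '1\nstart\n2\nend\n3\nstart\n4\n' | awk '/start/,/end/')
echo "$ranges"    # start, 2, end, start, 4
```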
675
676
677 Actions
678 An action is a sequence of statements. A statement can be one of the
679 following:
680
681 if ( expression ) statement [ else statement ]
682 while ( expression ) statement
683 do statement while ( expression )
684 for ( expression ; expression ; expression ) statement
685 for ( var in array ) statement
686 delete array[subscript] #delete an array element
687 delete array #delete all elements within an array
688 break
689 continue
690 { [ statement ] ... }
691 expression # commonly variable = expression
692 print [ expression-list ] [ >expression ]
693 printf format [ ,expression-list ] [ >expression ]
694 next # skip remaining patterns on this input line
695 nextfile # skip remaining patterns on this input file
696 exit [expr] # skip the rest of the input; exit status is expr
697 return [expr]
698
699
700
701 Any single statement can be replaced by a statement list enclosed in
702 braces. The statements are terminated by newline characters or
703 semicolons, and are executed sequentially in the order that they
704 appear.
705
706
707 The next statement causes all further processing of the current input
708 record to be abandoned. The behavior is undefined if a next statement
709 appears or is invoked in a BEGIN or END action.
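A typical use of next is skipping comment lines before the remaining rules run. This is a sketch assuming a POSIX shell and a conforming awk; treating lines that begin with # as comments is an illustrative convention.

```shell
kept=$(printf '# comment\nx\n# another\ny\n' | awk '
/^#/ { next }              # abandon this record; later rules are skipped
     { print "data:", $0 }')
echo "$kept"    # "data: x" then "data: y"
```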
710
711
712 The nextfile statement is similar to next, but also skips all other
713 records in the current file, and moves on to processing the next input
714 file if available (or exits the program if there are none). (Note that
715 this keyword is not supported by /usr/xpg4/bin/awk.)
716
717
The exit statement invokes all END actions in the order in which they
occur in the program source and then terminates the program without
reading further input. An exit statement inside an END action
721 terminates the program without further execution of END actions. If an
722 expression is specified in an exit statement, its numeric value is the
723 exit status of awk, unless subsequent errors are encountered or a
724 subsequent exit statement with an expression is executed.
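A sketch of exit in a main action, assuming a POSIX shell and a conforming awk: input stops at record 2, the END action still runs, and the exit status is the value given to exit.

```shell
status=0
out=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit 3 } END { print "END saw", NR }') || status=$?
echo "$out (exit status $status)"    # END saw 2 (exit status 3)
```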
725
726
727 Output Statements
728 Both print and printf statements write to standard output by default.
729 The output is written to the location specified by output_redirection
730 if one is supplied, as follows:
731
> expression
>> expression
| expression
733
734
735
In all cases, the expression is evaluated to produce a string that is
used as a full pathname to write into (for > or >>) or as a command to
be executed (for |). Using the first two forms, if the file of that
name is not currently open, it is opened, creating it if necessary;
with the first form, the file is also truncated. The output is then
appended to the file. As long as the file remains open, subsequent
calls in which expression evaluates to the same string value simply
append output to the file. The file remains open until the close
function is called with an expression that evaluates to the same string
value.
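A sketch of the open-once behavior of > and >>, assuming a POSIX shell, mktemp, and a conforming awk; the file name log and the variable out are illustrative. The file is pre-filled so that the truncation is visible.

```shell
tmpdir=$(mktemp -d)
echo old > "$tmpdir/log"
awk -v out="$tmpdir/log" 'BEGIN {
  print "one" > out       # first use: opened and truncated
  print "two" > out       # still open: appended, not truncated again
  close(out)
  print "three" >> out    # >> opens for appending
  close(out)
}'
log=$(cat "$tmpdir/log")
echo "$log"               # one, two, three (no "old")
rm -rf "$tmpdir"
```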
746
747
748 The third form writes output onto a stream piped to the input of a
749 command. The stream is created if no stream is currently open with the
750 value of expression as its command name. The stream created is
751 equivalent to one created by a call to the popen(3C) function with the
752 value of expression as the command argument and a value of w as the
753 mode argument. As long as the stream remains open, subsequent calls in
which expression evaluates to the same string value write output to
755 the existing stream. The stream remains open until the close function
756 is called with an expression that evaluates to the same string value.
757 At that time, the stream is closed as if by a call to the pclose
758 function.
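A sketch of output to a command stream, assuming a POSIX shell, sort(1), and a conforming awk: both print statements feed the same sort process because the command string is identical, and close waits for it to finish before awk continues.

```shell
sorted=$(awk 'BEGIN {
  cmd = "sort"
  print "banana" | cmd
  print "apple" | cmd     # same string: same open stream
  close(cmd)              # sort now writes its output and exits
  print "done"
}')
echo "$sorted"            # apple, banana, done
```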
759
760
These output statements take a comma-separated list of expressions,
referred to in the grammar by the non-terminal symbols expr_list,
763 print_expr_list or print_expr_list_opt. This list is referred to here
764 as the expression list, and each member is referred to as an expression
765 argument.
766
767
768 The print statement writes the value of each expression argument onto
769 the indicated output stream, separated by the current output field
770 separator (see variable OFS above), and terminated by the output record
771 separator (see variable ORS above). All expression arguments are taken
772 as strings, being converted if necessary, except that numeric values
773 are converted using the printf format in OFMT instead of the one in
774 CONVFMT. An empty expression list stands for the whole input record ($0).
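The following minimal sketch shows how OFS, ORS, and OFMT shape the output of print. The input line and the variable settings are made up for illustration.

```shell
# Sketch: OFS separates print arguments, ORS ends each record,
# and OFMT formats non-integer numbers (the input line is made up).
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-"; ORS = "|\n"; OFMT = "%.2f" }
{
    print $1, $3, 3.14159   # a-c-3.14|
    print                   # empty list means $0: a b c|
}')
echo "$out"
```

The second print statement writes $0 unchanged, because no field was assigned and the record was therefore never rebuilt with OFS.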
775
776
777 The printf statement produces output based on a notation similar to the
778 File Format Notation used to describe file formats in this document.
779 Output is produced as specified with the first expression argument as
780 the string format and subsequent expression arguments as the strings
781 arg1 to argn, inclusive, with the following exceptions:
782
783 1. The format is an actual character string rather than a
784 graphical representation. Therefore, it cannot contain empty
785 character positions. The space character in the format
786 string, in any context other than a flag of a conversion
787 specification, is treated as an ordinary character that is
788 copied to the output.
789
790 2. If the character set contains a Delta character and that
791 character appears in the format string, it is treated as an
792 ordinary character that is copied to the output.
793
794 3. The escape sequences beginning with a backslash character are
795 treated as sequences of ordinary characters that are copied
796 to the output. Note that these same sequences are interpreted
797 lexically by awk when they appear in literal strings, but
798 they are not treated specially by the printf statement.
799
800 4. A field width or precision can be specified as the *
801 character instead of a digit string. In this case the next
802 argument from the expression list is fetched and its numeric
803 value taken as the field width or precision.
804
805 5. The implementation does not precede or follow output from
806 the d or u conversion specifications with blank characters
807 not specified by the format string.
808
809 6. The implementation does not precede output from the o
810 conversion specification with leading zeros not specified by
811 the format string.
812
813 7. For the c conversion specification: if the argument has a
814 numeric value, the character whose encoding is that value is
815 output. If the value is zero or is not the encoding of any
816 character in the character set, the behavior is undefined.
817 If the argument does not have a numeric value, the first
818 character of the string value is output; if the string does
819 not contain any characters the behavior is undefined.
820
821 8. For each conversion specification that consumes an argument,
822 the next expression argument is evaluated. With the
823 exception of the c conversion, the value is converted to the
824 appropriate type for the conversion specification.
825
826 9. If there are insufficient expression arguments to satisfy
827 all the conversion specifications in the format string, the
828 behavior is undefined.
829
830 10. If any character sequence in the format string begins with a
831 % character, but does not form a valid conversion
832 specification, the behavior is unspecified.
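A short sketch of rules 4, 7, and 8 above follows: a "*" width taken from the argument list, and the c conversion applied to a numeric argument and to a string argument. It assumes an ASCII-compatible character set, in which 65 encodes "A".

```shell
# Sketch: each conversion consumes the next expression argument in order.
out=$(awk 'BEGIN {
    printf "[%*d]", 5, 42          # "*" takes the field width (5) from the list
    printf "[%-4s]", "ab"          # left-justified in a 4-character field
    printf "[%c%c]\n", 65, "hello" # number 65 -> its character; string -> first char
}')
echo "$out"
```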
833
834
835 Both print and printf can output at least {LINE_MAX} bytes.
836
837
838 Functions
839 The awk language has a variety of built-in functions: arithmetic,
840 string, input/output and general.
841
842
843 Arithmetic Functions
844 The arithmetic functions, except for int, are based on the ISO C
845 standard. The behavior is undefined in cases where the ISO C standard
846 specifies that an error be returned or that the behavior is undefined.
847 Although the grammar permits built-in functions to appear with no
848 arguments or parentheses, unless the argument or parentheses are
849 indicated as optional in the following list (by displaying them within
850 the [ ] brackets), such use is undefined.
851
852 atan2(y,x)
853 Return the arctangent of y/x, in radians.
854
855
856 cos(x)
857 Return cosine of x, where x is in radians.
858
859
860 sin(x)
861 Return sine of x, where x is in radians.
862
863
864 exp(x)
865 Return the exponential function of x.
866
867
868 log(x)
869 Return the natural logarithm of x.
870
871
872 sqrt(x)
873 Return the square root of x.
874
875
876 int(x)
877 Truncate its argument to an integer. It is truncated
878 toward 0 when x > 0.
879
880
881 rand()
882 Return a random number n, such that 0 <= n < 1.
883
884
885 srand([expr])
886 Set the seed value for rand to expr or use the time of
887 day if expr is omitted. The previous seed value is
888 returned.
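The sketch below calls several of the arithmetic functions. Results are formatted with printf so that the output does not depend on OFMT. The sequence that rand produces for a given seed varies between implementations, so the sketch only checks that the value falls in the documented range.

```shell
# Sketch of the arithmetic built-ins; printf keeps the output exact.
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                # arctangent of 0/-1 is pi radians
    printf "%.5f %d ", pi, int(3.9)  # int truncates 3.9 toward 0
    printf "%g %g ", sqrt(16), exp(log(8))
    srand(42)                        # fixed seed
    r = rand()
    ok = (r >= 0 && r < 1) ? "in-range" : "out-of-range"
    print ok
}')
echo "$out"
```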
889
890
891 String Functions
892 The following string functions are supported. Although
893 the grammar permits built-in functions to appear with no arguments or
894 parentheses, unless the argument or parentheses are indicated as
895 optional in the following list (by displaying them within the [ ]
896 brackets), such use is undefined.
897
898 gsub(ere,repl[,in])
899
900 Behave like sub (see below), except that it replaces all
901 occurrences of the regular expression (like the ed utility global
902 substitute) in $0 or in the in argument, when specified, and returns
the number of substitutions made.
903
904
905 index(s,t)
906
907 Return the position, in characters, numbering from 1, in string s
908 where string t first occurs, or zero if it does not occur at all.
909
910
911 length[([v])]
912
913 Given no argument, this function returns the length of the whole
914 record, $0. If given an array as an argument (and using
915 /usr/bin/awk), then this returns the number of elements it
916 contains. Otherwise, this function interprets the argument as a
917 string (performing any needed conversions) and returns its length
918 in characters.
919
920
921 match(s,ere)
922
923 Return the position, in characters, numbering from 1, in string s
924 where the extended regular expression ere occurs, or zero if it
925 does not occur at all. RSTART is set to the starting position
926 (which is the same as the returned value), zero if no match is
927 found; RLENGTH is set to the length of the matched string, -1 if no
928 match is found.
929
930
931 split(s,a[,fs])
932
933 Split the string s into array elements a[1], a[2], ..., a[n], and
934 return n. The separation is done with the extended regular
935 expression fs or with the field separator FS if fs is not given.
936 Each array element has a string value when created. If the string
937 assigned to any array element, with any occurrence of the decimal-
938 point character from the current locale changed to a period
939 character, would be considered a numeric string, the array element
940 also has the numeric value of the numeric string. The effect of a
941 null string as the value of fs is unspecified.
942
943
944 sprintf(fmt,expr,expr,...)
945
946 Format the expressions according to the printf format given by fmt
947 and return the resulting string.
948
949
950 sub(ere,repl[,in])
951
952 Substitute the string repl in place of the first instance of the
953 extended regular expression ere in string in and return the number
954 of substitutions. An ampersand (&) appearing in the string repl
955 is replaced by the string from in that matches the regular
956 expression. An ampersand preceded by a backslash (\) is
957 interpreted as the literal ampersand character. An occurrence of
958 two consecutive backslashes is interpreted as just a single literal
959 backslash character. Any other occurrence of a backslash (for
960 example, preceding any other character) is treated as a literal
961 backslash character. If repl is a string literal, the handling of
962 the ampersand character occurs after any lexical processing,
963 including any lexical backslash escape sequence processing. If in
964 is specified and it is not an lvalue the behavior is undefined. If
965 in is omitted, awk uses the current record ($0) in its place.
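The following sketch illustrates the replacement rules for sub and gsub: & stands for the matched text, the lexically processed "\\&" yields a literal ampersand, and omitting in edits $0. The strings are made up.

```shell
# Sketch of sub and gsub replacement rules; the strings are made up.
out=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)       # & stands for the matched text
    print n, s                      # 2 [foo] bar [foo]
    t = "cost: 5"
    sub(/[0-9]+/, "$& \\& up", t)   # the literal "\\&" becomes \&, a literal &
    print t                         # cost: $5 & up
    $0 = "one one"
    sub(/one/, "two")               # no third argument: edits $0
    print                           # two one
}')
echo "$out"
```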
966
967
968 substr(s,m[,n])
969
970 Return the at most n-character substring of s that begins at
971 position m, numbering from 1. If n is missing, the length of the
972 substring is limited by the length of the string s.
973
974
975 tolower(s)
976
977 Return a string based on the string s. Each character in s that is
978 an upper-case letter specified to have a tolower mapping by the
979 LC_CTYPE category of the current locale is replaced in the returned
980 string by the lower-case letter specified by the mapping. Other
981 characters in s are unchanged in the returned string.
982
983
984 toupper(s)
985
986 Return a string based on the string s. Each character in s that is
987 a lower-case letter specified to have a toupper mapping by the
988 LC_CTYPE category of the current locale is replaced in the returned
989 string by the upper-case letter specified by the mapping. Other
990 characters in s are unchanged in the returned string.
991
992
993
994 All of the preceding functions that take ere as a parameter expect a
995 pattern or a string-valued expression whose value is an extended
996 regular expression.
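A minimal sketch of the remaining string functions follows. It shows 1-based positions, the RSTART and RLENGTH values that match sets, and the count that split returns. The sample strings are illustrative.

```shell
# Sketch of index, length, match, split, substr, toupper and tolower.
out=$(awk 'BEGIN {
    print index("hello", "ll"), length("hello")     # 3 5
    if (match("abc123def", /[0-9]+/))
        print RSTART, RLENGTH                       # 4 3
    k = split("a:b:c", parts, ":")
    print k, parts[1] parts[3]                      # 3 ac
    print substr("hello", 2, 3), toupper("abc"), tolower("XyZ")  # ell ABC xyz
}')
echo "$out"
```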
997
998
999 Input/Output and General Functions
1000 The input/output and general functions are:
1001
1002 close(expression)
1003 Close the file or pipe opened by a print or
1004 printf statement or a call to getline with
1005 the same string-valued expression. If the
1006 close was successful, the function returns
1007 0; otherwise, it returns non-zero.
1008
1009
1010 fflush(expression)
1011 Flush any buffered output for the file or
1012 pipe opened by a print or printf statement
1013 or a call to getline with the same string-
1014 valued expression. If the flush was
1015 successful, the function returns 0;
1016 otherwise, it returns EOF. If no arguments
1017 or the empty string ("") are given, then all
1018 open files will be flushed. (Note that
1019 fflush is supported in /usr/bin/awk only.)
1020
1021
1022 expression | getline [var]
1023 Read a record of input from a stream piped
1024 from the output of a command. The stream is
1025 created if no stream is currently open with
1026 the value of expression as its command name.
1027 The stream created is equivalent to one
1028 created by a call to the popen function with
1029 the value of expression as the command
1030 argument and a value of r as the mode
1031 argument. As long as the stream remains
1032 open, subsequent calls in which expression
1033 evaluates to the same string value read
1034 subsequent records from the stream. The stream
1035 remains open until the close function is
1036 called with an expression that evaluates to
1037 the same string value. At that time, the
1038 stream is closed as if by a call to the
1039 pclose function. If var is missing, $0 and
1040 NF are set. Otherwise, var is set.
1041
1042 The getline operator can form ambiguous
1043 constructs when there are operators that are
1044 not in parentheses (including concatenate)
1045 to the left of the | (to the beginning of
1046 the expression containing getline). In the
1047 context of the $ operator, | behaves as if
1048 it had a lower precedence than $. The result
1049 of evaluating other operators is
1050 unspecified, and portable applications must
1051 enclose all such uses in
1052 parentheses.
1053
1054
1055 getline
1056 Set $0 to the next input record from the
1057 current input file. This form of getline
1058 sets the NF, NR, and FNR variables.
1059
1060
1061 getline var
1062 Set variable var to the next input record
1063 from the current input file. This form of
1064 getline sets the FNR and NR variables.
1065
1066
1067 getline [var] < expression
1068 Read the next record of input from a named
1069 file. The expression is evaluated to produce
1070 a string that is used as a full pathname. If
1071 the file of that name is not currently open,
1072 it is opened. As long as the file remains
1073 open, subsequent calls in which expression
1074 evaluates to the same string value read
1075 subsequent records from the file. The file
1076 remains open until the close function is
1077 called with an expression that evaluates to
1078 the same string value. If var is missing, $0
1079 and NF are set. Otherwise, var is set.
1080
1081 The getline operator can form ambiguous
1082 constructs when there are binary operators
1083 that are not in parentheses (including
1084 concatenate) to the right of the < (up to
1085 the end of the expression containing the
1086 getline). The result of evaluating such a
1087 construct is unspecified, and portable
1088 applications must enclose all such uses in
1089 parentheses.
1090
1091
1092 system(expression)
1093 Execute the command given by expression in a
1094 manner equivalent to the system(3C) function
1095 and return the exit status of the command.
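The sketch below pairs system with fflush, so that text buffered by awk appears before the command's own output. It assumes an awk that provides fflush (the text above notes that /usr/bin/awk does) and a shell that accepts "exit 3" as a command.

```shell
# Sketch: flush pending output, run commands, and inspect an exit status.
out=$(awk 'BEGIN {
    printf "before "
    fflush()                      # flush buffered output before running a command
    system("echo from-sh")        # the command writes to the same standard output
    print "status", system("exit 3")
}')
echo "$out"
```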
1096
1097
1098
1099 All forms of getline return 1 for successful input, 0 for end of file,
1100 and -1 for an error.
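The following sketch uses three getline forms and tests the return value as described above. It assumes mktemp(1) is available. The temporary file, its contents, and the "echo" command are illustrative. Each construct involving < or | is parenthesized, as recommended for portable programs.

```shell
# Sketch of three getline forms (the file and the command are examples).
tmp=$(mktemp)
printf 'x1\nx2\n' > "$tmp"
out=$(printf 'main1\nmain2\n' | awk -v f="$tmp" '
NR == 1 {
    n = 0
    while ((getline line < f) > 0)   # 1 per record, 0 at end of file
        n++
    close(f)
    "echo hi there" | getline         # no var: sets $0 and NF
    print n, $2, NF                   # 2 there 2
    close("echo hi there")
    r = getline rec                   # next record of the current input
    print r, rec                      # 1 main2
}')
rm -f "$tmp"
echo "$out"
```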
1101
1102
1103 Where strings are used as the name of a file or pipeline, the strings
1104 must be textually identical. The phrase "same string value" means that
1105 strings that are merely equivalent, including strings that differ only
1106 in space characters, refer to different files or pipelines.
1107
1108
1109 User-defined Functions
1110 The awk language also provides user-defined functions. Such functions
1111 can be defined as:
1112
1113 function name(args,...) { statements }
1114
1115
1116
1117 A function can be referred to anywhere in an awk program; in
1118 particular, its use can precede its definition. The scope of a function
1119 is global.
1120
1121
1122 Function arguments can be either scalars or arrays; the behavior is
1123 undefined if an array name is passed as an argument that the function
1124 uses as a scalar, or if a scalar expression is passed as an argument
1125 that the function uses as an array. Function arguments are passed by
1126 value if scalar and by reference if array name. Argument names are
1127 local to the function; all other variable names are global. The same
1128 name must not be used both as an argument name and as the name of a function
1129 or a special awk variable. The same name must not be used both as a
1130 variable name with global scope and as the name of a function. The same
1131 name must not be used within the same scope both as a scalar variable
1132 and as an array.
1133
1134
1135 The number of parameters in the function definition need not match the
1136 number of arguments in the function call. Excess formal parameters can
1137 be used as local variables. If fewer arguments are supplied in a
1138 function call than are in the function definition, the extra parameters
1139 that are used in the function body as scalars are initialized with a
1140 string value of the null string and a numeric value of zero, and the
1141 extra parameters that are used in the function body as arrays are
1142 initialized as empty arrays. If more arguments are supplied in a
1143 function call than are in the function definition, the behavior is
1144 undefined.
1145
1146
1147 When invoking a function, no white space can be placed between the
1148 function name and the opening parenthesis. Function calls can be nested
1149 and recursive calls can be made upon functions. Upon return from any
1150 nested or recursive function call, the values of all of the calling
1151 function's parameters are unchanged, except for array parameters passed
1152 by reference. The return statement can be used to return a value. If a
1153 return statement appears outside of a function definition, the behavior
1154 is undefined.
1155
1156
1157 In the function definition, newline characters are optional before the
1158 opening brace and after the closing brace. Function definitions can
1159 appear anywhere in the program where a pattern-action pair is allowed.
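The minimal sketch below illustrates the calling rules described above. The function names fill and fact are hypothetical, chosen for illustration. Arrays are passed by reference, scalars by value, and an extra formal parameter serves as a local variable. The function fact shows a recursive call.

```shell
# Sketch: by-reference arrays, by-value scalars, locals, and recursion.
out=$(awk '
function fill(arr, n,    i) {       # i is an extra parameter used as a local
    for (i = 1; i <= n; i++)
        arr[i] = i * i               # arrays are passed by reference
    n = 0                            # scalars are passed by value
    return i - 1
}
function fact(k) { return k <= 1 ? 1 : k * fact(k - 1) }
BEGIN {
    m = 3
    last = fill(sq, m)
    print last, m, sq[1], sq[2], sq[3]    # 3 3 1 4 9
    print fact(5)                         # 120
}')
echo "$out"
```

The value of m is still 3 after the call, even though fill sets its copy n to 0.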
1160
1161
1162 USAGE
1163 The index, length, match, and substr functions should not be confused
1164 with similar functions in the ISO C standard; the awk versions deal
1165 with characters, while the ISO C standard deals with bytes.
1166
1167
1168 Because the concatenation operation is represented by adjacent
1169 expressions rather than an explicit operator, it is often necessary to
1170 use parentheses to enforce the proper evaluation precedence.
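The following short sketch shows why parentheses matter. Adjacent operands are concatenated, but a leading minus sign is parsed as subtraction. Arithmetic operators also bind more tightly than concatenation.

```shell
# Sketch: concatenation versus arithmetic precedence.
out=$(awk 'BEGIN {
    print 1 -1          # parsed as subtraction: 0
    print 1 (-1)        # parentheses force concatenation: 1-1
    print 1 " " 2 + 3   # + binds tighter than concatenation: 1 5
}')
echo "$out"
```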
1171
1172
1173 See largefile(5) for the description of the behavior of awk when
1174 encountering files greater than or equal to 2 Gbyte (2^31 bytes).
1175
1176
1177 EXAMPLES
1178 The awk program specified in the command line is most easily specified
1179 within single-quotes (for example, 'program') for applications using
1180 sh, because awk programs commonly contain characters that are special
1181 to the shell, including double-quotes. In the cases where an awk program
1182 contains single-quote characters, it is usually easiest to specify most
1183 of the program as strings within single-quotes concatenated by the
1184 shell with quoted single-quote characters. For example:
1185
1186 awk '/'\''/ { print "quote:", $0 }'
1187
1188
1189
1190 prints all lines from the standard input containing a single-quote
1191 character, prefixed with quote:.
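The quoting technique above can be checked from sh as follows. The two input lines are made up.

```shell
# Sketch: '\'' closes the quoted program, adds a quote, and reopens it.
out=$(printf "it's\nplain\n" | awk '/'\''/ { print "quote:", $0 }')
echo "$out"
```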
1192
1193
1194 The following are examples of simple awk programs:
1195
1196 Example 1 Write to the standard output all input lines for which field
1197 3 is greater than 5:
1198
1199 $3 > 5
1200
1201
1202
1203 Example 2 Write every tenth line:
1204
1205 (NR % 10) == 0
1206
1207
1208
1209 Example 3 Write any line with a substring matching the regular
1210 expression:
1211
1212 /(G|D)(2[0-9][[:alpha:]]*)/
1213
1214
1215
1216 Example 4 Print any line with a substring containing a G or D, followed
1217 by a sequence of digits and characters:
1218
1219
1220 This example uses character classes digit and alpha to match language-
1221 independent digit and alphabetic characters, respectively.
1222
1223
1224 /(G|D)([[:digit:][:alpha:]]*)/
1225
1226
1227
1228 Example 5 Write any line in which the second field matches the regular
1229 expression and the fourth field does not:
1230
1231 $2 ~ /xyz/ && $4 !~ /xyz/
1232
1233
1234
1235 Example 6 Write any line in which the second field contains a
1236 backslash:
1237
1238 $2 ~ /\\/
1239
1240
1241
1242 Example 7 Write any line in which the second field contains a backslash
1243 (alternate method):
1244
1245
1246 Notice that backslash escapes are interpreted twice, once in lexical
1247 processing of the string and once in processing the regular expression.
1248
1249
1250 $2 ~ "\\\\"
1251
1252
1253
1254 Example 8 Write the second to the last and the last field in each line,
1255 separating the fields by a colon:
1256
1257 {OFS=":";print $(NF-1), $NF}
1258
1259
1260
1261 Example 9 Write the line number and number of fields in each line:
1262
1263
1264 The three strings representing the line number, the colon and the
1265 number of fields are concatenated and that string is written to
1266 standard output.
1267
1268
1269 {print NR ":" NF}
1270
1271
1272
1273 Example 10 Write lines longer than 72 characters:
1274
1275 length($0) > 72
1276
1277
1278
1279 Example 11 Write first two fields in opposite order separated by the
1280 OFS:
1281
1282 { print $2, $1 }
1283
1284
1285
1286 Example 12 Same, with input fields separated by comma or space and tab
1287 characters, or both:
1288
1289 BEGIN { FS = ",[ \t]*|[ \t]+" }
1290 { print $2, $1 }
1291
1292
1293
1294 Example 13 Add up first column, print sum and average:
1295
1296 {s += $1 }
1297 END {print "sum is ", s, " average is", s/NR}
1298
1299
1300
1301 Example 14 Write fields in reverse order, one per line (many lines out
1302 for each line in):
1303
1304 { for (i = NF; i > 0; --i) print $i }
1305
1306
1307
1308 Example 15 Write all lines between occurrences of the strings "start"
1309 and "stop":
1310
1311 /start/, /stop/
1312
1313
1314
1315 Example 16 Write all lines whose first field is different from the
1316 previous one:
1317
1318 $1 != prev { print; prev = $1 }
1319
1320
1321
1322 Example 17 Simulate the echo command:
1323
1324 BEGIN {
1325 for (i = 1; i < ARGC; ++i)
1326 printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
1327 }
1328
1329
1330
1331 Example 18 Write the path prefixes contained in the PATH environment
1332 variable, one per line:
1333
1334 BEGIN {
1335 n = split (ENVIRON["PATH"], path, ":")
1336 for (i = 1; i <= n; ++i)
1337 print path[i]
1338 }
1339
1340
1341
1342 Example 19 Print the file "input", filling in page numbers starting at
1343 5:
1344
1345
1346 If there is a file named input containing page headers of the form
1347
1348
1349 Page #
1350
1351
1352
1353 and a file named program that contains
1354
1355
1356 /Page/{ $2 = n++; }
1357 { print }
1358
1359
1360
1361 then the command line
1362
1363
1364 awk -f program n=5 input
1365
1366
1367
1368
1369 prints the file input, filling in page numbers starting at 5.
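A self-contained sketch of Example 19 follows. The program is given inline instead of through -f program, and a temporary file created with mktemp(1) stands in for the file named input. The assignment n=5 takes effect before the file operand that follows it is read.

```shell
# Sketch of Example 19 with an inline program and a temporary input file.
tmp=$(mktemp)
printf 'Page #\ntext\nPage #\n' > "$tmp"
out=$(awk '/Page/ { $2 = n++ } { print }' n=5 "$tmp")
rm -f "$tmp"
echo "$out"
```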
1370
1371
1372 ENVIRONMENT VARIABLES
1373 See environ(5) for descriptions of the following environment variables
1374 that affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.
1375
1376 LC_NUMERIC
1377 Determine the radix character used when interpreting
1378 numeric input, performing conversions between numeric and
1379 string values and formatting numeric output. Regardless
1380 of locale, the period character (the decimal-point
1381 character of the POSIX locale) is the decimal-point
1382 character recognized in processing awk programs
1383 (including assignments in command-line arguments).
1384
1385
1386 EXIT STATUS
1387 The following exit values are returned:
1388
1389 0
1390 All input files were processed successfully.
1391
1392
1393 >0
1394 An error occurred.
1395
1396
1397
1398 The exit status can be altered within the program by using an exit
1399 expression.
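The following minimal sketch shows the exit expression setting the status of awk. The second command exits with 0 only if it read at least one line. The input is made up.

```shell
# Sketch: the exit expression becomes the exit status of awk.
status=0
awk 'BEGIN { exit 3 }' || status=$?
echo "exit status: $status"
seen=0
printf 'x\n' | awk '{ n++ } END { exit (n > 0 ? 0 : 1) }' || seen=$?
echo "status after one line: $seen"
```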
1400
1401
1402 SEE ALSO
1403 ed(1), egrep(1), grep(1), lex(1), oawk(1), sed(1), popen(3C),
1404 printf(3C), system(3C), attributes(5), environ(5), largefile(5),
1405 regex(5), XPG4(5)
1406
1407
1408 Aho, A. V., B. W. Kernighan, and P. J. Weinberger, The AWK Programming
1409 Language, Addison-Wesley, 1988.
1410
1411
1412 DIAGNOSTICS
1413 If any file operand is specified and the named file cannot be accessed,
1414 awk writes a diagnostic message to standard error and terminates without
1415 any further action.
1416
1417
1418 If the program specified by either the program operand or a progfile
1419 operand is not a valid awk program (as specified in EXTENDED
1420 DESCRIPTION), the behavior is undefined.
1421
1422
1423 NOTES
1424 Input white space is not preserved on output if fields are involved.
1425
1426
1427 There are no explicit conversions between numbers and strings. To force
1428 an expression to be treated as a number add 0 to it; to force it to be
1429 treated as a string concatenate the null string ("") to it.
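The sketch below illustrates both notes. Assigning a field rebuilds the record with single OFS separators. Adding 0 or concatenating "" selects numeric or string treatment. The input line and values are made up.

```shell
# Sketch: field assignment drops input spacing; +0 and "" force conversions.
out=$(echo 'a    b' | awk '{
    $1 = $1                    # assigning a field rebuilds $0 using OFS
    print                      # a b
    x = "10"; y = "9"          # string constants
    s = (x < y)                # string comparison: "10" sorts before "9"
    v = (x + 0 < y + 0)        # adding 0 forces a numeric comparison
    print s, v                 # 1 0
    n = 3.0
    print n "", length(n "")   # concatenating "" forces a string: 3 1
}')
echo "$out"
```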
1430
1431
1432
1433 April 20, 2020 AWK(1)