1 .\"
2 .\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
3 .\" permission to reproduce portions of its copyrighted documentation.
4 .\" Original documentation from The Open Group can be obtained online at
5 .\" http://www.opengroup.org/bookstore/.
6 .\"
7 .\" The Institute of Electrical and Electronics Engineers and The Open
8 .\" Group, have given us permission to reprint portions of their
9 .\" documentation.
10 .\"
11 .\" In the following statement, the phrase ``this text'' refers to portions
12 .\" of the system documentation.
13 .\"
14 .\" Portions of this text are reprinted and reproduced in electronic form
15 .\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
16 .\" Standard for Information Technology -- Portable Operating System
17 .\" Interface (POSIX), The Open Group Base Specifications Issue 6,
18 .\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
19 .\" Engineers, Inc and The Open Group. In the event of any discrepancy
20 .\" between these versions and the original IEEE and The Open Group
21 .\" Standard, the original IEEE and The Open Group Standard is the referee
22 .\" document. The original Standard can be obtained online at
23 .\" http://www.opengroup.org/unix/online.html.
24 .\"
25 .\" This notice shall appear on any product containing this material.
26 .\"
27 .\" The contents of this file are subject to the terms of the
28 .\" Common Development and Distribution License (the "License").
29 .\" You may not use this file except in compliance with the License.
30 .\"
31 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
32 .\" or http://www.opensolaris.org/os/licensing.
33 .\" See the License for the specific language governing permissions
34 .\" and limitations under the License.
35 .\"
36 .\" When distributing Covered Code, include this CDDL HEADER in each
37 .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
38 .\" If applicable, add the following below this CDDL HEADER, with the
39 .\" fields enclosed by brackets "[]" replaced with your own identifying
40 .\" information: Portions Copyright [yyyy] [name of copyright owner]
41 .\"
42 .\"
43 .\" Copyright 1989 AT&T
44 .\" Copyright 1992, X/Open Company Limited All Rights Reserved
45 .\" Portions Copyright (c) 2005, 2006 Sun Microsystems, Inc. All Rights Reserved
46 .\" Copyright 2020 Joyent, Inc.
47 .\"
48 .TH AWK 1 "Apr 20, 2020"
49 .SH NAME
50 awk \- pattern scanning and processing language
51 .SH SYNOPSIS
52 .nf
53 \fB/usr/bin/awk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR] \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
54 [\fIargument\fR]...
55 .fi
56
57 .LP
58 .nf
59 \fB/usr/bin/nawk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR] \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
60 [\fIargument\fR]...
61 .fi
62
63 .LP
64 .nf
65 \fB/usr/xpg4/bin/awk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR]... \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
66 [\fIargument\fR]...
67 .fi
68
69 .SH DESCRIPTION
70 NOTE: The \fBnawk\fR command is now the system default awk for illumos.
71 .LP
72 The \fB/usr/bin/awk\fR and \fB/usr/xpg4/bin/awk\fR utilities execute
73 \fIprogram\fRs written in the \fBawk\fR programming language, which is
specialized for textual data manipulation. An \fBawk\fR \fIprogram\fR is a
sequence of patterns and corresponding actions. The string specifying
\fIprogram\fR must be enclosed in single quotes (') to protect it from
interpretation by the shell. The sequence of pattern-action statements can be
specified in the command line as \fIprogram\fR or in one or more files
specified by the \fB-f\fR \fIprogfile\fR option. When input is read that matches
80 a pattern, the action associated with the pattern is performed.
81 .sp
82 .LP
83 Input is interpreted as a sequence of records. By default, a record is a line,
84 but this can be changed by using the \fBRS\fR built-in variable. Each record of
85 input is matched to each pattern in the \fIprogram\fR. For each pattern
86 matched, the associated action is executed.
87 .sp
88 .LP
89 The \fBawk\fR utility interprets each input record as a sequence of fields
90 where, by default, a field is a string of non-blank characters. This default
91 white-space field delimiter (blanks and/or tabs) can be changed by using the
\fBFS\fR built-in variable or the \fB-F\fR \fIERE\fR option. The \fBawk\fR
93 utility denotes the first field in a record \fB$1\fR, the second \fB$2\fR, and
94 so forth. The symbol \fB$0\fR refers to the entire record; setting any other
95 field causes the reevaluation of \fB$0\fR. Assigning to \fB$0\fR resets the
96 values of all fields and the \fBNF\fR built-in variable.
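.sp
.LP
The field behavior described above can be sketched as follows. This is a
minimal illustration that assumes a POSIX shell with \fBawk\fR on the search
path; the shell variable names are hypothetical and chosen only for this
example.
.sp
.in +2
.nf
```shell
# Default splitting on blanks: the second field of the record.
second=$(echo "alpha beta gamma" | awk '{ print $2 }')
echo "$second"

# -F sets the field separator; here a colon.
third=$(echo "a:b:c" | awk -F: '{ print $3 }')
echo "$third"

# Assigning to a field causes $0 to be rebuilt from the fields.
rebuilt=$(echo "a b c" | awk '{ $2 = "X"; print $0 }')
echo "$rebuilt"
```
.fi
.in -2
.sp
.LP
The three commands print beta, c, and "a X c" respectively.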
97
98 .SH OPTIONS
99 The following options are supported:
100 .sp
101 .ne 2
102 .na
103 \fB\fB-F\fR \fIERE\fR\fR
104 .ad
105 .RS 17n
106 Define the input field separator to be the extended regular expression
107 \fIERE\fR, before any input is read (can be a character).
108 .RE
109
110 .sp
111 .ne 2
112 .na
113 \fB\fB-f\fR \fIprogfile\fR\fR
114 .ad
115 .RS 17n
Specifies the pathname of the file \fIprogfile\fR containing an \fBawk\fR
117 program. If multiple instances of this option are specified, the concatenation
118 of the files specified as \fIprogfile\fR in the order specified is the
119 \fBawk\fR program. The \fBawk\fR program can alternatively be specified in
120 the command line as a single argument.
121 .RE
122
123 .sp
124 .ne 2
125 .na
126 \fB\fB-v\fR \fIassignment\fR\fR
127 .ad
128 .RS 17n
129 The \fIassignment\fR argument must be in the same form as an \fIassignment\fR
130 operand. The assignment is of the form \fIvar=value\fR, where \fIvar\fR is the
131 name of one of the variables described below. The specified assignment occurs
132 before executing the \fBawk\fR program, including the actions associated with
133 \fBBEGIN\fR patterns (if any). Multiple occurrences of this option can be
134 specified.
135 .RE
136
137 .sp
138 .ne 2
139 .na
140 \fB\fB-safe\fR\fR
141 .ad
142 .RS 17n
143 When passed to \fBawk\fR, this flag will prevent the program from opening new
144 files or running child processes. The \fBENVIRON\fR array will also not be
145 initialized.
146 .RE
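.sp
.LP
As a hedged illustration of the \fB-F\fR and \fB-v\fR options, the sketch
below assumes a POSIX shell; the variable names (n, pre, post) and the shell
variables holding the results are invented for the example.
.sp
.in +2
.nf
```shell
# -v assigns before BEGIN runs, so the value is visible there.
doubled=$(awk -v n=21 'BEGIN { print n * 2 }')
echo "$doubled"

# Several -v options can be combined with -F.
joined=$(echo "x,y" | awk -F, -v pre="<" -v post=">" '{ print pre $2 post }')
echo "$joined"
```
.fi
.in -2
.sp
.LP
The first command prints 42 and the second prints <y>.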
147
148 .SH OPERANDS
149 The following operands are supported:
150 .sp
151 .ne 2
152 .na
153 \fB\fIprogram\fR\fR
154 .ad
155 .RS 12n
156 If no \fB-f\fR option is specified, the first operand to \fBawk\fR is the text
157 of the \fBawk\fR program. The application supplies the \fIprogram\fR operand
as a single argument to \fBawk\fR. If the text does not end in a newline
159 character, \fBawk\fR interprets the text as if it did.
160 .RE
161
162 .sp
163 .ne 2
164 .na
165 \fB\fIargument\fR\fR
166 .ad
167 .RS 12n
168 Either of the following two types of \fIargument\fR can be intermixed:
169 .sp
170 .ne 2
171 .na
172 \fB\fIfile\fR\fR
173 .ad
174 .RS 14n
175 A pathname of a file that contains the input to be read, which is matched
176 against the set of patterns in the program. If no \fIfile\fR operands are
177 specified, or if a \fIfile\fR operand is \fB\(mi\fR, the standard input is
178 used.
179 .RE
180
181 .sp
182 .ne 2
183 .na
184 \fB\fIassignment\fR\fR
185 .ad
186 .RS 14n
187 An operand that begins with an underscore or alphabetic character from the
188 portable character set, followed by a sequence of underscores, digits and
189 alphabetics from the portable character set, followed by the \fB=\fR character
190 specifies a variable assignment rather than a pathname. The characters before
the \fB=\fR represent the name of an \fBawk\fR variable. If that name is an
\fBawk\fR reserved word, the behavior is undefined. The characters following
the equal sign are interpreted as if they appeared in the \fBawk\fR program
preceded and followed by a double-quote (\fB"\fR) character, as a \fBSTRING\fR
token, except that if the last character is an unescaped backslash, it is
interpreted as a literal backslash rather than as the first character of the
sequence \fB\e"\fR\&. The variable is assigned the value of that \fBSTRING\fR
token. If the value is considered a \fInumeric string\fR, the variable is
also assigned its numeric value. Each such variable assignment is performed just
before the processing of the following \fIfile\fR, if any. Thus, an assignment
before the first \fIfile\fR argument is executed after the \fBBEGIN\fR actions
202 (if any), while an assignment after the last \fIfile\fR argument is executed
203 before the \fBEND\fR actions (if any). If there are no \fIfile\fR arguments,
204 assignments are executed before processing the standard input.
205 .RE
206
207 .RE
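.sp
.LP
The timing of an \fIassignment\fR operand, as opposed to a \fB-v\fR option,
can be seen in the following sketch. It assumes a POSIX shell; the variable
name v and the shell variable seen are illustrative only.
.sp
.in +2
.nf
```shell
# v is still unset in BEGIN, but is set before standard input (-) is read.
seen=$(echo "line" | awk 'BEGIN { print "BEGIN sees [" v "]" }
{ print "record sees [" v "]" }' v=1 -)
echo "$seen"
```
.fi
.in -2
.sp
.LP
The output is "BEGIN sees []" followed by "record sees [1]".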
208
209 .SH INPUT FILES
210 Input files to the \fBawk\fR program from any of the following sources:
211 .RS +4
212 .TP
213 .ie t \(bu
214 .el o
215 any \fIfile\fR operands or their equivalents, achieved by modifying the
216 \fBawk\fR variables \fBARGV\fR and \fBARGC\fR
217 .RE
218 .RS +4
219 .TP
220 .ie t \(bu
221 .el o
222 standard input in the absence of any \fIfile\fR operands
223 .RE
224 .RS +4
225 .TP
226 .ie t \(bu
227 .el o
228 arguments to the \fBgetline\fR function
229 .RE
230 .sp
231 .LP
232 must be text files. Whether the variable \fBRS\fR is set to a value other than
233 a newline character or not, for these files, implementations support records
234 terminated with the specified separator up to \fB{LINE_MAX}\fR bytes and can
235 support longer records.
236 .sp
237 .LP
238 If \fB-\fR\fBf\fR \fIprogfile\fR is specified, the files named by each of the
239 \fIprogfile\fR option-arguments must be text files containing an \fBawk\fR
240 program.
241 .sp
242 .LP
The standard input is used only if no \fIfile\fR operands are specified, or if
244 a \fIfile\fR operand is \fB\(mi\fR\&.
245
246 .SH EXTENDED DESCRIPTION
An \fBawk\fR program is composed of pairs of the form:
248 .sp
249 .in +2
250 .nf
251 pattern { \fIaction\fR }
252 .fi
253 .in -2
254
255 .sp
256 .LP
257 Either the pattern or the action (including the enclosing brace characters) can
258 be omitted. Pattern-action statements are separated by a semicolon or by a
259 newline.
260 .sp
261 .LP
262 A missing pattern matches any record of input, and a missing action is
263 equivalent to an action that writes the matched record of input to standard
264 output.
265 .sp
266 .LP
267 Execution of the \fBawk\fR program starts by first executing the actions
268 associated with all \fBBEGIN\fR patterns in the order they occur in the
269 program. Then each \fIfile\fR operand (or standard input if no files were
270 specified) is processed by reading data from the file until a record separator
271 is seen (a newline character by default), splitting the current record into
272 fields using the current value of \fBFS\fR, evaluating each pattern in the
273 program in the order of occurrence, and executing the action associated with
274 each pattern that matches the current record. The action for a matching pattern
is executed before evaluating subsequent patterns. Last, the actions associated
with all \fBEND\fR patterns are executed in the order they occur in the program.
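.sp
.LP
The execution order above can be sketched with a small program that has a
\fBBEGIN\fR action, a pattern with no action, an action with no pattern, and
an \fBEND\fR action. This is only an illustration under the assumption of a
POSIX shell; the names out and n are hypothetical.
.sp
.in +2
.nf
```shell
out=$({ echo one; echo two; echo three; } | awk 'BEGIN { print "start" }
/t/
{ n++ }
END { print n " records" }')
echo "$out"
```
.fi
.in -2
.sp
.LP
The pattern /t/ has no action, so matching records are printed; the action
with no pattern runs for every record. The output is start, two, three, and
"3 records", one per line.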
277
278 .SS "Expressions in awk"
279 Expressions describe computations used in \fIpatterns\fR and \fIactions\fR. In
280 the following table, valid expression operations are given in groups from
281 highest precedence first to lowest precedence last, with equal-precedence
282 operators grouped between horizontal lines. In expression evaluation, where the
283 grammar is formally ambiguous, higher precedence operators are evaluated before
284 lower precedence operators. In this table \fIexpr,\fR \fIexpr1,\fR
285 \fIexpr2,\fR and \fIexpr3\fR represent any expression, while \fIlvalue\fR
286 represents any entity that can be assigned to (that is, on the left side of an
287 assignment operator).
288 .sp
289
290 .sp
.TS
c c c c
l l l l .
\fBSyntax\fR	\fBName\fR	\fBType of Result\fR	\fBAssociativity\fR
_
( \fIexpr\fR )	Grouping	type of \fIexpr\fR	n/a
_
$\fIexpr\fR	Field reference	string	n/a
_
++ \fIlvalue\fR	Pre-increment	numeric	n/a
\(mi\(mi \fIlvalue\fR	Pre-decrement	numeric	n/a
\fIlvalue\fR ++	Post-increment	numeric	n/a
\fIlvalue\fR \(mi\(mi	Post-decrement	numeric	n/a
_
\fIexpr\fR ^ \fIexpr\fR	Exponentiation	numeric	right
_
! \fIexpr\fR	Logical not	numeric	n/a
+ \fIexpr\fR	Unary plus	numeric	n/a
\(mi \fIexpr\fR	Unary minus	numeric	n/a
_
\fIexpr\fR * \fIexpr\fR	Multiplication	numeric	left
\fIexpr\fR / \fIexpr\fR	Division	numeric	left
\fIexpr\fR % \fIexpr\fR	Modulus	numeric	left
_
\fIexpr\fR + \fIexpr\fR	Addition	numeric	left
\fIexpr\fR \(mi \fIexpr\fR	Subtraction	numeric	left
_
\fIexpr\fR \fIexpr\fR	String concatenation	string	left
_
\fIexpr\fR < \fIexpr\fR	Less than	numeric	none
\fIexpr\fR <= \fIexpr\fR	Less than or equal to	numeric	none
\fIexpr\fR != \fIexpr\fR	Not equal to	numeric	none
\fIexpr\fR == \fIexpr\fR	Equal to	numeric	none
\fIexpr\fR > \fIexpr\fR	Greater than	numeric	none
\fIexpr\fR >= \fIexpr\fR	Greater than or equal to	numeric	none
_
\fIexpr\fR ~ \fIexpr\fR	ERE match	numeric	none
\fIexpr\fR !~ \fIexpr\fR	ERE non-match	numeric	none
_
\fIexpr\fR in \fIarray\fR	Array membership	numeric	left
( \fIindex\fR ) in	Multi-dimensional array	numeric	left
\fIarray\fR	membership
_
\fIexpr\fR && \fIexpr\fR	Logical AND	numeric	left
_
\fIexpr\fR |\|| \fIexpr\fR	Logical OR	numeric	left
_
\fIexpr1\fR ? \fIexpr2\fR	Conditional expression	type of selected	right
: \fIexpr3\fR		\fIexpr2\fR or \fIexpr3\fR
_
\fIlvalue\fR ^= \fIexpr\fR	Exponentiation	numeric	right
	assignment
\fIlvalue\fR %= \fIexpr\fR	Modulus assignment	numeric	right
\fIlvalue\fR *= \fIexpr\fR	Multiplication	numeric	right
	assignment
\fIlvalue\fR /= \fIexpr\fR	Division assignment	numeric	right
\fIlvalue\fR += \fIexpr\fR	Addition assignment	numeric	right
\fIlvalue\fR \(mi= \fIexpr\fR	Subtraction assignment	numeric	right
\fIlvalue\fR = \fIexpr\fR	Assignment	type of \fIexpr\fR	right
.TE
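.sp
.LP
A few rows of the precedence table can be checked with the sketch below. It
is an illustration only, assuming a POSIX shell; the shell variable prec is
hypothetical.
.sp
.in +2
.nf
```shell
prec=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2        # right-associative: 2 ^ (3 ^ 2)
    print -2 ^ 2           # exponentiation binds tighter than unary minus
    print 1 + 2 "" 3 + 4   # addition binds tighter than concatenation
}')
echo "$prec"
```
.fi
.in -2
.sp
.LP
The three statements print 512, -4, and 37.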
351
352 .sp
353 .LP
354 Each expression has either a string value, a numeric value or both. Except as
355 stated for specific contexts, the value of an expression is implicitly
356 converted to the type needed for the context in which it is used. A string
357 value is converted to a numeric value by the equivalent of the following calls:
358 .sp
359 .in +2
360 .nf
361 setlocale(LC_NUMERIC, "");
362 \fInumeric_value\fR = atof(\fIstring_value\fR);
363 .fi
364 .in -2
365
366 .sp
367 .LP
368 A numeric value that is exactly equal to the value of an integer is converted
369 to a string by the equivalent of a call to the \fBsprintf\fR function with the
370 string \fB%d\fR as the \fBfmt\fR argument and the numeric value being converted
371 as the first and only \fIexpr\fR argument. Any other numeric value is
372 converted to a string by the equivalent of a call to the \fBsprintf\fR function
373 with the value of the variable \fBCONVFMT\fR as the \fBfmt\fR argument and the
374 numeric value being converted as the first and only \fIexpr\fR argument.
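.sp
.LP
The two number-to-string rules can be observed by forcing a conversion
through concatenation with the empty string. The following is a sketch under
the assumption of a POSIX shell; the names x, s, t, i, and conv are
illustrative.
.sp
.in +2
.nf
```shell
conv=$(awk 'BEGIN {
    x = 3.14159265
    s = x ""               # non-integer: converted using CONVFMT (%.6g)
    CONVFMT = "%.2f"
    t = x ""               # same number, converted with the new CONVFMT
    i = 17 ""              # integer value: always converted as %d
    print s; print t; print i
}')
echo "$conv"
```
.fi
.in -2
.sp
.LP
The output is 3.14159, 3.14, and 17. The integer is unaffected by the change
to \fBCONVFMT\fR.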
375 .sp
376 .LP
377 A string value is considered to be a \fInumeric string\fR in the following
378 case:
379 .RS +4
380 .TP
381 1.
Any leading and trailing blank characters are ignored.
383 .RE
384 .RS +4
385 .TP
386 2.
387 If the first unignored character is a \fB+\fR or \fB\(mi\fR, it is ignored.
388 .RE
389 .RS +4
390 .TP
391 3.
392 If the remaining unignored characters would be lexically recognized as a
393 \fBNUMBER\fR token, the string is considered a \fInumeric string\fR.
394 .RE
395 .sp
396 .LP
397 If a \fB\(mi\fR character is ignored in the above steps, the numeric value of
398 the \fInumeric string\fR is the negation of the numeric value of the recognized
399 \fBNUMBER\fR token. Otherwise the numeric value of the \fInumeric string\fR is
400 the numeric value of the recognized \fBNUMBER\fR token. Whether or not a string
401 is a \fInumeric string\fR is relevant only in contexts where that term is used
402 in this section.
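.sp
.LP
One place where the distinction matters is comparison. In the sketch below,
which assumes a POSIX shell and uses hypothetical shell variable names,
fields that look like numbers are compared numerically, while string
constants are compared as strings.
.sp
.in +2
.nf
```shell
# Fields that look like numbers are numeric strings: compared as numbers.
as_fields=$(echo "10 9" | awk '{ print ($1 > $2) }')
# String constants are plain strings: compared character by character.
as_consts=$(awk 'BEGIN { print ("10" > "9") }')
echo "$as_fields $as_consts"
```
.fi
.in -2
.sp
.LP
This prints "1 0": 10 is greater than 9 as a number, but the string "10"
sorts before the string "9".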
403 .sp
404 .LP
405 When an expression is used in a Boolean context, if it has a numeric value, a
406 value of zero is treated as false and any other value is treated as true.
407 Otherwise, a string value of the null string is treated as false and any other
408 value is treated as true. A Boolean context is one of the following:
409 .RS +4
410 .TP
411 .ie t \(bu
412 .el o
413 the first subexpression of a conditional expression.
414 .RE
415 .RS +4
416 .TP
417 .ie t \(bu
418 .el o
an expression operated on by logical NOT, logical AND, or logical OR.
420 .RE
421 .RS +4
422 .TP
423 .ie t \(bu
424 .el o
425 the second expression of a \fBfor\fR statement.
426 .RE
427 .RS +4
428 .TP
429 .ie t \(bu
430 .el o
431 the expression of an \fBif\fR statement.
432 .RE
433 .RS +4
434 .TP
435 .ie t \(bu
436 .el o
437 the expression of the \fBwhile\fR clause in either a \fBwhile\fR or \fBdo\fR
438 \fB\&.\|.\|.\fR \fBwhile\fR statement.
439 .RE
440 .RS +4
441 .TP
442 .ie t \(bu
443 .el o
444 an expression used as a pattern (as in Overall Program Structure).
445 .RE
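.sp
.LP
The truth rules can be sketched using the first subexpression of a
conditional expression, which is one of the Boolean contexts listed above.
The example assumes a POSIX shell; the shell variable truth is hypothetical.
.sp
.in +2
.nf
```shell
truth=$(awk 'BEGIN {
    print (0     ? "true" : "false")   # numeric zero
    print (2     ? "true" : "false")   # non-zero number
    print (""    ? "true" : "false")   # null string
    print ("abc" ? "true" : "false")   # non-null string
}')
echo "$truth"
```
.fi
.in -2
.sp
.LP
The output is false, true, false, true.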
446 .sp
447 .LP
The \fBawk\fR language supplies arrays that are used for storing numbers or
strings. Arrays need not be declared. They are initially empty, and their sizes
change dynamically. The subscripts, or element identifiers, are strings,
451 providing a type of associative array capability. An array name followed by a
452 subscript within square brackets can be used as an \fIlvalue\fR and as an
453 expression, as described in the grammar. Unsubscripted array names are used in
454 only the following contexts:
455 .RS +4
456 .TP
457 .ie t \(bu
458 .el o
459 a parameter in a function definition or function call.
460 .RE
461 .RS +4
462 .TP
463 .ie t \(bu
464 .el o
465 the \fBNAME\fR token following any use of the keyword \fBin\fR.
466 .RE
467 .sp
468 .LP
469 A valid array \fIindex\fR consists of one or more comma-separated expressions,
470 similar to the way in which multi-dimensional arrays are indexed in some
471 programming languages. Because \fBawk\fR arrays are really one-dimensional,
472 such a comma-separated list is converted to a single string by concatenating
473 the string values of the separate expressions, each separated from the other by
474 the value of the \fBSUBSEP\fR variable.
475 .sp
476 .LP
477 Thus, the following two index operations are equivalent:
478 .sp
479 .in +2
480 .nf
481 var[expr1, expr2, ... exprn]
482 var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
483 .fi
484 .in -2
485
486 .sp
487 .LP
488 A multi-dimensioned \fIindex\fR used with the \fBin\fR operator must be put in
489 parentheses. The \fBin\fR operator, which tests for the existence of a
490 particular array element, does not create the element if it does not exist.
491 Any other reference to a non-existent array element automatically creates it.
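.sp
.LP
The subscript rules can be sketched as follows. This is an illustration
assuming a POSIX shell; the array name a and the other variable names are
invented for the example.
.sp
.in +2
.nf
```shell
arr=$(awk 'BEGIN {
    a["x", "y"] = 1
    if (("x", "y") in a)         print "pair found"
    if (("x" SUBSEP "y") in a)   print "same key"
    if (!("z" in a))             print "z absent"
    n = 0; for (k in a) n++; print n    # still one element
    v = a["z"]                          # a plain reference creates a["z"]
    n = 0; for (k in a) n++; print n    # now two elements
}')
echo "$arr"
```
.fi
.in -2
.sp
.LP
The output is "pair found", "same key", "z absent", 1, and 2. Testing with
\fBin\fR leaves the array unchanged, while reading a["z"] adds an element.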
492
493 .SS "Variables and Special Variables"
494 Variables can be used in an \fBawk\fR program by referencing them. With the
495 exception of function parameters, they are not explicitly declared.
496 Uninitialized scalar variables and array elements have both a numeric value of
497 zero and a string value of the empty string.
498 .sp
499 .LP
500 Field variables are designated by a \fB$\fR followed by a number or numerical
501 expression. The effect of the field number \fIexpression\fR evaluating to
502 anything other than a non-negative integer is unspecified. Uninitialized
503 variables or string values need not be converted to numeric values in this
504 context. New field variables are created by assigning a value to them.
505 References to non-existent fields (that is, fields after \fB$NF\fR) produce the
null string. However, assigning to a non-existent field (for example,
\fB$(NF+2) = 5\fR) increases the value of \fBNF\fR, creates any intervening
fields with the null string as their values, and causes the value of \fB$0\fR to
be recomputed, with the fields being separated by the value of \fBOFS\fR. Each
510 field variable has a string value when created. If the string, with any
511 occurrence of the decimal-point character from the current locale changed to a
512 period character, is considered a \fInumeric string\fR (see \fBExpressions in
513 awk\fR above), the field variable also has the numeric value of the \fInumeric
514 string\fR.
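.sp
.LP
Assignment past the last field can be sketched as below, assuming a POSIX
shell. \fBOFS\fR is set to a hyphen only to make the empty intervening
fields visible; the shell variable grown is hypothetical.
.sp
.in +2
.nf
```shell
grown=$(echo "a b" | awk 'BEGIN { OFS = "-" }
{ $5 = "e"; print NF; print $0; print "[" $9 "]" }')
echo "$grown"
```
.fi
.in -2
.sp
.LP
The output is 5, then a-b---e, then []. The reference to $9 yields the null
string.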
515
516 .SS "/usr/bin/awk, /usr/xpg4/bin/awk"
517 \fBawk\fR sets the following special variables that are supported by both
518 \fB/usr/bin/awk\fR and \fB/usr/xpg4/bin/awk\fR:
519 .sp
520 .ne 2
521 .na
522 \fB\fBARGC\fR\fR
523 .ad
524 .RS 12n
525 The number of elements in the \fBARGV\fR array.
526 .RE
527
528 .sp
529 .ne 2
530 .na
531 \fB\fBARGV\fR\fR
532 .ad
533 .RS 12n
534 An array of command line arguments, excluding options and the \fIprogram\fR
535 argument, numbered from zero to \fBARGC\fR\(mi1.
536 .sp
537 The arguments in \fBARGV\fR can be modified or added to; \fBARGC\fR can be
538 altered. As each input file ends, \fBawk\fR treats the next non-null element
539 of \fBARGV\fR, up to the current value of \fBARGC\fR\(mi1, inclusive, as the
540 name of the next input file. Setting an element of \fBARGV\fR to null means
541 that it is not treated as an input file. The name \fB\(mi\fR indicates the
542 standard input. If an argument matches the format of an \fIassignment\fR
543 operand, this argument is treated as an assignment rather than a \fIfile\fR
544 argument.
545 .RE
546
547 .sp
548 .ne 2
549 .na
550 \fB\fBCONVFMT\fR\fR
551 .ad
552 .RS 12n
553 The \fBprintf\fR format for converting numbers to strings (except for output
554 statements, where \fBOFMT\fR is used). The default is \fB%.6g\fR.
555 .RE
556
557 .sp
558 .ne 2
559 .na
560 \fB\fBENVIRON\fR\fR
561 .ad
562 .RS 12n
563 The variable \fBENVIRON\fR is an array representing the value of the
564 environment. The indices of the array are strings consisting of the names of
565 the environment variables, and the value of each array element is a string
566 consisting of the value of that variable. If the value of an environment
567 variable is considered a \fInumeric string\fR, the array element also has its
568 numeric value.
569 .sp
570 In all cases where \fBawk\fR behavior is affected by environment variables
571 (including the environment of any commands that \fBawk\fR executes via the
572 \fBsystem\fR function or via pipeline redirections with the \fBprint\fR
573 statement, the \fBprintf\fR statement, or the \fBgetline\fR function), the
574 environment used is the environment at the time \fBawk\fR began executing.
575 .RE
576
577 .sp
578 .ne 2
579 .na
580 \fB\fBFILENAME\fR\fR
581 .ad
582 .RS 12n
583 A pathname of the current input file. Inside a \fBBEGIN\fR action the value is
584 undefined. Inside an \fBEND\fR action the value is the name of the last input
585 file processed.
586 .RE
587
588 .sp
589 .ne 2
590 .na
591 \fB\fBFNR\fR\fR
592 .ad
593 .RS 12n
594 The ordinal number of the current record in the current file. Inside a
595 \fBBEGIN\fR action the value is zero. Inside an \fBEND\fR action the value is
596 the number of the last record processed in the last file processed.
597 .RE
598
599 .sp
600 .ne 2
601 .na
602 \fB\fBFS\fR\fR
603 .ad
604 .RS 12n
605 Input field separator regular expression; a space character by default.
606 .RE
607
608 .sp
609 .ne 2
610 .na
611 \fB\fBNF\fR\fR
612 .ad
613 .RS 12n
614 The number of fields in the current record. Inside a \fBBEGIN\fR action, the
615 use of \fBNF\fR is undefined unless a \fBgetline\fR function without a
616 \fIvar\fR argument is executed previously. Inside an \fBEND\fR action, \fBNF\fR
617 retains the value it had for the last record read, unless a subsequent,
618 redirected, \fBgetline\fR function without a \fIvar\fR argument is performed
619 prior to entering the \fBEND\fR action.
620 .RE
621
622 .sp
623 .ne 2
624 .na
625 \fB\fBNR\fR\fR
626 .ad
627 .RS 12n
628 The ordinal number of the current record from the start of input. Inside a
629 \fBBEGIN\fR action the value is zero. Inside an \fBEND\fR action the value is
630 the number of the last record processed.
631 .RE
632
633 .sp
634 .ne 2
635 .na
636 \fB\fBOFMT\fR\fR
637 .ad
638 .RS 12n
The \fBprintf\fR format for converting numbers to strings in output statements;
\fB"%.6g"\fR by default. The result of the conversion is unspecified if the
641 value of \fBOFMT\fR is not a floating-point format specification.
642 .RE
643
644 .sp
645 .ne 2
646 .na
647 \fB\fBOFS\fR\fR
648 .ad
649 .RS 12n
650 The \fBprint\fR statement output field separator; a space character by default.
651 .RE
652
653 .sp
654 .ne 2
655 .na
656 \fB\fBORS\fR\fR
657 .ad
658 .RS 12n
659 The \fBprint\fR output record separator; a newline character by default.
660 .RE
661
662 .sp
663 .ne 2
664 .na
665 \fB\fBRLENGTH\fR\fR
666 .ad
667 .RS 12n
668 The length of the string matched by the \fBmatch\fR function.
669 .RE
670
671 .sp
672 .ne 2
673 .na
674 \fB\fBRS\fR\fR
675 .ad
676 .RS 12n
677 The first character of the string value of \fBRS\fR is the input record
678 separator; a newline character by default. If \fBRS\fR contains more than one
679 character, the results are unspecified. If \fBRS\fR is null, then records are
680 separated by sequences of one or more blank lines. Leading or trailing blank
681 lines do not produce empty records at the beginning or end of input, and the
682 field separator is always newline, no matter what the value of \fBFS\fR.
683 .RE
684
685 .sp
686 .ne 2
687 .na
688 \fB\fBRSTART\fR\fR
689 .ad
690 .RS 12n
691 The starting position of the string matched by the \fBmatch\fR function,
692 numbering from 1. This is always equivalent to the return value of the
693 \fBmatch\fR function.
694 .RE
695
696 .sp
697 .ne 2
698 .na
699 \fB\fBSUBSEP\fR\fR
700 .ad
701 .RS 12n
702 The subscript separator string for multi-dimensional arrays. The default value
703 is \fB\e034\fR\&.
704 .RE
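.sp
.LP
Two of the variables above can be sketched in use as follows. The example
assumes a POSIX shell; the shell variables para and pos are hypothetical.
.sp
.in +2
.nf
```shell
# RS = "" selects paragraph mode: blank lines separate records and
# a newline also separates fields.
para=$({ echo "a b"; echo "c"; echo ""; echo "d"; } |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }')
echo "$para"

# match() sets RSTART and RLENGTH.
pos=$(awk 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }')
echo "$pos"
```
.fi
.in -2
.sp
.LP
The first command prints "1: 3 fields" and "2: 1 fields"; the second prints
"2 2".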
705
706 .SS "/usr/bin/awk"
707 The following variable is supported for \fB/usr/bin/awk\fR only:
708 .sp
709 .ne 2
710 .na
711 \fB\fBRT\fR\fR
712 .ad
713 .RS 12n
714 The record terminator for the most recent record read. For most records this
715 will be the same value as \fBRS\fR. At the end of a file with no trailing
716 separator value, though, this will be set to the empty string (\fB""\fR).
717 .RE
718
719 .SS "Regular Expressions"
720 The \fBawk\fR utility makes use of the extended regular expression notation
721 (see \fBregex\fR(5)) except that it allows the use of C-language conventions to
722 escape special characters within the EREs, namely \fB\e\e\fR, \fB\ea\fR,
723 \fB\eb\fR, \fB\ef\fR, \fB\en\fR, \fB\er\fR, \fB\et\fR, \fB\ev\fR, and those
724 specified in the following table. These escape sequences are recognized both
725 inside and outside bracket expressions. Note that records need not be
726 separated by newline characters and string constants can contain newline
727 characters, so even the \fB\en\fR sequence is valid in \fBawk\fR EREs. Using
728 a slash character within the regular expression requires escaping as shown in
729 the table below:
730 .sp
731
732 .sp
.TS
l l l
l l l .
\fBEscape Sequence\fR	\fBDescription\fR	\fBMeaning\fR
_
\fB\e"\fR	Backslash quotation-mark	Quotation-mark character
_
\fB\e/\fR	Backslash slash	Slash character
_
\fB\e\fR\fIddd\fR	T{
A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.
T}	T{
The character encoded by the one-, two- or three-digit octal integer. Multi-byte characters require multiple, concatenated escape sequences, including the leading \e for each byte.
T}
_
\fB\e\fR\fIc\fR	T{
A backslash character followed by any character not described in this table or special characters (\fB\e\e\fR, \fB\ea\fR, \fB\eb\fR, \fB\ef\fR, \fB\en\fR, \fB\er\fR, \fB\et\fR, \fB\ev\fR).
T}	Undefined
.TE
752
753 .sp
754 .LP
755 A regular expression can be matched against a specific field or string by using
756 one of the two regular expression matching operators, \fB~\fR and \fB!\|~\fR.
757 These operators interpret their right-hand operand as a regular expression and
758 their left-hand operand as a string. If the regular expression matches the
759 string, the \fB~\fR expression evaluates to the value \fB1\fR, and the
760 \fB!\|~\fR expression evaluates to the value \fB0\fR. If the regular expression
761 does not match the string, the \fB~\fR expression evaluates to the value
762 \fB0\fR, and the \fB!\|~\fR expression evaluates to the value \fB1\fR. If the
763 right-hand operand is any expression other than the lexical token \fBERE\fR,
764 the string value of the expression is interpreted as an extended regular
expression, including the escape conventions described above. Notice that these
same escape conventions are also applied in determining the value of a
string literal (the lexical token \fBSTRING\fR), and are applied a second time
when a string literal is used in this context.
769 .sp
770 .LP
771 When an \fBERE\fR token appears as an expression in any context other than as
772 the right-hand of the \fB~\fR or \fB!\|~\fR operator or as one of the built-in
773 function arguments described below, the value of the resulting expression is
774 the equivalent of:
775 .sp
776 .in +2
777 .nf
778 $0 ~ /\fIere\fR/
779 .fi
780 .in -2
781
782 .sp
783 .LP
The \fIere\fR argument to the \fBgsub\fR, \fBmatch\fR, and \fBsub\fR functions, and
the \fIfs\fR argument to the \fBsplit\fR function (see \fBString Functions\fR),
are interpreted as extended regular expressions. These can be either \fBERE\fR
787 tokens or arbitrary expressions, and are interpreted in the same manner as the
788 right-hand side of the \fB~\fR or \fB!\|~\fR operator.
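.sp
.LP
The matching operators, and an \fBERE\fR token used on its own, can be
sketched as follows. The example assumes a POSIX shell; the names s, pat,
re, and alone are illustrative only.
.sp
.in +2
.nf
```shell
re=$(awk 'BEGIN {
    s = "hello"
    pat = "^h.*o$"            # a string used as a regular expression
    print (s ~ /ell/), (s ~ pat), (s !~ pat)
}')
echo "$re"

# An ERE on its own is evaluated as $0 ~ /ere/.
alone=$(echo "abc" | awk '{ if (/b/) print "matched" }')
echo "$alone"
```
.fi
.in -2
.sp
.LP
The first command prints "1 1 0" and the second prints matched.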
789 .sp
790 .LP
791 An extended regular expression can be used to separate fields by using the
792 \fB-F\fR \fIERE\fR option or by assigning a string containing the expression to
793 the built-in variable \fBFS\fR. The default value of the \fBFS\fR variable is a
794 single space character. The following describes \fBFS\fR behavior:
795 .RS +4
796 .TP
797 1.
798 If \fBFS\fR is a single character:
799 .RS +4
800 .TP
801 .ie t \(bu
802 .el o
803 If \fBFS\fR is the space character, skip leading and trailing blank characters;
804 fields are delimited by sets of one or more blank characters.
805 .RE
806 .RS +4
807 .TP
808 .ie t \(bu
809 .el o
810 Otherwise, if \fBFS\fR is any other character \fIc\fR, fields are delimited by
811 each single occurrence of \fIc\fR.
812 .RE
813 .RE
814 .RS +4
815 .TP
816 2.
817 Otherwise, the string value of \fBFS\fR is considered to be an extended
818 regular expression. Each occurrence of a sequence matching the extended regular
819 expression delimits fields.
820 .RE
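.sp
.LP
The three \fBFS\fR cases above can be sketched as follows, assuming a POSIX
shell. The shell variable names are hypothetical.
.sp
.in +2
.nf
```shell
# Default FS (a space): leading and trailing blanks are skipped.
blanks=$(echo "  a   b  " | awk '{ print NF }')
# Any other single character: each occurrence delimits a field.
single=$(echo "a,,b" | awk -F, '{ print NF }')
# Longer values are treated as an extended regular expression.
ere=$(echo "a1b22c" | awk -F"[0-9]+" '{ print $3 }')
echo "$blanks $single $ere"
```
.fi
.in -2
.sp
.LP
This prints "2 3 c". The two adjacent commas in the second input enclose an
empty field.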
821 .sp
822 .LP
823 Except in the \fBgsub\fR, \fBmatch\fR, \fBsplit\fR, and \fBsub\fR built-in
824 functions, regular expression matching is based on input records. That is,
825 record separator characters (the first character of the value of the variable
826 \fBRS\fR, a newline character by default) cannot be embedded in the expression,
827 and no expression matches the record separator character. If the record
828 separator is not a newline character, newline characters embedded in the
expression can be matched. In those four built-in functions, regular expression
matching is based on text strings. So, any character (including the newline
character and the record separator) can be embedded in the pattern and an
appropriate pattern matches any character. However, in all \fBawk\fR regular
expression matching, the use of one or more NUL characters in the pattern,
input record or text string produces undefined results.
835
836 .SS "Patterns"
837 A \fIpattern\fR is any valid \fIexpression,\fR a range specified by two
expressions separated by a comma, or one of the two special patterns \fBBEGIN\fR
839 or \fBEND\fR.
840
841 .SS "Special Patterns"
842 The \fBawk\fR utility recognizes two special patterns, \fBBEGIN\fR and
843 \fBEND\fR. Each \fBBEGIN\fR pattern is matched once and its associated action
844 executed before the first record of input is read (except possibly by use of
845 the \fBgetline\fR function in a prior \fBBEGIN\fR action) and before command
846 line assignment is done. Each \fBEND\fR pattern is matched once and its
associated action executed after the last record of input has been read. Each
\fBBEGIN\fR and \fBEND\fR pattern must have an associated action.
849 .sp
850 .LP
851 \fBBEGIN\fR and \fBEND\fR do not combine with other patterns. Multiple
852 \fBBEGIN\fR and \fBEND\fR patterns are allowed. The actions associated with the
853 \fBBEGIN\fR patterns are executed in the order specified in the program, as are
854 the \fBEND\fR actions. An \fBEND\fR pattern can precede a \fBBEGIN\fR pattern
855 in a program.
856 .sp
857 .LP
858 If an \fBawk\fR program consists of only actions with the pattern \fBBEGIN\fR,
859 and the \fBBEGIN\fR action contains no \fBgetline\fR function, \fBawk\fR exits
860 without reading its input when the last statement in the last \fBBEGIN\fR
861 action is executed. If an \fBawk\fR program consists of only actions with the
862 pattern \fBEND\fR or only actions with the patterns \fBBEGIN\fR and \fBEND\fR,
863 the input is read before the statements in the \fBEND\fR actions are executed.
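.sp
.LP
As a rough illustration of the ordering rules above, the following sketch places an \fBEND\fR pattern ahead of two \fBBEGIN\fR patterns. The two-line input and the message strings are assumptions chosen for the example.

```shell
# END appears first in the program text, yet both BEGIN actions still run
# before any input is read, in the order in which they are written.
out=$(printf 'x\ny\n' | awk '
END   { print "end: " NR " records" }
BEGIN { print "begin 1" }
BEGIN { print "begin 2" }
')
printf '%s\n' "$out"
```

The \fBEND\fR action runs last and sees the final record count, regardless of where it appears in the program.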
864
865 .SS "Expression Patterns"
866 An expression pattern is evaluated as if it were an expression in a Boolean
867 context. If the result is true, the pattern is considered to match, and the
868 associated action (if any) is executed. If the result is false, the action is
869 not executed.
870
871 .SS "Pattern Ranges"
872 A pattern range consists of two expressions separated by a comma. In this case,
873 the action is performed for all records between a match of the first expression
874 and the following match of the second expression, inclusive. At this point, the
875 pattern range can be repeated starting at input records subsequent to the end
876 of the matched range.
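.sp
.LP
A small sketch of a repeating range follows. The marker lines "on" and "off" are hypothetical; note that the second range in this input never sees its closing match, so it extends to the end of the input.

```shell
# Records 2-4 form the first range; record 6 opens a second range that
# stays open through the last record because no "off" line follows it.
out=$(printf 'a\non\nb\noff\nc\non\nd\n' |
    awk '/^on$/, /^off$/ { print NR ": " $0 }')
printf '%s\n' "$out"
```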
877
878 .SS "Actions"
879 An action is a sequence of statements. A statement can be one of the following:
880 .sp
881 .in +2
882 .nf
883 if ( \fIexpression\fR ) \fIstatement\fR [ else \fIstatement\fR ]
884 while ( \fIexpression\fR ) \fIstatement\fR
885 do \fIstatement\fR while ( \fIexpression\fR )
886 for ( \fIexpression\fR ; \fIexpression\fR ; \fIexpression\fR ) \fIstatement\fR
887 for ( \fIvar\fR in \fIarray\fR ) \fIstatement\fR
888 delete \fIarray\fR[\fIsubscript\fR] #delete an array element
889 delete \fIarray\fR #delete all elements within an array
890 break
891 continue
892 { [ \fIstatement\fR ] .\|.\|. }
893 \fIexpression\fR # commonly variable = expression
894 print [ \fIexpression-list\fR ] [ >\fIexpression\fR ]
895 printf format [ ,\fIexpression-list\fR ] [ >\fIexpression\fR ]
896 next # skip remaining patterns on this input line
897 nextfile # skip remaining patterns on this input file
898 exit [expr] # skip the rest of the input; exit status is expr
899 return [expr]
900 .fi
901 .in -2
902
903 .sp
904 .LP
905 Any single statement can be replaced by a statement list enclosed in braces.
906 The statements are terminated by newline characters or semicolons, and are
907 executed sequentially in the order that they appear.
908 .sp
909 .LP
910 The \fBnext\fR statement causes all further processing of the current input
911 record to be abandoned. The behavior is undefined if a \fBnext\fR statement
912 appears or is invoked in a \fBBEGIN\fR or \fBEND\fR action.
913 .sp
914 .LP
915 The \fBnextfile\fR statement is similar to \fBnext\fR, but also skips all other
916 records in the current file, and moves on to processing the next input file if
917 available (or exits the program if there are none). (Note that this keyword is
918 not supported by \fB/usr/xpg4/bin/awk\fR.)
919 .sp
920 .LP
921 The \fBexit\fR statement invokes all \fBEND\fR actions in the order in which
they occur in the program source and then terminates the program without reading
923 further input. An \fBexit\fR statement inside an \fBEND\fR action terminates
924 the program without further execution of \fBEND\fR actions. If an expression
925 is specified in an \fBexit\fR statement, its numeric value is the exit status
926 of \fBawk\fR, unless subsequent errors are encountered or a subsequent
927 \fBexit\fR statement with an expression is executed.
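.sp
.LP
The interaction between \fBexit\fR, the \fBEND\fR actions, and the exit status can be sketched as follows. The input values and the status value 3 are arbitrary assumptions.

```shell
status=0
# exit 3 stops reading at the second record, still runs END, and sets
# the exit status of awk to 3.
out=$(printf '1\n2\n3\n' | awk '
$1 == 2 { exit 3 }
{ print "saw " $1 }
END { print "END ran after " NR " records" }
') || status=$?
printf '%s\nstatus=%s\n' "$out" "$status"
```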
928
929 .SS "Output Statements"
930 Both \fBprint\fR and \fBprintf\fR statements write to standard output by
931 default. The output is written to the location specified by
932 \fIoutput_redirection\fR if one is supplied, as follows:
933 .sp
934 .in +2
935 .nf
\fB>\fR \fIexpression\fR
\fB>>\fR \fIexpression\fR
\fB|\fR \fIexpression\fR
937 .fi
938 .in -2
939
940 .sp
941 .LP
942 In all cases, the \fIexpression\fR is evaluated to produce a string that is
943 used as a full pathname to write into (for \fB>\fR or \fB>>\fR) or as a command
to be executed (for \fB|\fR). Using the first two forms, if the file of that
name is not currently open, it is opened, creating it if necessary and, when
using the first form, truncating it. The output is then appended to the file.
As long as the file remains open, subsequent calls in which \fIexpression\fR
evaluates to the same string value simply append output to the file. The file
remains open until the \fBclose\fR function is called with an expression
that evaluates to the same string value.
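.sp
.LP
The open, truncate, and append behavior described above can be checked with a sketch like the following. It assumes the \fBmktemp\fR utility is available; the file contents are illustrative.

```shell
tmp=$(mktemp)
printf 'stale\n' > "$tmp"          # pre-existing content to be truncated

awk -v out="$tmp" 'BEGIN {
    print "first"  > out    # first use: opens the file and truncates it
    print "second" > out    # file is already open: output is appended
    close(out)
    print "third" >> out    # reopened in append mode, nothing is lost
    close(out)
}'

contents=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$contents"
```

Only the first \fB>\fR on a closed file truncates it, so the second \fBprint\fR does not overwrite the first.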
951 .sp
952 .LP
953 The third form writes output onto a stream piped to the input of a command. The
954 stream is created if no stream is currently open with the value of
955 \fIexpression\fR as its command name. The stream created is equivalent to one
956 created by a call to the \fBpopen\fR(3C) function with the value of
957 \fIexpression\fR as the \fIcommand\fR argument and a value of \fBw\fR as the
958 \fImode\fR argument. As long as the stream remains open, subsequent calls in
which \fIexpression\fR evaluates to the same string value write output to the
960 existing stream. The stream remains open until the \fBclose\fR function is
961 called with an expression that evaluates to the same string value. At that
962 time, the stream is closed as if by a call to the \fBpclose\fR function.
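.sp
.LP
A minimal sketch of the pipe form follows, assuming a \fBsort\fR utility is on the path. Every record is written to one shared stream because the command string is identical each time, and \fBclose\fR is given that same string.

```shell
# All three records go to a single "sort" process; close() waits for it
# to finish, so its output appears before the final message.
out=$(printf 'pear\napple\nfig\n' | awk '
{ print | "sort" }
END { close("sort"); print "done" }
')
printf '%s\n' "$out"
```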
963 .sp
964 .LP
These output statements take a comma-separated list of \fIexpression\fRs
referred to in the grammar by the non-terminal symbols \fBexpr_list,\fR
967 \fBprint_expr_list\fR or \fBprint_expr_list_opt.\fR This list is referred to
968 here as the \fIexpression list\fR, and each member is referred to as an
969 \fIexpression argument\fR.
970 .sp
971 .LP
972 The \fBprint\fR statement writes the value of each expression argument onto the
973 indicated output stream separated by the current output field separator (see
974 variable \fBOFS\fR above), and terminated by the output record separator (see
variable \fBORS\fR above). All expression arguments are taken as strings, being
976 converted if necessary; with the exception that the \fBprintf\fR format in
977 \fBOFMT\fR is used instead of the value in \fBCONVFMT\fR. An empty expression
978 list stands for the whole input record \fB(\fR$0\fB)\fR.
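.sp
.LP
The difference between separate \fBprint\fR arguments and a concatenated argument can be sketched as follows. The separator values and the number are arbitrary assumptions.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"; OFMT = "%.2f"
    x = 3.14159
    print "a", x      # two arguments: joined by OFS, number formatted via OFMT
    print "a" x       # one concatenated argument: number converted via CONVFMT
}')
printf '%s\n' "$out"
```

In the second statement the number is converted to a string by the concatenation before \fBprint\fR sees it, so \fBOFMT\fR no longer applies.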
979 .sp
980 .LP
981 The \fBprintf\fR statement produces output based on a notation similar to the
File Format Notation used to describe file formats in this document. Output is
983 produced as specified with the first expression argument as the string
984 \fBformat\fR and subsequent expression arguments as the strings \fBarg1\fR to
985 \fBargn,\fR inclusive, with the following exceptions:
986 .RS +4
987 .TP
988 1.
989 The \fIformat\fR is an actual character string rather than a graphical
990 representation. Therefore, it cannot contain empty character positions. The
991 space character in the \fIformat\fR string, in any context other than a
992 \fIflag\fR of a conversion specification, is treated as an ordinary character
993 that is copied to the output.
994 .RE
995 .RS +4
996 .TP
997 2.
998 If the character set contains a Delta character and that character appears
999 in the \fIformat\fR string, it is treated as an ordinary character that is
1000 copied to the output.
1001 .RE
1002 .RS +4
1003 .TP
1004 3.
The \fIescape sequences\fR beginning with a backslash character are treated
as sequences of ordinary characters that are copied to the output. Note that
these same sequences are interpreted lexically by \fBawk\fR when they appear in
literal strings, but they are not treated specially by the \fBprintf\fR
statement.
1010 .RE
1011 .RS +4
1012 .TP
1013 4.
1014 A \fIfield width\fR or \fIprecision\fR can be specified as the \fB*\fR
1015 character instead of a digit string. In this case the next argument from the
1016 expression list is fetched and its numeric value taken as the field width or
1017 precision.
1018 .RE
1019 .RS +4
1020 .TP
1021 5.
1022 The implementation does not precede or follow output from the \fBd\fR or
1023 \fBu\fR conversion specifications with blank characters not specified by the
1024 \fIformat\fR string.
1025 .RE
1026 .RS +4
1027 .TP
1028 6.
1029 The implementation does not precede output from the \fBo\fR conversion
1030 specification with leading zeros not specified by the \fIformat\fR string.
1031 .RE
1032 .RS +4
1033 .TP
1034 7.
1035 For the \fBc\fR conversion specification: if the argument has a numeric
1036 value, the character whose encoding is that value is output. If the value is
1037 zero or is not the encoding of any character in the character set, the behavior
1038 is undefined. If the argument does not have a numeric value, the first
1039 character of the string value is output; if the string does not contain any
1040 characters the behavior is undefined.
1041 .RE
1042 .RS +4
1043 .TP
1044 8.
1045 For each conversion specification that consumes an argument, the next
1046 expression argument is evaluated. With the exception of the \fBc\fR conversion,
1047 the value is converted to the appropriate type for the conversion
1048 specification.
1049 .RE
1050 .RS +4
1051 .TP
1052 9.
1053 If there are insufficient expression arguments to satisfy all the conversion
1054 specifications in the \fIformat\fR string, the behavior is undefined.
1055 .RE
1056 .RS +4
1057 .TP
1058 10.
1059 If any character sequence in the \fIformat\fR string begins with a %
1060 character, but does not form a valid conversion specification, the behavior is
1061 unspecified.
1062 .RE
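.sp
.LP
A brief sketch of items 4 and 7 above follows. It assumes an ASCII-compatible character set, so that the numeric value 65 encodes the letter A; the argument values are illustrative.

```shell
# %*d takes its field width (4) from the argument list; %c prints the
# character encoded by a numeric argument, or the first character of a
# string argument.
out=$(awk 'BEGIN {
    printf "[%*d] [%-5s] [%c] [%c]\n", 4, 42, "ab", 65, "hello"
}')
printf '%s\n' "$out"
```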
1063 .sp
1064 .LP
1065 Both \fBprint\fR and \fBprintf\fR can output at least \fB{LINE_MAX}\fR bytes.
1066
1067 .SS "Functions"
1068 The \fBawk\fR language has a variety of built-in functions: arithmetic,
1069 string, input/output and general.
1070
1071 .SS "Arithmetic Functions"
1072 The arithmetic functions, except for \fBint\fR, are based on the \fBISO\fR
1073 \fBC\fR standard. The behavior is undefined in cases where the \fBISO\fR
1074 \fBC\fR standard specifies that an error be returned or that the behavior is
1075 undefined. Although the grammar permits built-in functions to appear with no
1076 arguments or parentheses, unless the argument or parentheses are indicated as
1077 optional in the following list (by displaying them within the \fB[ ]\fR
1078 brackets), such use is undefined.
1079 .sp
1080 .ne 2
1081 .na
1082 \fB\fBatan2(\fR\fIy\fR,\fIx\fR\fB)\fR\fR
1083 .ad
1084 .RS 17n
1085 Return arctangent of \fIy\fR/\fIx\fR.
1086 .RE
1087
1088 .sp
1089 .ne 2
1090 .na
1091 \fB\fBcos\fR(\fIx\fR)\fR
1092 .ad
1093 .RS 17n
1094 Return cosine of \fIx,\fR where \fIx\fR is in radians.
1095 .RE
1096
1097 .sp
1098 .ne 2
1099 .na
1100 \fB\fBsin\fR(\fIx\fR)\fR
1101 .ad
1102 .RS 17n
1103 Return sine of \fIx,\fR where \fIx\fR is in radians.
1104 .RE
1105
1106 .sp
1107 .ne 2
1108 .na
1109 \fB\fBexp\fR(\fIx\fR)\fR
1110 .ad
1111 .RS 17n
1112 Return the exponential function of \fIx\fR.
1113 .RE
1114
1115 .sp
1116 .ne 2
1117 .na
1118 \fB\fBlog\fR(\fIx\fR)\fR
1119 .ad
1120 .RS 17n
1121 Return the natural logarithm of \fIx\fR.
1122 .RE
1123
1124 .sp
1125 .ne 2
1126 .na
1127 \fB\fBsqrt\fR(\fIx\fR)\fR
1128 .ad
1129 .RS 17n
1130 Return the square root of \fIx\fR.
1131 .RE
1132
1133 .sp
1134 .ne 2
1135 .na
1136 \fB\fBint\fR(\fIx\fR)\fR
1137 .ad
1138 .RS 17n
1139 Truncate its argument to an integer. It is truncated toward 0 when \fIx\fR > 0.
1140 .RE
1141
1142 .sp
1143 .ne 2
1144 .na
1145 \fB\fBrand()\fR\fR
1146 .ad
1147 .RS 17n
1148 Return a random number \fIn\fR, such that 0 \(<= \fIn\fR < 1.
1149 .RE
1150
1151 .sp
1152 .ne 2
1153 .na
1154 \fB\fBsrand\fR([\fBexpr\fR])\fR
1155 .ad
1156 .RS 17n
1157 Set the seed value for \fBrand\fR to \fIexpr\fR or use the time of day if
1158 \fIexpr\fR is omitted. The previous seed value is returned.
1159 .RE
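.sp
.LP
The arithmetic functions can be exercised with a sketch such as the following. The seed value 7 is an arbitrary assumption; the point is only that reseeding with the same value repeats the \fBrand\fR sequence.

```shell
out=$(awk 'BEGIN {
    # int() truncates; atan2(0, -1) is pi; the rest are exact values.
    printf "%d %.4f %.1f %.1f %.1f\n", int(3.9), atan2(0, -1), sqrt(16), exp(0), log(1)

    srand(7); a = rand()
    srand(7); b = rand()            # same seed, so the same first value
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
printf '%s\n' "$out"
```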
1160
1161 .SS "String Functions"
1162 The string functions in the following list shall be supported. Although the
1163 grammar permits built-in functions to appear with no arguments or parentheses,
1164 unless the argument or parentheses are indicated as optional in the following
1165 list (by displaying them within the \fB[ ]\fR brackets), such use is undefined.
1166 .sp
1167 .ne 2
1168 .na
1169 \fB\fBgsub\fR(\fIere\fR,\fIrepl\fR[,\|\fIin\fR])\fR
1170 .ad
1171 .sp .6
1172 .RS 4n
1173 Behave like \fBsub\fR (see below), except that it replaces all occurrences of
1174 the regular expression (like the \fBed\fR utility global substitute) in
1175 \fB$0\fR or in the \fIin\fR argument, when specified.
1176 .RE
1177
1178 .sp
1179 .ne 2
1180 .na
1181 \fB\fBindex\fR(\fIs\fR,\fIt\fR)\fR
1182 .ad
1183 .sp .6
1184 .RS 4n
1185 Return the position, in characters, numbering from 1, in string \fIs\fR where
1186 string \fIt\fR first occurs, or zero if it does not occur at all.
1187 .RE
1188
1189 .sp
1190 .ne 2
1191 .na
1192 \fB\fBlength\fR[([\fIv\fR])]\fR
1193 .ad
1194 .sp .6
1195 .RS 4n
1196 Given no argument, this function returns the length of the whole record,
1197 \fB$0\fR. If given an array as an argument (and using \fB/usr/bin/awk\fR),
1198 then this returns the number of elements it contains. Otherwise, this function
1199 interprets the argument as a string (performing any needed conversions) and
1200 returns its length in characters.
1201 .RE
1202
1203 .sp
1204 .ne 2
1205 .na
1206 \fB\fBmatch\fR(\fIs\fR,\fIere\fR)\fR
1207 .ad
1208 .sp .6
1209 .RS 4n
1210 Return the position, in characters, numbering from 1, in string \fIs\fR where
1211 the extended regular expression \fIere\fR occurs, or zero if it does not occur
1212 at all. \fBRSTART\fR is set to the starting position (which is the same as the
1213 returned value), zero if no match is found; \fBRLENGTH\fR is set to the length
1214 of the matched string, \(mi1 if no match is found.
1215 .RE
1216
1217 .sp
1218 .ne 2
1219 .na
1220 \fB\fBsplit\fR(\fIs\fR,\fIa\fR[,\|\fIfs\fR])\fR
1221 .ad
1222 .sp .6
1223 .RS 4n
1224 Split the string \fIs\fR into array elements \fIa\fR[1], \fIa\fR[2],
1225 \fB\&...,\fR \fIa\fR[\fIn\fR], and return \fIn\fR. The separation is done with
1226 the extended regular expression \fIfs\fR or with the field separator \fBFS\fR
1227 if \fIfs\fR is not given. Each array element has a string value when created.
1228 If the string assigned to any array element, with any occurrence of the
1229 decimal-point character from the current locale changed to a period character,
1230 would be considered a \fInumeric string\fR; the array element also has the
1231 numeric value of the \fInumeric string\fR. The effect of a null string as the
1232 value of \fIfs\fR is unspecified.
1233 .RE
1234
1235 .sp
1236 .ne 2
1237 .na
1238 \fB\fBsprintf\fR(\fBfmt\fR,\fIexpr\fR,\fIexpr\fR,\fB\&...\fR)\fR
1239 .ad
1240 .sp .6
1241 .RS 4n
1242 Format the expressions according to the \fBprintf\fR format given by \fIfmt\fR
1243 and return the resulting string.
1244 .RE
1245
1246 .sp
1247 .ne 2
1248 .na
1249 \fB\fBsub\fR(\fIere\fR,\fIrepl\fR[,\|\fIin\fR])\fR
1250 .ad
1251 .sp .6
1252 .RS 4n
1253 Substitute the string \fIrepl\fR in place of the first instance of the extended
regular expression \fIere\fR in string \fIin\fR and return the number of
substitutions. An ampersand ( \fB&\fR ) appearing in the string \fIrepl\fR is
replaced by the string from \fIin\fR that matches the regular expression. An
ampersand preceded with a backslash ( \fB\e\fR ) is interpreted as the literal
ampersand character. An occurrence of two consecutive backslashes is
interpreted as just a single literal backslash character. Any other occurrence
of a backslash (for example, preceding any other character) is treated as a
literal backslash character. If \fIrepl\fR is a string literal, the handling of
the ampersand character occurs after any lexical processing, including any
lexical backslash escape sequence processing. If \fIin\fR is specified and it
is not an \fBlvalue\fR, the behavior is undefined. If \fIin\fR is omitted, \fBawk\fR
uses the current record (\fB$0\fR) in its place.
1266 .RE
1267
1268 .sp
1269 .ne 2
1270 .na
1271 \fB\fBsubstr\fR(\fIs\fR,\fIm\fR[,\|\fIn\fR])\fR
1272 .ad
1273 .sp .6
1274 .RS 4n
1275 Return the at most \fIn\fR-character substring of \fIs\fR that begins at
1276 position \fIm,\fR numbering from 1. If \fIn\fR is missing, the length of the
1277 substring is limited by the length of the string \fIs\fR.
1278 .RE
1279
1280 .sp
1281 .ne 2
1282 .na
1283 \fB\fBtolower\fR(\fIs\fR)\fR
1284 .ad
1285 .sp .6
1286 .RS 4n
1287 Return a string based on the string \fIs\fR. Each character in \fIs\fR that is
1288 an upper-case letter specified to have a \fBtolower\fR mapping by the
1289 \fBLC_CTYPE\fR category of the current locale is replaced in the returned
1290 string by the lower-case letter specified by the mapping. Other characters in
1291 \fIs\fR are unchanged in the returned string.
1292 .RE
1293
1294 .sp
1295 .ne 2
1296 .na
1297 \fB\fBtoupper\fR(\fIs\fR)\fR
1298 .ad
1299 .sp .6
1300 .RS 4n
1301 Return a string based on the string \fIs\fR. Each character in \fIs\fR that is
1302 a lower-case letter specified to have a \fBtoupper\fR mapping by the
1303 \fBLC_CTYPE\fR category of the current locale is replaced in the returned
1304 string by the upper-case letter specified by the mapping. Other characters in
1305 \fIs\fR are unchanged in the returned string.
1306 .RE
1307
1308 .sp
1309 .LP
All of the preceding functions that take \fIere\fR as a parameter expect a
1311 pattern or a string valued expression that is a regular expression as defined
1312 below.
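.sp
.LP
Several of the string functions above are combined in the following sketch. All strings and variable names are illustrative assumptions.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)            # & stands for the matched text
    print n, s
    print index(s, "w"), substr(s, 1, 4), length("abc")
    if (match("foo123bar", /[0-9]+/))  # sets RSTART and RLENGTH
        print RSTART, RLENGTH
    k = split("a:b:c", parts, ":")     # returns the number of elements
    print k, parts[3]
    print toupper("abc"), tolower("XyZ")
}')
printf '%s\n' "$out"
```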
1313
1314 .SS "Input/Output and General Functions"
1315 The input/output and general functions are:
1316 .sp
1317 .ne 2
1318 .na
1319 \fB\fBclose(\fR\fIexpression\fR)\fR
1320 .ad
1321 .RS 27n
1322 Close the file or pipe opened by a \fBprint\fR or \fBprintf\fR statement or a
1323 call to \fBgetline\fR with the same string-valued \fIexpression\fR. If the
1324 close was successful, the function returns \fB0\fR; otherwise, it returns
1325 non-zero.
1326 .RE
1327
1328 .sp
1329 .ne 2
1330 .na
1331 \fB\fBfflush(\fR\fIexpression\fR)\fR
1332 .ad
1333 .RS 27n
1334 Flush any buffered output for the file or pipe opened by a \fBprint\fR or
1335 \fBprintf\fR statement or a call to \fBgetline\fR with the same string-valued
1336 \fIexpression\fR. If the flush was successful, the function returns \fB0\fR;
1337 otherwise, it returns \fBEOF\fR. If no arguments or the empty string
1338 (\fB""\fR) are given, then all open files will be flushed. (Note that
1339 \fBfflush\fR is supported in \fB/usr/bin/awk\fR only.)
1340 .RE
1341
1342 .sp
1343 .ne 2
1344 .na
1345 \fB\fIexpression\fR|\fBgetline\fR[\fIvar\fR]\fR
1346 .ad
1347 .RS 27n
1348 Read a record of input from a stream piped from the output of a command. The
1349 stream is created if no stream is currently open with the value of
1350 \fIexpression\fR as its command name. The stream created is equivalent to one
1351 created by a call to the \fBpopen\fR function with the value of
1352 \fIexpression\fR as the \fIcommand\fR argument and a value of \fBr\fR as the
1353 \fImode\fR argument. As long as the stream remains open, subsequent calls in
which \fIexpression\fR evaluates to the same string value read subsequent
records from the stream. The stream remains open until the \fBclose\fR function
1356 is called with an expression that evaluates to the same string value. At that
1357 time, the stream is closed as if by a call to the \fBpclose\fR function. If
\fIvar\fR is missing, \fB$0\fR and \fBNF\fR are set. Otherwise, \fIvar\fR is
1359 set.
1360 .sp
1361 The \fBgetline\fR operator can form ambiguous constructs when there are
1362 operators that are not in parentheses (including concatenate) to the left of
1363 the \fB|\fR (to the beginning of the expression containing \fBgetline\fR). In
1364 the context of the \fB$\fR operator, \fB|\fR behaves as if it had a lower
precedence than \fB$\fR. The result of evaluating other operators is
unspecified, and portable applications must parenthesize all such uses
properly.
1368 .RE
1369
1370 .sp
1371 .ne 2
1372 .na
1373 \fB\fBgetline\fR\fR
1374 .ad
1375 .RS 27n
1376 Set \fB$0\fR to the next input record from the current input file. This form of
1377 \fBgetline\fR sets the \fBNF\fR, \fBNR\fR, and \fBFNR\fR variables.
1378 .RE
1379
1380 .sp
1381 .ne 2
1382 .na
1383 \fB\fBgetline\fR \fIvar\fR\fR
1384 .ad
1385 .RS 27n
1386 Set variable \fIvar\fR to the next input record from the current input file.
1387 This form of \fBgetline\fR sets the \fBFNR\fR and \fBNR\fR variables.
1388 .RE
1389
1390 .sp
1391 .ne 2
1392 .na
1393 \fB\fBgetline\fR [\fIvar\fR] \fB<\fR \fIexpression\fR\fR
1394 .ad
1395 .RS 27n
1396 Read the next record of input from a named file. The \fIexpression\fR is
1397 evaluated to produce a string that is used as a full pathname. If the file of
1398 that name is not currently open, it is opened. As long as the stream remains
1399 open, subsequent calls in which \fIexpression\fR evaluates to the same string
value read subsequent records from the file. The file remains open until the
\fBclose\fR function is called with an expression that evaluates to the same
string value. If \fIvar\fR is missing, \fB$0\fR and \fBNF\fR are set. Otherwise,
1403 \fIvar\fR is set.
1404 .sp
1405 The \fBgetline\fR operator can form ambiguous constructs when there are binary
1406 operators that are not in parentheses (including concatenate) to the right of
1407 the \fB<\fR (up to the end of the expression containing the \fBgetline\fR). The
result of evaluating such a construct is unspecified, and portable
applications must parenthesize all such uses properly.
1410 .RE
1411
1412 .sp
1413 .ne 2
1414 .na
1415 \fB\fBsystem\fR(\fIexpression\fR)\fR
1416 .ad
1417 .RS 27n
1418 Execute the command given by \fIexpression\fR in a manner equivalent to the
1419 \fBsystem\fR(3C) function and return the exit status of the command.
1420 .RE
1421
1422 .sp
1423 .LP
1424 All forms of \fBgetline\fR return \fB1\fR for successful input, \fB0\fR for end
1425 of file, and \fB\(mi1\fR for an error.
1426 .sp
1427 .LP
1428 Where strings are used as the name of a file or pipeline, the strings must be
1429 textually identical. The terminology ``same string value'' implies that
1430 ``equivalent strings'', even those that differ only by space characters,
1431 represent different files.
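.sp
.LP
The file and pipe forms of \fBgetline\fR can be sketched as follows, with the parentheses that the notes above call for. The sketch assumes the \fBmktemp\fR and \fBecho\fR utilities; the file contents and variable names are illustrative.

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

out=$(awk -v f="$tmp" 'BEGIN {
    # Parenthesize the getline so that "> 0" tests its return value.
    while ((getline line < f) > 0) {
        n++
        last = line
    }
    close(f)

    cmd = "echo piped"      # keeping the string in a variable guarantees
    cmd | getline reply     # that close() receives the identical value
    close(cmd)

    print n, last, reply
}')
rm -f "$tmp"
printf '%s\n' "$out"
```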
1432
1433 .SS "User-defined Functions"
1434 The \fBawk\fR language also provides user-defined functions. Such functions
1435 can be defined as:
1436 .sp
1437 .in +2
1438 .nf
1439 \fBfunction\fR \fIname\fR(\fIargs\fR,\|.\|.\|.) { \fIstatements\fR }
1440 .fi
1441 .in -2
1442
1443 .sp
1444 .LP
1445 A function can be referred to anywhere in an \fBawk\fR program; in particular,
1446 its use can precede its definition. The scope of a function is global.
1447 .sp
1448 .LP
1449 Function arguments can be either scalars or arrays; the behavior is undefined
1450 if an array name is passed as an argument that the function uses as a scalar,
1451 or if a scalar expression is passed as an argument that the function uses as an
1452 array. Function arguments are passed by value if scalar and by reference if
1453 array name. Argument names are local to the function; all other variable names
are global. The same name must not be used as both an argument name and as the name
1455 of a function or a special \fBawk\fR variable. The same name must not be used
1456 both as a variable name with global scope and as the name of a function. The
1457 same name must not be used within the same scope both as a scalar variable and
1458 as an array.
1459 .sp
1460 .LP
1461 The number of parameters in the function definition need not match the number
1462 of parameters in the function call. Excess formal parameters can be used as
1463 local variables. If fewer arguments are supplied in a function call than are in
1464 the function definition, the extra parameters that are used in the function
1465 body as scalars are initialized with a string value of the null string and a
1466 numeric value of zero, and the extra parameters that are used in the function
1467 body as arrays are initialized as empty arrays. If more arguments are supplied
1468 in a function call than are in the function definition, the behavior is
1469 undefined.
1470 .sp
1471 .LP
1472 When invoking a function, no white space can be placed between the function
1473 name and the opening parenthesis. Function calls can be nested and recursive
1474 calls can be made upon functions. Upon return from any nested or recursive
1475 function call, the values of all of the calling function's parameters are
1476 unchanged, except for array parameters passed by reference. The \fBreturn\fR
1477 statement can be used to return a value. If a \fBreturn\fR statement appears
1478 outside of a function definition, the behavior is undefined.
1479 .sp
1480 .LP
1481 In the function definition, newline characters are optional before the opening
1482 brace and after the closing brace. Function definitions can appear anywhere in
1483 the program where a \fIpattern-action\fR pair is allowed.
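.sp
.LP
The rules above for recursion, array arguments, and excess formal parameters can be sketched as follows. The function names \fBfact\fR and \fBfill\fR are hypothetical.

```shell
out=$(awk '
# Recursive call: each invocation has its own copy of n.
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }

# arr is passed by reference; the extra parameter i is never supplied by
# the caller, so it acts as a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}

BEGIN {
    i = 99                  # global i is untouched by fill()
    fill(squares, 3)
    print fact(5), squares[3], i
}')
printf '%s\n' "$out"
```

Separating the intended arguments from the local-variable parameters with extra spaces is a common convention, not a language requirement.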
1484
1485 .SH USAGE
1486 The \fBindex\fR, \fBlength\fR, \fBmatch\fR, and \fBsubstr\fR functions should
1487 not be confused with similar functions in the \fBISO C\fR standard; the
1488 \fBawk\fR versions deal with characters, while the \fBISO C\fR standard deals
1489 with bytes.
1490 .sp
1491 .LP
1492 Because the concatenation operation is represented by adjacent expressions
1493 rather than an explicit operator, it is often necessary to use parentheses to
1494 enforce the proper evaluation precedence.
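.sp
.LP
A short sketch of the precedence issue follows. Addition binds more tightly than concatenation, so the two assignments below produce different values; the operands are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = 1 " " 2 + 3        # parsed as 1 " " (2 + 3)
    y = (1 " " 2) + 3      # string "1 2" has numeric value 1, then + 3
    print x "|" y
}')
printf '%s\n' "$out"
```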
1495 .sp
1496 .LP
1497 See \fBlargefile\fR(5) for the description of the behavior of \fBawk\fR when
1498 encountering files greater than or equal to 2 Gbyte (2^31 bytes).
1499
1500 .SH EXAMPLES
1501 The \fBawk\fR program specified in the command line is most easily specified
1502 within single-quotes (for example, \fB\&'\fR\fIprogram\fR\fB\&'\fR) for
1503 applications using \fBsh\fR, because \fBawk\fR programs commonly contain
1504 characters that are special to the shell, including double-quotes. In the cases
where an \fBawk\fR program contains single-quote characters, it is usually
1506 easiest to specify most of the program as strings within single-quotes
1507 concatenated by the shell with quoted single-quote characters. For example:
1508 .sp
1509 .in +2
1510 .nf
1511 awk '/'\e''/ { print "quote:", $0 }'
1512 .fi
1513 .in -2
1514
1515 .sp
1516 .LP
1517 prints all lines from the standard input containing a single-quote character,
1518 prefixed with \fBquote:\fR.
1519 .sp
1520 .LP
1521 The following are examples of simple \fBawk\fR programs:
1522 .LP
1523 \fBExample 1 \fRWrite to the standard output all input lines for which field 3
1524 is greater than 5:
1525 .sp
1526 .in +2
1527 .nf
1528 \fB$3 > 5\fR
1529 .fi
1530 .in -2
1531 .sp
1532
1533 .LP
1534 \fBExample 2 \fRWrite every tenth line:
1535 .sp
1536 .in +2
1537 .nf
1538 \fB(NR % 10) == 0\fR
1539 .fi
1540 .in -2
1541 .sp
1542
1543 .LP
1544 \fBExample 3 \fRWrite any line with a substring matching the regular
1545 expression:
1546 .sp
1547 .in +2
1548 .nf
1549 \fB/(G|D)(2[0-9][[:alpha:]]*)/\fR
1550 .fi
1551 .in -2
1552 .sp
1553
1554 .LP
1555 \fBExample 4 \fRPrint any line with a substring containing a G or D, followed
1556 by a sequence of digits and characters:
1557 .sp
1558 .LP
1559 This example uses character classes \fBdigit\fR and \fBalpha\fR to match
1560 language-independent digit and alphabetic characters, respectively.
1561
1562 .sp
1563 .in +2
1564 .nf
1565 \fB/(G|D)([[:digit:][:alpha:]]*)/\fR
1566 .fi
1567 .in -2
1568 .sp
1569
1570 .LP
1571 \fBExample 5 \fRWrite any line in which the second field matches the regular
1572 expression and the fourth field does not:
1573 .sp
1574 .in +2
1575 .nf
1576 \fB$2 ~ /xyz/ && $4 !~ /xyz/\fR
1577 .fi
1578 .in -2
1579 .sp
1580
1581 .LP
1582 \fBExample 6 \fRWrite any line in which the second field contains a backslash:
1583 .sp
1584 .in +2
1585 .nf
1586 \fB$2 ~ /\e\e/\fR
1587 .fi
1588 .in -2
1589 .sp
1590
1591 .LP
1592 \fBExample 7 \fRWrite any line in which the second field contains a backslash
1593 (alternate method):
1594 .sp
1595 .LP
1596 Notice that backslash escapes are interpreted twice, once in lexical processing
1597 of the string and once in processing the regular expression.
1598
1599 .sp
1600 .in +2
1601 .nf
1602 \fB$2 ~ "\e\e\e\e"\fR
1603 .fi
1604 .in -2
1605 .sp
1606
1607 .LP
1608 \fBExample 8 \fRWrite the second to the last and the last field in each line,
1609 separating the fields by a colon:
1610 .sp
1611 .in +2
1612 .nf
1613 \fB{OFS=":";print $(NF-1), $NF}\fR
1614 .fi
1615 .in -2
1616 .sp
1617
1618 .LP
1619 \fBExample 9 \fRWrite the line number and number of fields in each line:
1620 .sp
1621 .LP
1622 The three strings representing the line number, the colon and the number of
1623 fields are concatenated and that string is written to standard output.
1624
1625 .sp
1626 .in +2
1627 .nf
1628 \fB{print NR ":" NF}\fR
1629 .fi
1630 .in -2
1631 .sp
1632
1633 .LP
1634 \fBExample 10 \fRWrite lines longer than 72 characters:
1635 .sp
1636 .in +2
1637 .nf
\fBlength($0) > 72\fR
1639 .fi
1640 .in -2
1641 .sp
1642
1643 .LP
1644 \fBExample 11 \fRWrite first two fields in opposite order separated by the OFS:
1645 .sp
1646 .in +2
1647 .nf
1648 \fB{ print $2, $1 }\fR
1649 .fi
1650 .in -2
1651 .sp
1652
1653 .LP
1654 \fBExample 12 \fRSame, with input fields separated by comma or space and tab
1655 characters, or both:
1656 .sp
1657 .in +2
1658 .nf
\fBBEGIN { FS = ",[ \et]*|[ \et]+" }
1660 { print $2, $1 }\fR
1661 .fi
1662 .in -2
1663 .sp
1664
1665 .LP
1666 \fBExample 13 \fRAdd up first column, print sum and average:
1667 .sp
1668 .in +2
1669 .nf
1670 \fB{s += $1 }
1671 END {print "sum is ", s, " average is", s/NR}\fR
1672 .fi
1673 .in -2
1674 .sp
1675
1676 .LP
1677 \fBExample 14 \fRWrite fields in reverse order, one per line (many lines out
1678 for each line in):
1679 .sp
1680 .in +2
1681 .nf
1682 \fB{ for (i = NF; i > 0; --i) print $i }\fR
1683 .fi
1684 .in -2
1685 .sp
1686
1687 .LP
1688 \fBExample 15 \fRWrite all lines between occurrences of the strings "start" and
1689 "stop":
1690 .sp
1691 .in +2
1692 .nf
1693 \fB/start/, /stop/\fR
1694 .fi
1695 .in -2
1696 .sp
1697
1698 .LP
1699 \fBExample 16 \fRWrite all lines whose first field is different from the
1700 previous one:
1701 .sp
1702 .in +2
1703 .nf
1704 \fB$1 != prev { print; prev = $1 }\fR
1705 .fi
1706 .in -2
1707 .sp
1708
1709 .LP
1710 \fBExample 17 \fRSimulate the echo command:
1711 .sp
1712 .in +2
1713 .nf
\fBBEGIN {
    for (i = 1; i < ARGC; ++i)
        printf "%s%s", ARGV[i], i==ARGC-1?"\en":" "
}\fR
1718 .fi
1719 .in -2
1720 .sp
1721
1722 .LP
1723 \fBExample 18 \fRWrite the path prefixes contained in the PATH environment
1724 variable, one per line:
1725 .sp
1726 .in +2
1727 .nf
1728 \fBBEGIN {
1729 n = split (ENVIRON["PATH"], path, ":")
1730 for (i = 1; i <= n; ++i)
1731 print path[i]
1732 }\fR
1733 .fi
1734 .in -2
1735 .sp
1736
1737 .LP
1738 \fBExample 19 \fRPrint the file "input", filling in page numbers starting at 5:
1739 .sp
1740 .LP
1741 If there is a file named \fBinput\fR containing page headers of the form
1742
1743 .sp
1744 .in +2
1745 .nf
Page #
1747 .fi
1748 .in -2
1749
1750 .sp
1751 .LP
1752 and a file named \fBprogram\fR that contains
1753
1754 .sp
1755 .in +2
1756 .nf
1757 /Page/{ $2 = n++; }
1758 { print }
1759 .fi
1760 .in -2
1761
1762 .sp
1763 .LP
1764 then the command line
1765
1766 .sp
1767 .in +2
1768 .nf
1769 \fBawk -f program n=5 input\fR
1770 .fi
1771 .in -2
1772 .sp
1773
1774 .sp
1775 .LP
1776 prints the file \fBinput\fR, filling in page numbers starting at 5.
1777
1778 .SH ENVIRONMENT VARIABLES
1779 See \fBenviron\fR(5) for descriptions of the following environment variables
1780 that affect execution: \fBLC_COLLATE\fR, \fBLC_CTYPE\fR, \fBLC_MESSAGES\fR, and
1781 \fBNLSPATH\fR.
1782 .sp
1783 .ne 2
1784 .na
1785 \fB\fBLC_NUMERIC\fR\fR
1786 .ad
1787 .RS 14n
1788 Determine the radix character used when interpreting numeric input, performing
1789 conversions between numeric and string values and formatting numeric output.
1790 Regardless of locale, the period character (the decimal-point character of the
1791 POSIX locale) is the decimal-point character recognized in processing \fBawk\fR
1792 programs (including assignments in command-line arguments).
1793 .RE
1794
1795 .SH EXIT STATUS
1796 The following exit values are returned:
1797 .sp
1798 .ne 2
1799 .na
1800 \fB\fB0\fR\fR
1801 .ad
1802 .RS 6n
1803 All input files were processed successfully.
1804 .RE
1805
1806 .sp
1807 .ne 2
1808 .na
1809 \fB\fB>0\fR\fR
1810 .ad
1811 .RS 6n
1812 An error occurred.
1813 .RE
1814
1815 .sp
1816 .LP
1817 The exit status can be altered within the program by using an \fBexit\fR
1818 expression.
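.sp
.LP
As a minimal sketch of the statement above (the sample input and the value 3
are arbitrary, chosen only for illustration), the following command terminates
with exit status 3 on the first input line whose first field is \fBbad\fR, so
the \fBecho\fR writes 3. If no such line is read, the exit status is 0:
.sp
.in +2
.nf
\fBprintf 'ok\enbad\en' | awk '$1 == "bad" { exit 3 }'
echo $?\fR
.fi
.in -2
.sp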
1819
1820 .SH SEE ALSO
1821 \fBed\fR(1), \fBegrep\fR(1), \fBgrep\fR(1), \fBlex\fR(1), \fBoawk\fR(1),
1822 \fBsed\fR(1), \fBpopen\fR(3C), \fBprintf\fR(3C), \fBsystem\fR(3C),
1823 \fBattributes\fR(5), \fBenviron\fR(5), \fBlargefile\fR(5), \fBregex\fR(5),
1824 \fBXPG4\fR(5)
1825 .sp
1826 .LP
1827 Aho, A. V., B. W. Kernighan, and P. J. Weinberger, \fIThe AWK Programming
1828 Language\fR, Addison-Wesley, 1988.
1829
1830 .SH DIAGNOSTICS
If any \fIfile\fR operand is specified and the named file cannot be accessed,
\fBawk\fR writes a diagnostic message to standard error and terminates without
any further action.
1834 .sp
1835 .LP
1836 If the program specified by either the \fIprogram\fR operand or a
1837 \fIprogfile\fR operand is not a valid \fBawk\fR program (as specified in
1838 \fBEXTENDED DESCRIPTION\fR), the behavior is undefined.
1839
1840 .SH NOTES
1841 Input white space is not preserved on output if fields are involved.
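.sp
.LP
As a small illustration (the sample input is made up for this sketch),
assigning to any field causes \fB$0\fR to be rebuilt with the fields joined by
\fBOFS\fR. With the default \fBOFS\fR of a single space, the run of spaces in
the input below is written as one space:
.sp
.in +2
.nf
\fBprintf 'a   b\en' | awk '{ $1 = $1; print }'\fR
.fi
.in -2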
1842 .sp
1843 .LP
There are no explicit conversions between numbers and strings. To force an
expression to be treated as a number, add 0 to it; to force it to be treated as
a string, concatenate the null string (\fB""\fR) to it.
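.sp
.LP
A brief sketch of both idioms, using arbitrary sample values: string constants
compare as strings unless 0 is added, and numeric-looking input fields compare
as numbers unless the null string is concatenated. The first command writes
\fB1 0\fR and the second writes \fB0 1\fR:
.sp
.in +2
.nf
\fBawk 'BEGIN { a = "10"; b = "9"; print (a < b), (a + 0 < b + 0) }'
echo '10 9' | awk '{ print ($1 < $2), ($1 "" < $2 "") }'\fR
.fi
.in -2
.sp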