12482 Have /usr/bin/awk point to /usr/bin/nawk
Reviewed by: Peter Tribble <peter.tribble@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
--- old/usr/src/man/man1/awk.1
+++ new/usr/src/man/man1/awk.1
1 1 .\"
2 2 .\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
3 3 .\" permission to reproduce portions of its copyrighted documentation.
4 4 .\" Original documentation from The Open Group can be obtained online at
5 5 .\" http://www.opengroup.org/bookstore/.
6 6 .\"
7 7 .\" The Institute of Electrical and Electronics Engineers and The Open
8 8 .\" Group, have given us permission to reprint portions of their
9 9 .\" documentation.
10 10 .\"
11 11 .\" In the following statement, the phrase ``this text'' refers to portions
12 12 .\" of the system documentation.
13 13 .\"
14 14 .\" Portions of this text are reprinted and reproduced in electronic form
15 15 .\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
16 16 .\" Standard for Information Technology -- Portable Operating System
17 17 .\" Interface (POSIX), The Open Group Base Specifications Issue 6,
18 18 .\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
19 19 .\" Engineers, Inc and The Open Group. In the event of any discrepancy
20 20 .\" between these versions and the original IEEE and The Open Group
21 21 .\" Standard, the original IEEE and The Open Group Standard is the referee
22 22 .\" document. The original Standard can be obtained online at
23 23 .\" http://www.opengroup.org/unix/online.html.
24 24 .\"
25 25 .\" This notice shall appear on any product containing this material.
26 26 .\"
27 27 .\" The contents of this file are subject to the terms of the
28 28 .\" Common Development and Distribution License (the "License").
29 29 .\" You may not use this file except in compliance with the License.
30 30 .\"
31 31 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
32 32 .\" or http://www.opensolaris.org/os/licensing.
33 33 .\" See the License for the specific language governing permissions
34 34 .\" and limitations under the License.
35 35 .\"
36 36 .\" When distributing Covered Code, include this CDDL HEADER in each
37 37 .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
38 38 .\" If applicable, add the following below this CDDL HEADER, with the
39 39 .\" fields enclosed by brackets "[]" replaced with your own identifying
40 40 .\" information: Portions Copyright [yyyy] [name of copyright owner]
41 41 .\"
42 42 .\"
43 43 .\" Copyright 1989 AT&T
44 -.\" Portions Copyright (c) 1992, X/Open Company Limited. All Rights Reserved.
45 -.\" Copyright (c) 2005, Sun Microsystems, Inc. All Rights Reserved
44 +.\" Copyright 1992, X/Open Company Limited All Rights Reserved
45 +.\" Portions Copyright (c) 2005, 2006 Sun Microsystems, Inc. All Rights Reserved
46 +.\" Copyright 2020 Joyent, Inc.
46 47 .\"
47 -.TH AWK 1 "Jun 22, 2005"
48 +.TH AWK 1 "Apr 20, 2020"
48 49 .SH NAME
49 50 awk \- pattern scanning and processing language
50 51 .SH SYNOPSIS
52 +.nf
53 +\fB/usr/bin/awk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR] \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
54 + [\fIargument\fR]...
55 +.fi
56 +
51 57 .LP
52 58 .nf
53 -\fB/usr/bin/awk\fR [\fB-f\fR \fIprogfile\fR] [\fB-F\fIc\fR\fR] [' \fIprog\fR '] [\fIparameters\fR]
54 - [\fIfilename\fR]...
59 +\fB/usr/bin/nawk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR] \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
60 + [\fIargument\fR]...
55 61 .fi
56 62
57 63 .LP
58 64 .nf
59 -\fB/usr/xpg4/bin/awk\fR [\fB-F\fR\fIcERE\fR] [\fB-v\fR \fIassignment\fR]... \fI\&'program'\fR \fB-f\fR \fIprogfile\fR...
65 +\fB/usr/xpg4/bin/awk\fR [\fB-F\fR \fIERE\fR] [\fB-v\fR \fIassignment\fR]... \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
60 66 [\fIargument\fR]...
61 67 .fi
62 68
63 69 .SH DESCRIPTION
64 -.sp
70 +NOTE: The \fBnawk\fR command is now the system default awk for illumos.
65 71 .LP
66 -The \fB/usr/xpg4/bin/awk\fR utility is described on the \fBnawk\fR(1) manual
67 -page.
72 +The \fB/usr/bin/awk\fR and \fB/usr/xpg4/bin/awk\fR utilities execute
73 +\fIprogram\fRs written in the \fBawk\fR programming language, which is
74 +specialized for textual data manipulation. An \fBawk\fR \fIprogram\fR is a
75 +sequence of patterns and corresponding actions. The string specifying
76 +\fIprogram\fR must be enclosed in single quotes (') to protect it from
77 +interpretation by the shell. The sequence of pattern-action statements can be
78 +specified in the command line as \fIprogram\fR or in one or more files
79 +specified by the \fB-f\fR \fIprogfile\fR option. When input is read that matches
80 +a pattern, the action associated with the pattern is performed.
68 81 .sp
69 82 .LP
70 -The \fB/usr/bin/awk\fR utility scans each input \fIfilename\fR for lines that
71 -match any of a set of patterns specified in \fIprog\fR. The \fIprog\fR string
72 -must be enclosed in single quotes (\fB a\'\fR) to protect it from the shell.
73 -For each pattern in \fIprog\fR there can be an associated action performed when
74 -a line of a \fIfilename\fR matches the pattern. The set of pattern-action
75 -statements can appear literally as \fIprog\fR or in a file specified with the
76 -\fB-f\fR\fI progfile\fR option. Input files are read in order; if there are no
77 -files, the standard input is read. The file name \fB\&'\(mi'\fR means the
78 -standard input.
79 -.SH OPTIONS
83 +Input is interpreted as a sequence of records. By default, a record is a line,
84 +but this can be changed by using the \fBRS\fR built-in variable. Each record of
85 +input is matched to each pattern in the \fIprogram\fR. For each pattern
86 +matched, the associated action is executed.
80 87 .sp
81 88 .LP
89 +The \fBawk\fR utility interprets each input record as a sequence of fields
90 +where, by default, a field is a string of non-blank characters. This default
91 +white-space field delimiter (blanks and/or tabs) can be changed by using the
92 +\fBFS\fR built-in variable or the \fB-F\fR \fIERE\fR option. The \fBawk\fR
93 +utility denotes the first field in a record \fB$1\fR, the second \fB$2\fR, and
94 +so forth. The symbol \fB$0\fR refers to the entire record; setting any other
95 +field causes the reevaluation of \fB$0\fR. Assigning to \fB$0\fR resets the
96 +values of all fields and the \fBNF\fR built-in variable.
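As an illustration of the field handling described above, here is a minimal sketch. The helper name `fields_demo` and the sample input line are made up for this example. It splits a record on `:` with `-F`, then assigns to `$2`, which causes `$0` to be rebuilt with the default output field separator (a space).

```shell
# fields_demo is a hypothetical helper; the input line is sample data.
fields_demo() {
    printf 'root:x:0:0\n' |
        awk -F: '{ print $1, NF; $2 = "hidden"; print $0 }'
}
fields_demo
```

The first output line shows `$1` and `NF`; the second shows the rebuilt record.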
97 +
98 +.SH OPTIONS
82 99 The following options are supported:
83 100 .sp
84 101 .ne 2
85 102 .na
86 -\fB\fB-f\fR\fI progfile\fR \fR
103 +\fB\fB-F\fR \fIERE\fR\fR
87 104 .ad
88 -.RS 16n
89 -\fBawk\fR uses the set of patterns it reads from \fIprogfile\fR.
105 +.RS 17n
106 +Define the input field separator to be the extended regular expression
107 +\fIERE\fR, before any input is read (can be a character).
90 108 .RE
91 109
92 110 .sp
93 111 .ne 2
94 112 .na
95 -\fB\fB-F\fR\fIc\fR \fR
113 +\fB\fB-f\fR \fIprogfile\fR\fR
96 114 .ad
97 -.RS 16n
98 -Uses the character \fIc\fR as the field separator (FS) character. See the
99 -discussion of \fBFS\fR below.
115 +.RS 17n
116 +Specifies the pathname of the file \fIprogfile\fR containing an \fBawk\fR
117 +program. If multiple instances of this option are specified, the concatenation
118 +of the files specified as \fIprogfile\fR in the order specified is the
119 +\fBawk\fR program. The \fBawk\fR program can alternatively be specified in
120 +the command line as a single argument.
100 121 .RE
101 122
102 -.SH USAGE
103 -.SS "Input Lines"
104 123 .sp
124 +.ne 2
125 +.na
126 +\fB\fB-v\fR \fIassignment\fR\fR
127 +.ad
128 +.RS 17n
129 +The \fIassignment\fR argument must be in the same form as an \fIassignment\fR
130 +operand. The assignment is of the form \fIvar=value\fR, where \fIvar\fR is the
131 +name of one of the variables described below. The specified assignment occurs
132 +before executing the \fBawk\fR program, including the actions associated with
133 +\fBBEGIN\fR patterns (if any). Multiple occurrences of this option can be
134 +specified.
135 +.RE
136 +
137 +.sp
138 +.ne 2
139 +.na
140 +\fB\fB-safe\fR\fR
141 +.ad
142 +.RS 17n
143 +When passed to \fBawk\fR, this flag will prevent the program from opening new
144 +files or running child processes. The \fBENVIRON\fR array will also not be
145 +initialized.
146 +.RE
147 +
148 +.SH OPERANDS
149 +The following operands are supported:
150 +.sp
151 +.ne 2
152 +.na
153 +\fB\fIprogram\fR\fR
154 +.ad
155 +.RS 12n
156 +If no \fB-f\fR option is specified, the first operand to \fBawk\fR is the text
157 +of the \fBawk\fR program. The application supplies the \fIprogram\fR operand
158 +as a single argument to \fBawk.\fR If the text does not end in a newline
159 +character, \fBawk\fR interprets the text as if it did.
160 +.RE
161 +
162 +.sp
163 +.ne 2
164 +.na
165 +\fB\fIargument\fR\fR
166 +.ad
167 +.RS 12n
168 +Either of the following two types of \fIargument\fR can be intermixed:
169 +.sp
170 +.ne 2
171 +.na
172 +\fB\fIfile\fR\fR
173 +.ad
174 +.RS 14n
175 +A pathname of a file that contains the input to be read, which is matched
176 +against the set of patterns in the program. If no \fIfile\fR operands are
177 +specified, or if a \fIfile\fR operand is \fB\(mi\fR, the standard input is
178 +used.
179 +.RE
180 +
181 +.sp
182 +.ne 2
183 +.na
184 +\fB\fIassignment\fR\fR
185 +.ad
186 +.RS 14n
187 +An operand that begins with an underscore or alphabetic character from the
188 +portable character set, followed by a sequence of underscores, digits and
189 +alphabetics from the portable character set, followed by the \fB=\fR character
190 +specifies a variable assignment rather than a pathname. The characters before
191 +the \fB=\fR represent the name of an \fBawk\fR variable. If that name is an
192 +\fBawk\fR reserved word, the behavior is undefined. The characters following
193 +the equal sign are interpreted as if they appeared in the \fBawk\fR program
194 +preceded and followed by a double-quote (\fB"\fR) character, as a \fBSTRING\fR
195 +token, except that if the last character is an unescaped backslash, it is
196 +interpreted as a literal backslash rather than as the first character of the
197 +sequence \fB\e"\fR. The variable is assigned the value of that \fBSTRING\fR
198 +token. If the value is considered a \fInumeric string\fR, the variable is
199 +assigned its numeric value. Each such variable assignment is performed just
200 +before the processing of the following \fIfile\fR, if any. Thus, an assignment
201 +before the first \fIfile\fR argument is executed after the \fBBEGIN\fR actions
202 +(if any), while an assignment after the last \fIfile\fR argument is executed
203 +before the \fBEND\fR actions (if any). If there are no \fIfile\fR arguments,
204 +assignments are executed before processing the standard input.
205 +.RE
206 +
207 +.RE
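The timing difference between an \fIassignment\fR operand and the \fB-v\fR option can be sketched as follows. The helper name `assign_demo` and the variable name `v` are hypothetical, chosen only for this example. An operand assignment takes effect after the `BEGIN` actions and just before the next file (here `-`, the standard input) is read, while a `-v` assignment is already visible inside `BEGIN`.

```shell
# assign_demo is a hypothetical helper; v is an arbitrary variable name.
assign_demo() {
    # Operand assignment: not yet set in BEGIN, set before "-" (stdin) is read.
    printf 'a\n' |
        awk 'BEGIN { printf "begin=[%s]\n", v } { printf "main=[%s]\n", v }' v=1 -
    # -v assignment: performed before BEGIN runs.
    printf 'a\n' |
        awk -v v=1 'BEGIN { printf "begin=[%s]\n", v } { printf "main=[%s]\n", v }'
}
assign_demo
```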
208 +
209 +.SH INPUT FILES
210 +Input files to the \fBawk\fR program from any of the following sources:
211 +.RS +4
212 +.TP
213 +.ie t \(bu
214 +.el o
215 +any \fIfile\fR operands or their equivalents, achieved by modifying the
216 +\fBawk\fR variables \fBARGV\fR and \fBARGC\fR
217 +.RE
218 +.RS +4
219 +.TP
220 +.ie t \(bu
221 +.el o
222 +standard input in the absence of any \fIfile\fR operands
223 +.RE
224 +.RS +4
225 +.TP
226 +.ie t \(bu
227 +.el o
228 +arguments to the \fBgetline\fR function
229 +.RE
230 +.sp
105 231 .LP
106 -Each input line is matched against the pattern portion of every pattern-action
107 -statement; the associated action is performed for each matched pattern. Any
108 -\fIfilename\fR of the form \fIvar=value\fR is treated as an assignment, not a
109 -filename, and is executed at the time it would have been opened if it were a
110 -filename. \fIVariables\fR assigned in this manner are not available inside a
111 -\fBBEGIN\fR rule, and are assigned after previously specified files have been
112 -read.
232 +must be text files. Whether the variable \fBRS\fR is set to a value other than
233 +a newline character or not, for these files, implementations support records
234 +terminated with the specified separator up to \fB{LINE_MAX}\fR bytes and can
235 +support longer records.
113 236 .sp
114 237 .LP
115 -An input line is normally made up of fields separated by white spaces. (This
116 -default can be changed by using the \fBFS\fR built-in variable or the
117 -\fB-F\fR\fIc\fR option.) The default is to ignore leading blanks and to
118 -separate fields by blanks and/or tab characters. However, if \fBFS\fR is
119 -assigned a value that does not include any of the white spaces, then leading
120 -blanks are not ignored. The fields are denoted \fB$1\fR, \fB$2\fR,
121 -\fB\&.\|.\|.\fR\|; \fB$0\fR refers to the entire line.
122 -.SS "Pattern-action Statements"
238 +If \fB-f\fR \fIprogfile\fR is specified, the files named by each of the
239 +\fIprogfile\fR option-arguments must be text files containing an \fBawk\fR
240 +program.
123 241 .sp
124 242 .LP
125 -A pattern-action statement has the form:
243 +The standard input is used only if no \fIfile\fR operands are specified, or if
244 +a \fIfile\fR operand is \fB\(mi\fR\&.
245 +
246 +.SH EXTENDED DESCRIPTION
247 +An \fBawk\fR program is composed of pairs of the form:
126 248 .sp
127 249 .in +2
128 250 .nf
129 -\fIpattern\fR\fB { \fR\fIaction\fR\fB } \fR
251 +\fIpattern\fR { \fIaction\fR }
130 252 .fi
131 253 .in -2
132 -.sp
133 254
134 255 .sp
135 256 .LP
136 -Either pattern or action can be omitted. If there is no action, the matching
137 -line is printed. If there is no pattern, the action is performed on every input
138 -line. Pattern-action statements are separated by newlines or semicolons.
257 +Either the pattern or the action (including the enclosing brace characters) can
258 +be omitted. Pattern-action statements are separated by a semicolon or by a
259 +newline.
139 260 .sp
140 261 .LP
141 -Patterns are arbitrary Boolean combinations ( \fB!\fR, ||, \fB&&\fR, and
142 -parentheses) of relational expressions and regular expressions. A relational
143 -expression is one of the following:
262 +A missing pattern matches any record of input, and a missing action is
263 +equivalent to an action that writes the matched record of input to standard
264 +output.
144 265 .sp
266 +.LP
267 +Execution of the \fBawk\fR program starts by first executing the actions
268 +associated with all \fBBEGIN\fR patterns in the order they occur in the
269 +program. Then each \fIfile\fR operand (or standard input if no files were
270 +specified) is processed by reading data from the file until a record separator
271 +is seen (a newline character by default), splitting the current record into
272 +fields using the current value of \fBFS\fR, evaluating each pattern in the
273 +program in the order of occurrence, and executing the action associated with
274 +each pattern that matches the current record. The action for a matching pattern
275 +is executed before evaluating subsequent patterns. Last, the actions associated
276 +with all \fBEND\fR patterns are executed in the order they occur in the program.
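The execution order described above can be sketched with a small program. The helper name `order_demo` and the two input lines are made up for this example. `BEGIN` runs first, each record is then tested against every pattern in program order (a missing pattern matches every record), and `END` runs last.

```shell
# order_demo is a hypothetical helper; the two input lines are sample data.
order_demo() {
    printf 'x\ny\n' |
        awk 'BEGIN { print "start" }
             /x/   { print "first:" $0 }
                   { print "any:" $0 }
             END   { print "end", NR }'
}
order_demo
```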
277 +
278 +.SS "Expressions in awk"
279 +Expressions describe computations used in \fIpatterns\fR and \fIactions\fR. In
280 +the following table, valid expression operations are given in groups from
281 +highest precedence first to lowest precedence last, with equal-precedence
282 +operators grouped between horizontal lines. In expression evaluation, where the
283 +grammar is formally ambiguous, higher precedence operators are evaluated before
284 +lower precedence operators. In this table \fIexpr,\fR \fIexpr1,\fR
285 +\fIexpr2,\fR and \fIexpr3\fR represent any expression, while \fIlvalue\fR
286 +represents any entity that can be assigned to (that is, on the left side of an
287 +assignment operator).
288 +.sp
289 +
290 +.sp
291 +.TS
292 +c c c c
293 +l l l l .
294 +\fBSyntax\fR \fBName\fR \fBType of Result\fR \fBAssociativity\fR
295 +_
296 +( \fIexpr\fR ) Grouping type of \fIexpr\fR n/a
297 +_
298 +$\fIexpr\fR Field reference string n/a
299 +_
300 +++ \fIlvalue\fR Pre-increment numeric n/a
301 +\(mi\(mi \fIlvalue\fR Pre-decrement numeric n/a
302 +\fIlvalue\fR ++ Post-increment numeric n/a
303 +\fIlvalue\fR \(mi\(mi Post-decrement numeric n/a
304 +_
305 +\fIexpr\fR ^ \fIexpr\fR Exponentiation numeric right
306 +_
307 +! \fIexpr\fR Logical not numeric n/a
308 ++ \fIexpr\fR Unary plus numeric n/a
309 +\(mi \fIexpr\fR Unary minus numeric n/a
310 +_
311 +\fIexpr\fR * \fIexpr\fR Multiplication numeric left
312 +\fIexpr\fR / \fIexpr\fR Division numeric left
313 +\fIexpr\fR % \fIexpr\fR Modulus numeric left
314 +_
315 +\fIexpr\fR + \fIexpr\fR Addition numeric left
316 +\fIexpr\fR \(mi \fIexpr\fR Subtraction numeric left
317 +_
318 +\fIexpr\fR \fIexpr\fR String concatenation string left
319 +_
320 +\fIexpr\fR < \fIexpr\fR Less than numeric none
321 +\fIexpr\fR <= \fIexpr\fR Less than or equal to numeric none
322 +\fIexpr\fR != \fIexpr\fR Not equal to numeric none
323 +\fIexpr\fR == \fIexpr\fR Equal to numeric none
324 +\fIexpr\fR > \fIexpr\fR Greater than numeric none
325 +\fIexpr\fR >= \fIexpr\fR Greater than or equal to numeric none
326 +_
327 +\fIexpr\fR ~ \fIexpr\fR ERE match numeric none
328 +\fIexpr\fR !~ \fIexpr\fR ERE non-match numeric none
329 +_
330 +\fIexpr\fR in array Array membership numeric left
331 +( \fIindex\fR ) in Multi-dimension array numeric left
332 + \fIarray\fR membership
333 +_
334 +\fIexpr\fR && \fIexpr\fR Logical AND numeric left
335 +_
336 +\fIexpr\fR |\|| \fIexpr\fR Logical OR numeric left
337 +_
338 +\fIexpr1\fR ? \fIexpr2\fR Conditional expression type of selected right
339 + : \fIexpr3\fR \fIexpr2\fR or \fIexpr3\fR
340 +_
341 +\fIlvalue\fR ^= \fIexpr\fR Exponentiation numeric right
342 + assignment
343 +\fIlvalue\fR %= \fIexpr\fR Modulus assignment numeric right
344 +\fIlvalue\fR *= \fIexpr\fR Multiplication numeric right
345 + assignment
346 +\fIlvalue\fR /= \fIexpr\fR Division assignment numeric right
347 +\fIlvalue\fR += \fIexpr\fR Addition assignment numeric right
348 +\fIlvalue\fR \(mi= \fIexpr\fR Subtraction assignment numeric right
349 +\fIlvalue\fR = \fIexpr\fR Assignment type of \fIexpr\fR right
350 +.TE
351 +
352 +.sp
353 +.LP
354 +Each expression has either a string value, a numeric value or both. Except as
355 +stated for specific contexts, the value of an expression is implicitly
356 +converted to the type needed for the context in which it is used. A string
357 +value is converted to a numeric value by the equivalent of the following calls:
358 +.sp
145 359 .in +2
146 360 .nf
147 -\fIexpression relop expression
148 -expression matchop regular_expression\fR
361 +setlocale(LC_NUMERIC, "");
362 +\fInumeric_value\fR = atof(\fIstring_value\fR);
149 363 .fi
150 364 .in -2
151 365
152 366 .sp
153 367 .LP
154 -where a \fIrelop\fR is any of the six relational operators in C, and a
155 -\fImatchop\fR is either \fB~\fR (contains) or \fB!~\fR (does not contain). An
156 -\fIexpression\fR is an arithmetic expression, a relational expression, the
157 -special expression
368 +A numeric value that is exactly equal to the value of an integer is converted
369 +to a string by the equivalent of a call to the \fBsprintf\fR function with the
370 +string \fB%d\fR as the \fBfmt\fR argument and the numeric value being converted
371 +as the first and only \fIexpr\fR argument. Any other numeric value is
372 +converted to a string by the equivalent of a call to the \fBsprintf\fR function
373 +with the value of the variable \fBCONVFMT\fR as the \fBfmt\fR argument and the
374 +numeric value being converted as the first and only \fIexpr\fR argument.
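A short sketch of the number-to-string conversion rules above. The helper name `convfmt_demo` and the variable names are made up for this example. Concatenating with the empty string forces the conversion: an integral value is formatted as if by `%d`, and any other value is formatted with the current `CONVFMT`.

```shell
# convfmt_demo is a hypothetical helper.
convfmt_demo() {
    awk 'BEGIN {
        x = 3.0; y = 0.1234567; z = 0.1234567
        a = x ""            # integral value: converted as if by %d
        b = y ""            # non-integral: converted with CONVFMT (%.6g by default)
        CONVFMT = "%.2g"
        c = z ""            # non-integral: converted with the new CONVFMT
        print a; print b; print c
    }'
}
convfmt_demo
```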
158 375 .sp
376 +.LP
377 +A string value is considered to be a \fInumeric string\fR in the following
378 +case:
379 +.RS +4
380 +.TP
381 +1.
382 +Any leading and trailing blank characters are ignored.
383 +.RE
384 +.RS +4
385 +.TP
386 +2.
387 +If the first unignored character is a \fB+\fR or \fB\(mi\fR, it is ignored.
388 +.RE
389 +.RS +4
390 +.TP
391 +3.
392 +If the remaining unignored characters would be lexically recognized as a
393 +\fBNUMBER\fR token, the string is considered a \fInumeric string\fR.
394 +.RE
395 +.sp
396 +.LP
397 +If a \fB\(mi\fR character is ignored in the above steps, the numeric value of
398 +the \fInumeric string\fR is the negation of the numeric value of the recognized
399 +\fBNUMBER\fR token. Otherwise the numeric value of the \fInumeric string\fR is
400 +the numeric value of the recognized \fBNUMBER\fR token. Whether or not a string
401 +is a \fInumeric string\fR is relevant only in contexts where that term is used
402 +in this section.
403 +.sp
404 +.LP
405 +When an expression is used in a Boolean context, if it has a numeric value, a
406 +value of zero is treated as false and any other value is treated as true.
407 +Otherwise, a string value of the null string is treated as false and any other
408 +value is treated as true. A Boolean context is one of the following:
409 +.RS +4
410 +.TP
411 +.ie t \(bu
412 +.el o
413 +the first subexpression of a conditional expression.
414 +.RE
415 +.RS +4
416 +.TP
417 +.ie t \(bu
418 +.el o
419 +an expression operated on by logical NOT, logical AND, or logical OR.
420 +.RE
421 +.RS +4
422 +.TP
423 +.ie t \(bu
424 +.el o
425 +the second expression of a \fBfor\fR statement.
426 +.RE
427 +.RS +4
428 +.TP
429 +.ie t \(bu
430 +.el o
431 +the expression of an \fBif\fR statement.
432 +.RE
433 +.RS +4
434 +.TP
435 +.ie t \(bu
436 +.el o
437 +the expression of the \fBwhile\fR clause in either a \fBwhile\fR or \fBdo\fR
438 +\fB\&.\|.\|.\fR \fBwhile\fR statement.
439 +.RE
440 +.RS +4
441 +.TP
442 +.ie t \(bu
443 +.el o
444 +an expression used as a pattern (as in Overall Program Structure).
445 +.RE
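The Boolean-context rules above can be sketched as follows. The helper name `bool_demo` and the awk function name `truth` are hypothetical. A numeric zero and the null string are false; note that the string constant `"0"` is a non-null string and is therefore treated as true.

```shell
# bool_demo and truth() are hypothetical names used only for this sketch.
bool_demo() {
    awk 'function truth(v) { return v ? "true" : "false" }
    BEGIN {
        print "num0", truth(0)      # numeric zero
        print "empty", truth("")    # null string
        print "str0", truth("0")    # non-null string constant
        print "num2", truth(2)      # non-zero number
    }'
}
bool_demo
```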
446 +.sp
447 +.LP
448 +The \fBawk\fR language supplies arrays that are used for storing numbers or
449 +strings. Arrays need not be declared. They are initially empty, and their sizes
450 +change dynamically. The subscripts, or element identifiers, are strings,
451 +providing a type of associative array capability. An array name followed by a
452 +subscript within square brackets can be used as an \fIlvalue\fR and as an
453 +expression, as described in the grammar. Unsubscripted array names are used in
454 +only the following contexts:
455 +.RS +4
456 +.TP
457 +.ie t \(bu
458 +.el o
459 +a parameter in a function definition or function call.
460 +.RE
461 +.RS +4
462 +.TP
463 +.ie t \(bu
464 +.el o
465 +the \fBNAME\fR token following any use of the keyword \fBin\fR.
466 +.RE
467 +.sp
468 +.LP
469 +A valid array \fIindex\fR consists of one or more comma-separated expressions,
470 +similar to the way in which multi-dimensional arrays are indexed in some
471 +programming languages. Because \fBawk\fR arrays are really one-dimensional,
472 +such a comma-separated list is converted to a single string by concatenating
473 +the string values of the separate expressions, each separated from the other by
474 +the value of the \fBSUBSEP\fR variable.
475 +.sp
476 +.LP
477 +Thus, the following two index operations are equivalent:
478 +.sp
159 479 .in +2
160 480 .nf
161 -\fIvar \fRin \fIarray\fR
481 +var[expr1, expr2, ... exprn]
482 +var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]
162 483 .fi
163 484 .in -2
164 485
165 486 .sp
166 487 .LP
167 -or a Boolean combination of these.
488 +A multi-dimensioned \fIindex\fR used with the \fBin\fR operator must be put in
489 +parentheses. The \fBin\fR operator, which tests for the existence of a
490 +particular array element, does not create the element if it does not exist.
491 +Any other reference to a non-existent array element automatically creates it.
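A minimal sketch of multi-dimensional subscripts and the \fBin\fR operator as described above. The helper name `array_demo` and the array contents are made up for this example. It shows that a comma-separated index is equivalent to joining the parts with `SUBSEP`, that `in` does not create an element, and that any other reference does.

```shell
# array_demo is a hypothetical helper; the array contents are sample data.
array_demo() {
    awk 'BEGIN {
        a["x", "y"] = 1
        if (("x", "y") in a)        print "tuple present"
        if (("x" SUBSEP "y") in a)  print "joined key present"
        if (!("z" in a))            print "z absent"
        n = 0; for (k in a) n++; print "before:", n   # "in" created nothing
        unused = a["z"]                               # plain reference creates a["z"]
        n = 0; for (k in a) n++; print "after:", n
    }'
}
array_demo
```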
492 +
493 +.SS "Variables and Special Variables"
494 +Variables can be used in an \fBawk\fR program by referencing them. With the
495 +exception of function parameters, they are not explicitly declared.
496 +Uninitialized scalar variables and array elements have both a numeric value of
497 +zero and a string value of the empty string.
168 498 .sp
169 499 .LP
170 -Regular expressions are as in \fBegrep\fR(1). In patterns they must be
171 -surrounded by slashes. Isolated regular expressions in a pattern apply to the
172 -entire line. Regular expressions can also occur in relational expressions. A
173 -pattern can consist of two patterns separated by a comma; in this case, the
174 -action is performed for all lines between the occurrence of the first pattern
175 -to the occurrence of the second pattern.
500 +Field variables are designated by a \fB$\fR followed by a number or numerical
501 +expression. The effect of the field number \fIexpression\fR evaluating to
502 +anything other than a non-negative integer is unspecified. Uninitialized
503 +variables or string values need not be converted to numeric values in this
504 +context. New field variables are created by assigning a value to them.
505 +References to non-existent fields (that is, fields after \fB$NF\fR) produce the
506 +null string. However, assigning to a non-existent field (for example,
507 +\fB$(NF+2) = 5\fR) increases the value of \fBNF\fR, creates any intervening
508 +fields with the null string as their values, and causes the value of \fB$0\fR to
509 +be recomputed, with the fields being separated by the value of \fBOFS\fR. Each
510 +field variable has a string value when created. If the string, with any
511 +occurrence of the decimal-point character from the current locale changed to a
512 +period character, is considered a \fInumeric string\fR (see \fBExpressions in
513 +awk\fR above), the field variable also has the numeric value of the \fInumeric
514 +string\fR.
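Assignment to a non-existent field can be sketched as follows. The helper name `newfield_demo` and the input are made up for this example, and `OFS` is set to `-` only to make the empty intervening field visible in the recomputed `$0`.

```shell
# newfield_demo is a hypothetical helper; "a b" is sample input.
newfield_demo() {
    printf 'a b\n' |
        awk 'BEGIN { OFS = "-" } { $(NF+2) = "d"; print NF; print $0 }'
}
newfield_demo
```

`NF` grows from 2 to 4, and `$3` is created with the null string as its value.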
515 +
516 +.SS "/usr/bin/awk, /usr/xpg4/bin/awk"
517 +\fBawk\fR sets the following special variables that are supported by both
518 +\fB/usr/bin/awk\fR and \fB/usr/xpg4/bin/awk\fR:
176 519 .sp
177 -.LP
178 -The special patterns \fBBEGIN\fR and \fBEND\fR can be used to capture control
179 -before the first input line has been read and after the last input line has
180 -been read respectively. These keywords do not combine with any other patterns.
181 -.SS "Built-in Variables"
520 +.ne 2
521 +.na
522 +\fB\fBARGC\fR\fR
523 +.ad
524 +.RS 12n
525 +The number of elements in the \fBARGV\fR array.
526 +.RE
527 +
182 528 .sp
183 -.LP
184 -Built-in variables include:
529 +.ne 2
530 +.na
531 +\fB\fBARGV\fR\fR
532 +.ad
533 +.RS 12n
534 +An array of command line arguments, excluding options and the \fIprogram\fR
535 +argument, numbered from zero to \fBARGC\fR\(mi1.
185 536 .sp
537 +The arguments in \fBARGV\fR can be modified or added to; \fBARGC\fR can be
538 +altered. As each input file ends, \fBawk\fR treats the next non-null element
539 +of \fBARGV\fR, up to the current value of \fBARGC\fR\(mi1, inclusive, as the
540 +name of the next input file. Setting an element of \fBARGV\fR to null means
541 +that it is not treated as an input file. The name \fB\(mi\fR indicates the
542 +standard input. If an argument matches the format of an \fIassignment\fR
543 +operand, this argument is treated as an assignment rather than a \fIfile\fR
544 +argument.
545 +.RE
546 +
547 +.sp
186 548 .ne 2
187 549 .na
188 -\fB\fBFILENAME\fR \fR
550 +\fB\fBCONVFMT\fR\fR
189 551 .ad
190 -.RS 13n
191 -name of the current input file
552 +.RS 12n
553 +The \fBprintf\fR format for converting numbers to strings (except for output
554 +statements, where \fBOFMT\fR is used). The default is \fB%.6g\fR.
192 555 .RE
193 556
194 557 .sp
195 558 .ne 2
196 559 .na
197 -\fB\fBFS\fR \fR
560 +\fB\fBENVIRON\fR\fR
198 561 .ad
199 -.RS 13n
200 -input field separator regular expression (default blank and tab)
562 +.RS 12n
563 +The variable \fBENVIRON\fR is an array representing the value of the
564 +environment. The indices of the array are strings consisting of the names of
565 +the environment variables, and the value of each array element is a string
566 +consisting of the value of that variable. If the value of an environment
567 +variable is considered a \fInumeric string\fR, the array element also has its
568 +numeric value.
569 +.sp
570 +In all cases where \fBawk\fR behavior is affected by environment variables
571 +(including the environment of any commands that \fBawk\fR executes via the
572 +\fBsystem\fR function or via pipeline redirections with the \fBprint\fR
573 +statement, the \fBprintf\fR statement, or the \fBgetline\fR function), the
574 +environment used is the environment at the time \fBawk\fR began executing.
201 575 .RE
202 576
203 577 .sp
204 578 .ne 2
205 579 .na
206 -\fB\fBNF\fR \fR
580 +\fB\fBFILENAME\fR\fR
207 581 .ad
208 -.RS 13n
209 -number of fields in the current record
582 +.RS 12n
583 +A pathname of the current input file. Inside a \fBBEGIN\fR action the value is
584 +undefined. Inside an \fBEND\fR action the value is the name of the last input
585 +file processed.
210 586 .RE
211 587
212 588 .sp
213 589 .ne 2
214 590 .na
215 -\fB\fBNR\fR \fR
591 +\fB\fBFNR\fR\fR
216 592 .ad
217 -.RS 13n
218 -ordinal number of the current record
593 +.RS 12n
594 +The ordinal number of the current record in the current file. Inside a
595 +\fBBEGIN\fR action the value is zero. Inside an \fBEND\fR action the value is
596 +the number of the last record processed in the last file processed.
219 597 .RE
220 598
221 599 .sp
222 600 .ne 2
223 601 .na
224 -\fB\fBOFMT\fR \fR
602 +\fB\fBFS\fR\fR
225 603 .ad
226 -.RS 13n
227 -output format for numbers (default \fB%.6g\fR)
604 +.RS 12n
605 +Input field separator regular expression; a space character by default.
228 606 .RE
229 607
230 608 .sp
231 609 .ne 2
232 610 .na
233 -\fB\fBOFS\fR \fR
611 +\fB\fBNF\fR\fR
234 612 .ad
235 -.RS 13n
236 -output field separator (default blank)
613 +.RS 12n
614 +The number of fields in the current record. Inside a \fBBEGIN\fR action, the
615 +use of \fBNF\fR is undefined unless a \fBgetline\fR function without a
616 +\fIvar\fR argument is executed previously. Inside an \fBEND\fR action, \fBNF\fR
617 +retains the value it had for the last record read, unless a subsequent,
618 +redirected, \fBgetline\fR function without a \fIvar\fR argument is performed
619 +prior to entering the \fBEND\fR action.
237 620 .RE
238 621
239 622 .sp
240 623 .ne 2
241 624 .na
242 -\fB\fBORS\fR \fR
625 +\fB\fBNR\fR\fR
243 626 .ad
244 -.RS 13n
245 -output record separator (default new-line)
627 +.RS 12n
628 +The ordinal number of the current record from the start of input. Inside a
629 +\fBBEGIN\fR action the value is zero. Inside an \fBEND\fR action the value is
630 +the number of the last record processed.
246 631 .RE
247 632
248 633 .sp
249 634 .ne 2
250 635 .na
251 -\fB\fBRS\fR \fR
636 +\fB\fBOFMT\fR\fR
252 637 .ad
253 -.RS 13n
254 -input record separator (default new-line)
638 +.RS 12n
639 +The \fBprintf\fR format for converting numbers to strings in output statements;
640 +\fB"%.6g"\fR by default. The result of the conversion is unspecified if the
641 +value of \fBOFMT\fR is not a floating-point format specification.
255 642 .RE
256 643
257 644 .sp
645 +.ne 2
646 +.na
647 +\fB\fBOFS\fR\fR
648 +.ad
649 +.RS 12n
650 +The \fBprint\fR statement output field separator; a space character by default.
651 +.RE
652 +
653 +.sp
654 +.ne 2
655 +.na
656 +\fB\fBORS\fR\fR
657 +.ad
658 +.RS 12n
659 +The \fBprint\fR output record separator; a newline character by default.
660 +.RE
661 +
662 +.sp
663 +.ne 2
664 +.na
665 +\fB\fBRLENGTH\fR\fR
666 +.ad
667 +.RS 12n
668 +The length of the string matched by the \fBmatch\fR function.
669 +.RE
670 +
671 +.sp
672 +.ne 2
673 +.na
674 +\fB\fBRS\fR\fR
675 +.ad
676 +.RS 12n
677 +The first character of the string value of \fBRS\fR is the input record
678 +separator; a newline character by default. If \fBRS\fR contains more than one
679 +character, the results are unspecified. If \fBRS\fR is null, then records are
680 +separated by sequences of one or more blank lines. Leading or trailing blank
681 +lines do not produce empty records at the beginning or end of input, and a
682 +newline is always a field separator, in addition to the value of \fBFS\fR.
683 +.RE
684 +
685 +.sp
686 +.ne 2
687 +.na
688 +\fB\fBRSTART\fR\fR
689 +.ad
690 +.RS 12n
691 +The starting position of the string matched by the \fBmatch\fR function,
692 +numbering from 1. This is always equivalent to the return value of the
693 +\fBmatch\fR function.
694 +.RE
695 +
696 +.sp
697 +.ne 2
698 +.na
699 +\fB\fBSUBSEP\fR\fR
700 +.ad
701 +.RS 12n
702 +The subscript separator string for multi-dimensional arrays. The default value
703 +is \fB\e034\fR\&.
704 +.RE
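A minimal sketch of the null RS behavior described above, assuming a POSIX-conforming awk is on the PATH. The variable name and sample input are illustrative only. With RS set to the empty string, blank lines separate records and a newline always separates fields, so the first record has three fields.

```shell
# Records are separated by one or more blank lines when RS is null.
# Newline acts as a field separator in addition to the default FS.
paragraphs=$(printf 'alpha beta\ngamma\n\n\ndelta\n' |
    awk 'BEGIN { RS = "" } { printf "%d:%d:%s\n", NR, NF, $1 }')
echo "$paragraphs"
```

Each output line shows the record number, the field count, and the first field of that record.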
705 +
706 +.SS "/usr/bin/awk"
707 +The following variable is supported for \fB/usr/bin/awk\fR only:
708 +.sp
709 +.ne 2
710 +.na
711 +\fB\fBRT\fR\fR
712 +.ad
713 +.RS 12n
714 +The record terminator for the most recent record read. For most records this
715 +will be the same value as \fBRS\fR. At the end of a file with no trailing
716 +separator value, though, this will be set to the empty string (\fB""\fR).
717 +.RE
718 +
719 +.SS "Regular Expressions"
720 +The \fBawk\fR utility makes use of the extended regular expression notation
721 +(see \fBregex\fR(5)) except that it allows the use of C-language conventions to
722 +escape special characters within the EREs, namely \fB\e\e\fR, \fB\ea\fR,
723 +\fB\eb\fR, \fB\ef\fR, \fB\en\fR, \fB\er\fR, \fB\et\fR, \fB\ev\fR, and those
724 +specified in the following table. These escape sequences are recognized both
725 +inside and outside bracket expressions. Note that records need not be
726 +separated by newline characters and string constants can contain newline
727 +characters, so even the \fB\en\fR sequence is valid in \fBawk\fR EREs. Using
728 +a slash character within the regular expression requires escaping as shown in
729 +the table below:
730 +.sp
731 +
732 +.sp
733 +.TS
734 +l l l
735 +l l l .
736 +\fBEscape Sequence\fR \fBDescription\fR \fBMeaning\fR
737 +_
738 +\fB\e"\fR Backslash quotation-mark Quotation-mark character
739 +_
740 +\fB\e/\fR Backslash slash Slash character
741 +_
742 +\fB\e\fR\fIddd\fR T{
743 +A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NULL character), the behavior is undefined.
744 +T} T{
745 +The character encoded by the one-, two- or three-digit octal integer. Multi-byte characters require multiple, concatenated escape sequences, including the leading \e for each byte.
746 +T}
747 +_
748 +\fB\e\fR\fIc\fR T{
749 +A backslash character followed by any character not described in this table or special characters (\fB\e\e\fR, \fB\ea\fR, \fB\eb\fR, \fB\ef\fR, \fB\en\fR, \fB\er\fR, \fB\et\fR, \fB\ev\fR).
750 +T} Undefined
751 +.TE
752 +
753 +.sp
258 754 .LP
755 +A regular expression can be matched against a specific field or string by using
756 +one of the two regular expression matching operators, \fB~\fR and \fB!\|~\fR.
757 +These operators interpret their right-hand operand as a regular expression and
758 +their left-hand operand as a string. If the regular expression matches the
759 +string, the \fB~\fR expression evaluates to the value \fB1\fR, and the
760 +\fB!\|~\fR expression evaluates to the value \fB0\fR. If the regular expression
761 +does not match the string, the \fB~\fR expression evaluates to the value
762 +\fB0\fR, and the \fB!\|~\fR expression evaluates to the value \fB1\fR. If the
763 +right-hand operand is any expression other than the lexical token \fBERE\fR,
764 +the string value of the expression is interpreted as an extended regular
765 +expression, including the escape conventions described above. Notice that these
766 +same escape conventions are also applied in determining the value of a
767 +string literal (the lexical token \fBSTRING\fR), and are applied a second time
768 +when a string literal is used in this context.
769 +.sp
770 +.LP
771 +When an \fBERE\fR token appears as an expression in any context other than as
772 +the right-hand side of the \fB~\fR or \fB!\|~\fR operator or as one of the built-in
773 +function arguments described below, the value of the resulting expression is
774 +the equivalent of:
775 +.sp
776 +.in +2
777 +.nf
778 +$0 ~ /\fIere\fR/
779 +.fi
780 +.in -2
781 +
782 +.sp
783 +.LP
784 +The \fIere\fR argument to the \fBgsub\fR, \fBmatch\fR, and \fBsub\fR functions, and
785 +the \fIfs\fR argument to the \fBsplit\fR function (see \fBString Functions\fR)
786 +are interpreted as extended regular expressions. These can be either \fBERE\fR
787 +tokens or arbitrary expressions, and are interpreted in the same manner as the
788 +right-hand side of the \fB~\fR or \fB!\|~\fR operator.
789 +.sp
790 +.LP
791 +An extended regular expression can be used to separate fields by using the
792 +\fB-F\fR \fIERE\fR option or by assigning a string containing the expression to
793 +the built-in variable \fBFS\fR. The default value of the \fBFS\fR variable is a
794 +single space character. The following describes \fBFS\fR behavior:
795 +.RS +4
796 +.TP
797 +1.
798 +If \fBFS\fR is a single character:
799 +.RS +4
800 +.TP
801 +.ie t \(bu
802 +.el o
803 +If \fBFS\fR is the space character, skip leading and trailing blank characters;
804 +fields are delimited by sets of one or more blank characters.
805 +.RE
806 +.RS +4
807 +.TP
808 +.ie t \(bu
809 +.el o
810 +Otherwise, if \fBFS\fR is any other character \fIc\fR, fields are delimited by
811 +each single occurrence of \fIc\fR.
812 +.RE
813 +.RE
814 +.RS +4
815 +.TP
816 +2.
817 +Otherwise, the string value of \fBFS\fR is considered to be an extended
818 +regular expression. Each occurrence of a sequence matching the extended regular
819 +expression delimits fields.
820 +.RE
821 +.sp
822 +.LP
823 +Except in the \fBgsub\fR, \fBmatch\fR, \fBsplit\fR, and \fBsub\fR built-in
824 +functions, regular expression matching is based on input records. That is,
825 +record separator characters (the first character of the value of the variable
826 +\fBRS\fR, a newline character by default) cannot be embedded in the expression,
827 +and no expression matches the record separator character. If the record
828 +separator is not a newline character, newline characters embedded in the
829 +expression can be matched. In those four built-in functions, regular expression
830 +matching is based on text strings. So, any character (including the newline
831 +character and the record separator) can be embedded in the pattern and an
832 +appropriate pattern matches any character. However, in all \fBawk\fR regular
833 +expression matching, the use of one or more NULL characters in the pattern,
834 +input record or text string produces undefined results.
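The three FS cases listed above can be sketched as follows, assuming a POSIX-conforming awk. The sample line and the variable names are hypothetical. The same input is split with the default single space, with a single non-space character, and with an extended regular expression.

```shell
line='a  b:c::d'
show='{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'

# Default FS (a single space): runs of blanks delimit fields.
by_space=$(printf '%s\n' "$line" | awk "$show")
# Any other single character: each occurrence delimits a field.
by_colon=$(printf '%s\n' "$line" | awk -F: "$show")
# Longer value: treated as an ERE; each matching sequence delimits a field.
by_ere=$(printf '%s\n' "$line" | awk -F'[ :]+' "$show")

printf '%s\n%s\n%s\n' "$by_space" "$by_colon" "$by_ere"
```

Note the empty field produced by the doubled colon in the single-character case, which the ERE case absorbs into one separator.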
835 +
836 +.SS "Patterns"
837 +A \fIpattern\fR is any valid \fIexpression\fR, a range specified by two
838 +expressions separated by a comma, or one of the two special patterns \fBBEGIN\fR
839 +or \fBEND\fR.
840 +
841 +.SS "Special Patterns"
842 +The \fBawk\fR utility recognizes two special patterns, \fBBEGIN\fR and
843 +\fBEND\fR. Each \fBBEGIN\fR pattern is matched once and its associated action
844 +executed before the first record of input is read (except possibly by use of
845 +the \fBgetline\fR function in a prior \fBBEGIN\fR action) and before command
846 +line assignment is done. Each \fBEND\fR pattern is matched once and its
847 +associated action executed after the last record of input has been read. These
848 +two patterns have associated actions.
849 +.sp
850 +.LP
851 +\fBBEGIN\fR and \fBEND\fR do not combine with other patterns. Multiple
852 +\fBBEGIN\fR and \fBEND\fR patterns are allowed. The actions associated with the
853 +\fBBEGIN\fR patterns are executed in the order specified in the program, as are
854 +the \fBEND\fR actions. An \fBEND\fR pattern can precede a \fBBEGIN\fR pattern
855 +in a program.
856 +.sp
857 +.LP
858 +If an \fBawk\fR program consists of only actions with the pattern \fBBEGIN\fR,
859 +and the \fBBEGIN\fR action contains no \fBgetline\fR function, \fBawk\fR exits
860 +without reading its input when the last statement in the last \fBBEGIN\fR
861 +action is executed. If an \fBawk\fR program consists of only actions with the
862 +pattern \fBEND\fR or only actions with the patterns \fBBEGIN\fR and \fBEND\fR,
863 +the input is read before the statements in the \fBEND\fR actions are executed.
864 +
865 +.SS "Expression Patterns"
866 +An expression pattern is evaluated as if it were an expression in a Boolean
867 +context. If the result is true, the pattern is considered to match, and the
868 +associated action (if any) is executed. If the result is false, the action is
869 +not executed.
870 +
871 +.SS "Pattern Ranges"
872 +A pattern range consists of two expressions separated by a comma. In this case,
873 +the action is performed for all records between a match of the first expression
874 +and the following match of the second expression, inclusive. At this point, the
875 +pattern range can be repeated starting at input records subsequent to the end
876 +of the matched range.
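A short sketch of a pattern range, assuming a POSIX-conforming awk. The START and END markers and the variable name are made up for the example. The range is inclusive at both ends, restarts after it closes, and a range that is never closed runs to the end of input.

```shell
# Print every record from a START line through the next END line, inclusive.
ranges=$(printf '%s\n' one START two END three START four |
    awk '/START/,/END/ { print NR ": " $0 }')
echo "$ranges"
```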
877 +
878 +.SS "Actions"
259 879 An action is a sequence of statements. A statement can be one of the following:
260 880 .sp
261 881 .in +2
262 882 .nf
263 883 if ( \fIexpression\fR ) \fIstatement\fR [ else \fIstatement\fR ]
264 884 while ( \fIexpression\fR ) \fIstatement\fR
265 885 do \fIstatement\fR while ( \fIexpression\fR )
266 886 for ( \fIexpression\fR ; \fIexpression\fR ; \fIexpression\fR ) \fIstatement\fR
267 887 for ( \fIvar\fR in \fIarray\fR ) \fIstatement\fR
888 +delete \fIarray\fR[\fIsubscript\fR] #delete an array element
889 +delete \fIarray\fR #delete all elements within an array
268 890 break
269 891 continue
270 892 { [ \fIstatement\fR ] .\|.\|. }
271 -\fIexpression\fR # commonly variable = expression
893 +\fIexpression\fR # commonly variable = expression
272 894 print [ \fIexpression-list\fR ] [ >\fIexpression\fR ]
273 895 printf format [ ,\fIexpression-list\fR ] [ >\fIexpression\fR ]
274 -next # skip remaining patterns on this input line
275 -exit [expr] # skip the rest of the input; exit status is expr
896 +next # skip remaining patterns on this input line
897 +nextfile # skip remaining patterns on this input file
898 +exit [expr] # skip the rest of the input; exit status is expr
899 +return [expr]
276 900 .fi
277 901 .in -2
278 902
279 903 .sp
280 904 .LP
281 -Statements are terminated by semicolons, newlines, or right braces. An empty
282 -expression-list stands for the whole input line. Expressions take on string or
283 -numeric values as appropriate, and are built using the operators \fB+\fR,
284 -\fB\(mi\fR, \fB*\fR, \fB/\fR, \fB%\fR, \fB^\fR and concatenation (indicated by
285 -a blank). The operators \fB++\fR, \fB\(mi\(mi\fR, \fB+=\fR, \fB\(mi=\fR,
286 -\fB*=\fR, \fB/=\fR, \fB%=\fR, \fB^=\fR, \fB>\fR, \fB>=\fR, \fB<\fR, \fB<=\fR,
287 -\fB==\fR, \fB!=\fR, and \fB?:\fR are also available in expressions. Variables
288 -can be scalars, array elements (denoted x[i]), or fields. Variables are
289 -initialized to the null string or zero. Array subscripts can be any string, not
290 -necessarily numeric; this allows for a form of associative memory. String
291 -constants are quoted (\fB""\fR), with the usual C escapes recognized within.
905 +Any single statement can be replaced by a statement list enclosed in braces.
906 +The statements are terminated by newline characters or semicolons, and are
907 +executed sequentially in the order that they appear.
292 908 .sp
293 909 .LP
294 -The \fBprint\fR statement prints its arguments on the standard output, or on a
295 -file if \fB>\fR\fIexpression\fR is present, or on a pipe if '\fB|\fR\fIcmd\fR'
296 -is present. The output resulted from the print statement is terminated by the
297 -output record separator with each argument separated by the current output
298 -field separator. The \fBprintf\fR statement formats its expression list
299 -according to the format (see \fBprintf\fR(3C)).
300 -.SS "Built-in Functions"
910 +The \fBnext\fR statement causes all further processing of the current input
911 +record to be abandoned. The behavior is undefined if a \fBnext\fR statement
912 +appears or is invoked in a \fBBEGIN\fR or \fBEND\fR action.
301 913 .sp
302 914 .LP
303 -The arithmetic functions are as follows:
915 +The \fBnextfile\fR statement is similar to \fBnext\fR, but also skips all other
916 +records in the current file, and moves on to processing the next input file if
917 +available (or exits the program if there are none). (Note that this keyword is
918 +not supported by \fB/usr/xpg4/bin/awk\fR.)
304 919 .sp
920 +.LP
921 +The \fBexit\fR statement invokes all \fBEND\fR actions in the order in which
922 +they occur in the program source and then terminates the program without reading
923 +further input. An \fBexit\fR statement inside an \fBEND\fR action terminates
924 +the program without further execution of \fBEND\fR actions. If an expression
925 +is specified in an \fBexit\fR statement, its numeric value is the exit status
926 +of \fBawk\fR, unless subsequent errors are encountered or a subsequent
927 +\fBexit\fR statement with an expression is executed.
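The exit behavior described above can be sketched like this, assuming a POSIX-conforming awk. The variable names are illustrative. The exit statement stops input processing at the second record, the END action still runs, and the expression becomes the exit status because no later exit overrides it.

```shell
status=0
summary=$(printf '1\n2\n3\n' | awk '
    $1 == 2 { exit 3 }      # stop reading input; END actions still run
    { seen++ }              # counts only records read before the exit
    END { print "seen=" seen }
') || status=$?
echo "$summary (exit status $status)"
```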
928 +
929 +.SS "Output Statements"
930 +Both \fBprint\fR and \fBprintf\fR statements write to standard output by
931 +default. The output is written to the location specified by
932 +\fIoutput_redirection\fR if one is supplied, as follows:
933 +.sp
934 +.in +2
935 +.nf
936 +\fB>\fR \fIexpression\fR    \fB>>\fR \fIexpression\fR    \fB|\fR \fIexpression\fR
937 +.fi
938 +.in -2
939 +
940 +.sp
941 +.LP
942 +In all cases, the \fIexpression\fR is evaluated to produce a string that is
943 +used as a full pathname to write into (for \fB>\fR or \fB>>\fR) or as a command
944 +to be executed (for \fB|\fR). Using the first two forms, if the file of that
945 +name is not currently open, it is opened, creating it if necessary and, using
946 +the first form, truncating the file. The output then is appended to the file.
947 +As long as the file remains open, subsequent calls in which \fIexpression\fR
948 +evaluates to the same string value simply append output to the file. The file
949 +remains open until the \fBclose\fR function is called with an expression
950 +that evaluates to the same string value.
951 +.sp
952 +.LP
953 +The third form writes output onto a stream piped to the input of a command. The
954 +stream is created if no stream is currently open with the value of
955 +\fIexpression\fR as its command name. The stream created is equivalent to one
956 +created by a call to the \fBpopen\fR(3C) function with the value of
957 +\fIexpression\fR as the \fIcommand\fR argument and a value of \fBw\fR as the
958 +\fImode\fR argument. As long as the stream remains open, subsequent calls in
959 +which \fIexpression\fR evaluates to the same string value write output to the
960 +existing stream. The stream remains open until the \fBclose\fR function is
961 +called with an expression that evaluates to the same string value. At that
962 +time, the stream is closed as if by a call to the \fBpclose\fR function.
963 +.sp
964 +.LP
965 +These output statements take a comma-separated list of \fIexpression\fRs
966 +referred to in the grammar by the non-terminal symbols \fBexpr_list\fR,
967 +\fBprint_expr_list\fR, or \fBprint_expr_list_opt\fR. This list is referred to
968 +here as the \fIexpression list\fR, and each member is referred to as an
969 +\fIexpression argument\fR.
970 +.sp
971 +.LP
972 +The \fBprint\fR statement writes the value of each expression argument onto the
973 +indicated output stream separated by the current output field separator (see
974 +variable \fBOFS\fR above), and terminated by the output record separator (see
975 +variable \fBORS\fR above). All expression arguments are taken as strings, being
976 +converted if necessary; with the exception that the \fBprintf\fR format in
977 +\fBOFMT\fR is used instead of the value in \fBCONVFMT\fR. An empty expression
978 +list stands for the whole input record \fB(\fR$0\fB)\fR.
979 +.sp
980 +.LP
981 +The \fBprintf\fR statement produces output based on a notation similar to the
982 +File Format Notation used to describe file formats in this document. Output is
983 +produced as specified with the first expression argument as the string
984 +\fBformat\fR and subsequent expression arguments as the strings \fBarg1\fR to
985 +\fBargn,\fR inclusive, with the following exceptions:
986 +.RS +4
987 +.TP
988 +1.
989 +The \fIformat\fR is an actual character string rather than a graphical
990 +representation. Therefore, it cannot contain empty character positions. The
991 +space character in the \fIformat\fR string, in any context other than a
992 +\fIflag\fR of a conversion specification, is treated as an ordinary character
993 +that is copied to the output.
994 +.RE
995 +.RS +4
996 +.TP
997 +2.
998 +If the character set contains a Delta character and that character appears
999 +in the \fIformat\fR string, it is treated as an ordinary character that is
1000 +copied to the output.
1001 +.RE
1002 +.RS +4
1003 +.TP
1004 +3.
1005 +The \fIescape sequences\fR beginning with a backslash character are treated
1006 +as sequences of ordinary characters that are copied to the output. Note that
1007 +these same sequences are interpreted lexically by \fBawk\fR when they appear in
1008 +literal strings, but they are not treated specially by the \fBprintf\fR
1009 +statement.
1010 +.RE
1011 +.RS +4
1012 +.TP
1013 +4.
1014 +A \fIfield width\fR or \fIprecision\fR can be specified as the \fB*\fR
1015 +character instead of a digit string. In this case the next argument from the
1016 +expression list is fetched and its numeric value taken as the field width or
1017 +precision.
1018 +.RE
1019 +.RS +4
1020 +.TP
1021 +5.
1022 +The implementation does not precede or follow output from the \fBd\fR or
1023 +\fBu\fR conversion specifications with blank characters not specified by the
1024 +\fIformat\fR string.
1025 +.RE
1026 +.RS +4
1027 +.TP
1028 +6.
1029 +The implementation does not precede output from the \fBo\fR conversion
1030 +specification with leading zeros not specified by the \fIformat\fR string.
1031 +.RE
1032 +.RS +4
1033 +.TP
1034 +7.
1035 +For the \fBc\fR conversion specification: if the argument has a numeric
1036 +value, the character whose encoding is that value is output. If the value is
1037 +zero or is not the encoding of any character in the character set, the behavior
1038 +is undefined. If the argument does not have a numeric value, the first
1039 +character of the string value is output; if the string does not contain any
1040 +characters the behavior is undefined.
1041 +.RE
1042 +.RS +4
1043 +.TP
1044 +8.
1045 +For each conversion specification that consumes an argument, the next
1046 +expression argument is evaluated. With the exception of the \fBc\fR conversion,
1047 +the value is converted to the appropriate type for the conversion
1048 +specification.
1049 +.RE
1050 +.RS +4
1051 +.TP
1052 +9.
1053 +If there are insufficient expression arguments to satisfy all the conversion
1054 +specifications in the \fIformat\fR string, the behavior is undefined.
1055 +.RE
1056 +.RS +4
1057 +.TP
1058 +10.
1059 +If any character sequence in the \fIformat\fR string begins with a %
1060 +character, but does not form a valid conversion specification, the behavior is
1061 +unspecified.
1062 +.RE
1063 +.sp
1064 +.LP
1065 +Both \fBprint\fR and \fBprintf\fR can output at least \fB{LINE_MAX}\fR bytes.
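A small sketch contrasting the two output statements, assuming a POSIX-conforming awk. The separator values and the variable name are chosen only for illustration. The print statement joins its arguments with OFS, ends the record with ORS, and converts non-integral numbers with OFMT. The printf statement uses only its format string.

```shell
formatted=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"; OFMT = "%.2f"
    print "a", 3.14159265, 42           # OFS between arguments, ORS at the end
    printf "%5s|%-5s|\n", "ab", "cd"    # OFS, ORS, and OFMT are not applied
}')
echo "$formatted"
```

The integral value 42 is printed as an integer, while 3.14159265 is converted through OFMT.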
1066 +
1067 +.SS "Functions"
1068 +The \fBawk\fR language has a variety of built-in functions: arithmetic,
1069 +string, input/output and general.
1070 +
1071 +.SS "Arithmetic Functions"
1072 +The arithmetic functions, except for \fBint\fR, are based on the \fBISO\fR
1073 +\fBC\fR standard. The behavior is undefined in cases where the \fBISO\fR
1074 +\fBC\fR standard specifies that an error be returned or that the behavior is
1075 +undefined. Although the grammar permits built-in functions to appear with no
1076 +arguments or parentheses, unless the argument or parentheses are indicated as
1077 +optional in the following list (by displaying them within the \fB[ ]\fR
1078 +brackets), such use is undefined.
1079 +.sp
305 1080 .ne 2
306 1081 .na
1082 +\fB\fBatan2(\fR\fIy\fR,\fIx\fR\fB)\fR\fR
1083 +.ad
1084 +.RS 17n
1085 +Return arctangent of \fIy\fR/\fIx\fR.
1086 +.RE
1087 +
1088 +.sp
1089 +.ne 2
1090 +.na
307 1091 \fB\fBcos\fR(\fIx\fR)\fR
308 1092 .ad
309 -.RS 11n
310 -Return cosine of \fIx\fR, where \fIx\fR is in radians. (In
311 -\fB/usr/xpg4/bin/awk\fR only. See \fBnawk\fR(1).)
1093 +.RS 17n
1094 +Return cosine of \fIx,\fR where \fIx\fR is in radians.
312 1095 .RE
313 1096
314 1097 .sp
315 1098 .ne 2
316 1099 .na
317 1100 \fB\fBsin\fR(\fIx\fR)\fR
318 1101 .ad
319 -.RS 11n
320 -Return sine of \fIx\fR, where \fIx\fR is in radians. (In
321 -\fB/usr/xpg4/bin/awk\fR only. See \fBnawk\fR(1).)
1102 +.RS 17n
1103 +Return sine of \fIx,\fR where \fIx\fR is in radians.
322 1104 .RE
323 1105
324 1106 .sp
325 1107 .ne 2
326 1108 .na
327 1109 \fB\fBexp\fR(\fIx\fR)\fR
328 1110 .ad
329 -.RS 11n
1111 +.RS 17n
330 1112 Return the exponential function of \fIx\fR.
331 1113 .RE
332 1114
333 1115 .sp
334 1116 .ne 2
335 1117 .na
336 1118 \fB\fBlog\fR(\fIx\fR)\fR
337 1119 .ad
338 -.RS 11n
1120 +.RS 17n
339 1121 Return the natural logarithm of \fIx\fR.
340 1122 .RE
341 1123
342 1124 .sp
343 1125 .ne 2
344 1126 .na
345 1127 \fB\fBsqrt\fR(\fIx\fR)\fR
346 1128 .ad
347 -.RS 11n
1129 +.RS 17n
348 1130 Return the square root of \fIx\fR.
349 1131 .RE
350 1132
351 1133 .sp
352 1134 .ne 2
353 1135 .na
354 1136 \fB\fBint\fR(\fIx\fR)\fR
355 1137 .ad
356 -.RS 11n
357 -Truncate its argument to an integer. It is truncated toward \fB0\fR when
358 -\fIx\fR >\fB 0\fR.
1138 +.RS 17n
1139 +Truncate its argument to an integer. It is truncated toward 0 when \fIx\fR > 0.
359 1140 .RE
360 1141
361 1142 .sp
362 -.LP
363 -The string functions are as follows:
1143 +.ne 2
1144 +.na
1145 +\fB\fBrand()\fR\fR
1146 +.ad
1147 +.RS 17n
1148 +Return a random number \fIn\fR, such that 0 \(<= \fIn\fR < 1.
1149 +.RE
1150 +
364 1151 .sp
365 1152 .ne 2
366 1153 .na
367 -\fB\fBindex(\fR\fIs\fR\fB, \fR\fIt\fR\fB)\fR\fR
1154 +\fB\fBsrand\fR([\fIexpr\fR])\fR
368 1155 .ad
1156 +.RS 17n
1157 +Set the seed value for \fBrand\fR to \fIexpr\fR or use the time of day if
1158 +\fIexpr\fR is omitted. The previous seed value is returned.
1159 +.RE
1160 +
1161 +.SS "String Functions"
1162 +The string functions in the following list shall be supported. Although the
1163 +grammar permits built-in functions to appear with no arguments or parentheses,
1164 +unless the argument or parentheses are indicated as optional in the following
1165 +list (by displaying them within the \fB[ ]\fR brackets), such use is undefined.
1166 +.sp
1167 +.ne 2
1168 +.na
1169 +\fB\fBgsub\fR(\fIere\fR,\fIrepl\fR[,\|\fIin\fR])\fR
1170 +.ad
369 1171 .sp .6
370 1172 .RS 4n
371 -Return the position in string \fIs\fR where string \fIt\fR first occurs, or
372 -\fB0\fR if it does not occur at all.
1173 +Behave like \fBsub\fR (see below), except that it replaces all occurrences of
1174 +the regular expression (like the \fBed\fR utility global substitute) in
1175 +\fB$0\fR or in the \fIin\fR argument, when specified.
373 1176 .RE
374 1177
375 1178 .sp
376 1179 .ne 2
377 1180 .na
378 -\fB\fBint(\fR\fIs\fR\fB)\fR\fR
1181 +\fB\fBindex\fR(\fIs\fR,\fIt\fR)\fR
379 1182 .ad
380 1183 .sp .6
381 1184 .RS 4n
382 -truncates \fIs\fR to an integer value. If \fIs\fR is not specified, $0 is used.
1185 +Return the position, in characters, numbering from 1, in string \fIs\fR where
1186 +string \fIt\fR first occurs, or zero if it does not occur at all.
383 1187 .RE
384 1188
385 1189 .sp
386 1190 .ne 2
387 1191 .na
388 -\fB\fBlength(\fR\fIs\fR\fB)\fR\fR
1192 +\fB\fBlength\fR[([\fIv\fR])]\fR
389 1193 .ad
390 1194 .sp .6
391 1195 .RS 4n
392 -Return the length of its argument taken as a string, or of the whole line if
393 -there is no argument.
1196 +Given no argument, this function returns the length of the whole record,
1197 +\fB$0\fR. If given an array as an argument (and using \fB/usr/bin/awk\fR),
1198 +then this returns the number of elements it contains. Otherwise, this function
1199 +interprets the argument as a string (performing any needed conversions) and
1200 +returns its length in characters.
394 1201 .RE
395 1202
396 1203 .sp
397 1204 .ne 2
398 1205 .na
399 -\fB\fBsplit(\fR\fIs\fR, \fIa\fR, \fIfs\fR\fB)\fR\fR
1206 +\fB\fBmatch\fR(\fIs\fR,\fIere\fR)\fR
400 1207 .ad
401 1208 .sp .6
402 1209 .RS 4n
403 -Split the string \fIs\fR into array elements \fIa\fR[\fI1\fR],
404 -\fIa\fR[\fI2\fR], \|.\|.\|. \fIa\fR[\fIn\fR], and returns \fIn\fR. The
405 -separation is done with the regular expression \fIfs\fR or with the field
406 -separator \fBFS\fR if \fIfs\fR is not given.
1210 +Return the position, in characters, numbering from 1, in string \fIs\fR where
1211 +the extended regular expression \fIere\fR occurs, or zero if it does not occur
1212 +at all. \fBRSTART\fR is set to the starting position (which is the same as the
1213 +returned value), zero if no match is found; \fBRLENGTH\fR is set to the length
1214 +of the matched string, \(mi1 if no match is found.
407 1215 .RE
408 1216
409 1217 .sp
410 1218 .ne 2
411 1219 .na
412 -\fB\fBsprintf(\fR\fIfmt\fR, \fIexpr\fR, \fIexpr\fR,\|.\|.\|.\|\fB)\fR\fR
1220 +\fB\fBsplit\fR(\fIs\fR,\fIa\fR[,\|\fIfs\fR])\fR
413 1221 .ad
414 1222 .sp .6
415 1223 .RS 4n
416 -Format the expressions according to the \fBprintf\fR(3C) format given by
417 -\fIfmt\fR and returns the resulting string.
1224 +Split the string \fIs\fR into array elements \fIa\fR[1], \fIa\fR[2],
1225 +\fB\&...,\fR \fIa\fR[\fIn\fR], and return \fIn\fR. The separation is done with
1226 +the extended regular expression \fIfs\fR or with the field separator \fBFS\fR
1227 +if \fIfs\fR is not given. Each array element has a string value when created.
1228 +If the string assigned to any array element, with any occurrence of the
1229 +decimal-point character from the current locale changed to a period character,
1230 +would be considered a \fInumeric string\fR, the array element also has the
1231 +numeric value of the \fInumeric string\fR. The effect of a null string as the
1232 +value of \fIfs\fR is unspecified.
418 1233 .RE
419 1234
420 1235 .sp
421 1236 .ne 2
422 1237 .na
423 -\fB\fBsubstr(\fR\fIs\fR, \fIm\fR, \fIn\fR\fB)\fR\fR
1238 +\fB\fBsprintf\fR(\fIfmt\fR,\fIexpr\fR,\fIexpr\fR,\fB\&...\fR)\fR
424 1239 .ad
425 1240 .sp .6
426 1241 .RS 4n
427 -returns the \fIn\fR-character substring of \fIs\fR that begins at position
428 -\fIm\fR.
1242 +Format the expressions according to the \fBprintf\fR format given by \fIfmt\fR
1243 +and return the resulting string.
429 1244 .RE
430 1245
431 1246 .sp
1247 +.ne 2
1248 +.na
1249 +\fB\fBsub\fR(\fIere\fR,\fIrepl\fR[,\|\fIin\fR])\fR
1250 +.ad
1251 +.sp .6
1252 +.RS 4n
1253 +Substitute the string \fIrepl\fR in place of the first instance of the extended
1254 +regular expression \fIere\fR in string \fIin\fR and return the number of
1255 +substitutions. An ampersand ( \fB&\fR ) appearing in the string \fIrepl\fR is
1256 +replaced by the string from \fIin\fR that matches the regular expression. An
1257 +ampersand preceded with a backslash ( \fB\e\fR ) is interpreted as the literal
1258 +ampersand character. An occurrence of two consecutive backslashes is
1259 +interpreted as just a single literal backslash character. Any other occurrence
1260 +of a backslash (for example, preceding any other character) is treated as a
1261 +literal backslash character. If \fIrepl\fR is a string literal, the handling of
1262 +the ampersand character occurs after any lexical processing, including any
1263 +lexical backslash escape sequence processing. If \fIin\fR is specified and it
1264 +is not an \fBlvalue\fR, the behavior is undefined. If \fIin\fR is omitted, \fBawk\fR
1265 +uses the current record (\fB$0\fR) in its place.
1266 +.RE
1267 +
1268 +.sp
1269 +.ne 2
1270 +.na
1271 +\fB\fBsubstr\fR(\fIs\fR,\fIm\fR[,\|\fIn\fR])\fR
1272 +.ad
1273 +.sp .6
1274 +.RS 4n
1275 +Return the at most \fIn\fR-character substring of \fIs\fR that begins at
1276 +position \fIm,\fR numbering from 1. If \fIn\fR is missing, the length of the
1277 +substring is limited by the length of the string \fIs\fR.
1278 +.RE
1279 +
1280 +.sp
1281 +.ne 2
1282 +.na
1283 +\fB\fBtolower\fR(\fIs\fR)\fR
1284 +.ad
1285 +.sp .6
1286 +.RS 4n
1287 +Return a string based on the string \fIs\fR. Each character in \fIs\fR that is
1288 +an upper-case letter specified to have a \fBtolower\fR mapping by the
1289 +\fBLC_CTYPE\fR category of the current locale is replaced in the returned
1290 +string by the lower-case letter specified by the mapping. Other characters in
1291 +\fIs\fR are unchanged in the returned string.
1292 +.RE
1293 +
1294 +.sp
1295 +.ne 2
1296 +.na
1297 +\fB\fBtoupper\fR(\fIs\fR)\fR
1298 +.ad
1299 +.sp .6
1300 +.RS 4n
1301 +Return a string based on the string \fIs\fR. Each character in \fIs\fR that is
1302 +a lower-case letter specified to have a \fBtoupper\fR mapping by the
1303 +\fBLC_CTYPE\fR category of the current locale is replaced in the returned
1304 +string by the upper-case letter specified by the mapping. Other characters in
1305 +\fIs\fR are unchanged in the returned string.
1306 +.RE
1307 +
1308 +.sp
432 1309 .LP
433 -The input/output function is as follows:
1310 +All of the preceding functions that take \fIERE\fR as a parameter expect a
1311 +pattern or a string-valued expression that is a regular expression as defined
1312 +in \fBRegular Expressions\fR above.
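Several of the string functions above can be exercised together in a short sketch, assuming a POSIX-conforming awk. The sample strings and the variable names are hypothetical. It shows the ampersand in a gsub replacement, the RSTART and RLENGTH values set by match, the numeric value of split elements, and the 1-based positions used by substr and index.

```shell
strings=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)           # & stands for the matched text
    print n, s
    print match("hello world", /o w/), RSTART, RLENGTH
    n = split("10:20:30", parts, ":")   # elements are numeric strings
    print n, parts[1] + parts[3]
    print substr("hello", 2, 3), index("hello", "ll"), toupper("abc")
}')
echo "$strings"
```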
1313 +
1314 +.SS "Input/Output and General Functions"
1315 +The input/output and general functions are:
434 1316 .sp
435 1317 .ne 2
436 1318 .na
1319 +\fB\fBclose(\fR\fIexpression\fR)\fR
1320 +.ad
1321 +.RS 27n
1322 +Close the file or pipe opened by a \fBprint\fR or \fBprintf\fR statement or a
1323 +call to \fBgetline\fR with the same string-valued \fIexpression\fR. If the
1324 +close was successful, the function returns \fB0\fR; otherwise, it returns
1325 +non-zero.
1326 +.RE
1327 +
1328 +.sp
1329 +.ne 2
1330 +.na
1331 +\fB\fBfflush(\fR\fIexpression\fR)\fR
1332 +.ad
1333 +.RS 27n
1334 +Flush any buffered output for the file or pipe opened by a \fBprint\fR or
1335 +\fBprintf\fR statement or a call to \fBgetline\fR with the same string-valued
1336 +\fIexpression\fR. If the flush was successful, the function returns \fB0\fR;
1337 +otherwise, it returns \fBEOF\fR. If no arguments or the empty string
1338 +(\fB""\fR) are given, then all open files will be flushed. (Note that
1339 +\fBfflush\fR is supported in \fB/usr/bin/awk\fR only.)
1340 +.RE
1341 +
1342 +.sp
1343 +.ne 2
1344 +.na
1345 +\fB\fIexpression\fR|\fBgetline\fR[\fIvar\fR]\fR
1346 +.ad
1347 +.RS 27n
1348 +Read a record of input from a stream piped from the output of a command. The
1349 +stream is created if no stream is currently open with the value of
1350 +\fIexpression\fR as its command name. The stream created is equivalent to one
1351 +created by a call to the \fBpopen\fR function with the value of
1352 +\fIexpression\fR as the \fIcommand\fR argument and a value of \fBr\fR as the
1353 +\fImode\fR argument. As long as the stream remains open, subsequent calls in
1354 +which \fIexpression\fR evaluates to the same string value read subsequent
1355 +records from the stream. The stream remains open until the \fBclose\fR function
1356 +is called with an expression that evaluates to the same string value. At that
1357 +time, the stream is closed as if by a call to the \fBpclose\fR function. If
1358 +\fIvar\fR is missing, \fB$0\fR and \fBNF\fR are set. Otherwise, \fIvar\fR is
1359 +set.
1360 +.sp
1361 +The \fBgetline\fR operator can form ambiguous constructs when there are
1362 +operators that are not in parentheses (including concatenate) to the left of
1363 +the \fB|\fR (to the beginning of the expression containing \fBgetline\fR). In
1364 +the context of the \fB$\fR operator, \fB|\fR behaves as if it had a lower
1365 +precedence than \fB$\fR. The result of evaluating other operators is
1366 +unspecified, and portable applications must parenthesize all such uses
1367 +properly.
1368 +.RE
1369 +
1370 +.sp
1371 +.ne 2
1372 +.na
437 1373 \fB\fBgetline\fR\fR
438 1374 .ad
439 -.RS 11n
440 -Set \fB$0\fR to the next input record from the current input file.
441 -\fBgetline\fR returns \fB1\fR for successful input, \fB0\fR for end of file,
442 -and \fB\(mi1\fR for an error.
1375 +.RS 27n
1376 +Set \fB$0\fR to the next input record from the current input file. This form of
1377 +\fBgetline\fR sets the \fBNF\fR, \fBNR\fR, and \fBFNR\fR variables.
443 1378 .RE
444 1379
445 -.SS "Large File Behavior"
446 1380 .sp
1381 +.ne 2
1382 +.na
1383 +\fB\fBgetline\fR \fIvar\fR\fR
1384 +.ad
1385 +.RS 27n
1386 +Set variable \fIvar\fR to the next input record from the current input file.
1387 +This form of \fBgetline\fR sets the \fBFNR\fR and \fBNR\fR variables.
1388 +.RE
1389 +
1390 +.sp
1391 +.ne 2
1392 +.na
1393 +\fB\fBgetline\fR [\fIvar\fR] \fB<\fR \fIexpression\fR\fR
1394 +.ad
1395 +.RS 27n
1396 +Read the next record of input from a named file. The \fIexpression\fR is
1397 +evaluated to produce a string that is used as a full pathname. If the file of
1398 +that name is not currently open, it is opened. As long as the stream remains
1399 +open, subsequent calls in which \fIexpression\fR evaluates to the same string
1400 +value read subsequent records from the file. The file remains open until the
1401 +\fBclose\fR function is called with an expression that evaluates to the same
1402 +string value. If \fIvar\fR is missing, \fB$0\fR and \fBNF\fR are set. Otherwise,
1403 +\fIvar\fR is set.
1404 +.sp
1405 +The \fBgetline\fR operator can form ambiguous constructs when there are binary
1406 +operators that are not in parentheses (including concatenate) to the right of
1407 +the \fB<\fR (up to the end of the expression containing the \fBgetline\fR). The
1408 +result of evaluating such a construct is unspecified, and portable
1409 +applications must enclose all such uses in parentheses.
1410 +.RE
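A minimal sketch of reading from a named file with \fBgetline\fR, assuming a scratch file created with \fBmktemp\fR; the variable names \fBf\fR, \fBline\fR and \fBmissing\fR are hypothetical. The file name is held in a plain variable, so no unparenthesized operator appears to the right of \fB<\fR.

```shell
# Create a two-line scratch file, read it with getline, then reread it.
tmpfile=$(mktemp)
printf 'alpha\nbeta\n' > "$tmpfile"
file_out=$(awk -v f="$tmpfile" 'BEGIN {
    while ((getline line < f) > 0)      # 1 while records remain, 0 at EOF
        n++
    close(f)                            # the next read reopens the file
    if ((getline line < f) > 0)
        print n, line                   # first record again after close()
    missing = f ".does-not-exist"       # a path assumed not to exist
    print (getline line < missing)      # -1 signals an error
}')
rm -f "$tmpfile"
echo "$file_out"
```

The first output line shows the record count and the first record read again after \fBclose\fR; the second shows the error return for a file that cannot be opened.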
1411 +
1412 +.sp
1413 +.ne 2
1414 +.na
1415 +\fB\fBsystem\fR(\fIexpression\fR)\fR
1416 +.ad
1417 +.RS 27n
1418 +Execute the command given by \fIexpression\fR in a manner equivalent to the
1419 +\fBsystem\fR(3C) function and return the exit status of the command.
1420 +.RE
1421 +
1422 +.sp
447 1423 .LP
1424 +All forms of \fBgetline\fR return \fB1\fR for successful input, \fB0\fR for end
1425 +of file, and \fB\(mi1\fR for an error.
1426 +.sp
1427 +.LP
1428 +Where strings are used as the name of a file or pipeline, the strings must be
1429 +textually identical. The terminology ``same string value'' implies that
1430 +``equivalent strings'', even those that differ only by space characters,
1431 +represent different files.
1432 +
1433 +.SS "User-defined Functions"
1434 +The \fBawk\fR language also provides user-defined functions. Such functions
1435 +can be defined as:
1436 +.sp
1437 +.in +2
1438 +.nf
1439 +\fBfunction\fR \fIname\fR(\fIargs\fR,\|.\|.\|.) { \fIstatements\fR }
1440 +.fi
1441 +.in -2
1442 +
1443 +.sp
1444 +.LP
1445 +A function can be referred to anywhere in an \fBawk\fR program; in particular,
1446 +its use can precede its definition. The scope of a function is global.
1447 +.sp
1448 +.LP
1449 +Function arguments can be either scalars or arrays; the behavior is undefined
1450 +if an array name is passed as an argument that the function uses as a scalar,
1451 +or if a scalar expression is passed as an argument that the function uses as an
1452 +array. Function arguments are passed by value if scalar and by reference if
1453 +array name. Argument names are local to the function; all other variable names
1454 +are global. The same name must not be used as both an argument name and as the name
1455 +of a function or a special \fBawk\fR variable. The same name must not be used
1456 +both as a variable name with global scope and as the name of a function. The
1457 +same name must not be used within the same scope both as a scalar variable and
1458 +as an array.
1459 +.sp
1460 +.LP
1461 +The number of parameters in the function definition need not match the number
1462 +of parameters in the function call. Excess formal parameters can be used as
1463 +local variables. If fewer arguments are supplied in a function call than are in
1464 +the function definition, the extra parameters that are used in the function
1465 +body as scalars are initialized with a string value of the null string and a
1466 +numeric value of zero, and the extra parameters that are used in the function
1467 +body as arrays are initialized as empty arrays. If more arguments are supplied
1468 +in a function call than are in the function definition, the behavior is
1469 +undefined.
1470 +.sp
1471 +.LP
1472 +When invoking a function, no white space can be placed between the function
1473 +name and the opening parenthesis. Function calls can be nested, and functions
1474 +can be called recursively. Upon return from any nested or recursive
1475 +function call, the values of all of the calling function's parameters are
1476 +unchanged, except for array parameters passed by reference. The \fBreturn\fR
1477 +statement can be used to return a value. If a \fBreturn\fR statement appears
1478 +outside of a function definition, the behavior is undefined.
1479 +.sp
1480 +.LP
1481 +In the function definition, newline characters are optional before the opening
1482 +brace and after the closing brace. Function definitions can appear anywhere in
1483 +the program where a \fIpattern-action\fR pair is allowed.
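The argument-passing rules above can be illustrated with a short sketch. The function name \fBfill\fR and the variables \fBsquares\fR and \fBcount\fR are hypothetical. The extra formal parameter \fBi\fR serves as a local variable, the array is modified through its reference, and the scalar is modified only in the function's own copy.

```shell
# Arrays are passed by reference, scalars by value; extra formals are locals.
func_out=$(awk '
function fill(arr, n,    i) {           # "i" is an extra formal parameter
    for (i = 1; i <= n; i++)
        arr[i] = i * i                  # visible to the caller
    n = 0                               # changes only the local copy
}
BEGIN {
    count = 3
    fill(squares, count)
    # count is unchanged, squares[3] was set, global i was never assigned
    print count, squares[3], (i == "")
}')
echo "$func_out"
```

Separating the extra formal parameters from the real ones with additional spaces is a common convention, not a language requirement.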
1484 +
1485 +.SH USAGE
1486 +The \fBindex\fR, \fBlength\fR, \fBmatch\fR, and \fBsubstr\fR functions should
1487 +not be confused with similar functions in the \fBISO C\fR standard; the
1488 +\fBawk\fR versions deal with characters, while the \fBISO C\fR standard deals
1489 +with bytes.
1490 +.sp
1491 +.LP
1492 +Because the concatenation operation is represented by adjacent expressions
1493 +rather than an explicit operator, it is often necessary to use parentheses to
1494 +enforce the proper evaluation precedence.
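As a brief illustration of why parentheses matter with concatenation (the variable names \fBx\fR and \fBy\fR are arbitrary): multiplication binds more tightly than concatenation, so the two assignments below give different results.

```shell
# Concatenation has lower precedence than the arithmetic operators.
concat_out=$(awk 'BEGIN {
    x = 1 2 * 3          # 1 concatenated with (2 * 3)
    y = (1 2) * 3        # "12" treated as a number, times 3
    print x, y
}')
echo "$concat_out"
```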
1495 +.sp
1496 +.LP
448 1497 See \fBlargefile\fR(5) for the description of the behavior of \fBawk\fR when
449 -encountering files greater than or equal to 2 Gbyte ( 2^31 bytes).
1498 +encountering files greater than or equal to 2 Gbyte (2^31 bytes).
1499 +
450 1500 .SH EXAMPLES
1501 +The \fBawk\fR program specified in the command line is most easily specified
1502 +within single-quotes (for example, \fB\&'\fR\fIprogram\fR\fB\&'\fR) for
1503 +applications using \fBsh\fR, because \fBawk\fR programs commonly contain
1504 +characters that are special to the shell, including double-quotes. In the cases
1505 +where an \fBawk\fR program contains single-quote characters, it is usually
1506 +easiest to specify most of the program as strings within single-quotes
1507 +concatenated by the shell with quoted single-quote characters. For example:
1508 +.sp
1509 +.in +2
1510 +.nf
1511 +awk '/'\e''/ { print "quote:", $0 }'
1512 +.fi
1513 +.in -2
1514 +
1515 +.sp
451 1516 .LP
452 -\fBExample 1 \fRPrinting Lines Longer Than 72 Characters
1517 +prints all lines from the standard input containing a single-quote character,
1518 +prefixed with \fBquote:\fR.
453 1519 .sp
454 1520 .LP
455 -The following example is an \fBawk\fR script that can be executed by an \fBawk
456 --f examplescript\fR style command. It prints lines longer than seventy two
457 -characters:
1521 +The following are examples of simple \fBawk\fR programs:
1522 +.LP
1523 +\fBExample 1 \fRWrite to the standard output all input lines for which field 3
1524 +is greater than 5:
1525 +.sp
1526 +.in +2
1527 +.nf
1528 +\fB$3 > 5\fR
1529 +.fi
1530 +.in -2
1531 +.sp
458 1532
1533 +.LP
1534 +\fBExample 2 \fRWrite every tenth line:
459 1535 .sp
460 1536 .in +2
461 1537 .nf
462 -\fBlength > 72\fR
1538 +\fB(NR % 10) == 0\fR
463 1539 .fi
464 1540 .in -2
465 1541 .sp
466 1542
467 1543 .LP
468 -\fBExample 2 \fRPrinting Fields in Opposite Order
1544 +\fBExample 3 \fRWrite any line with a substring matching the regular
1545 +expression:
469 1546 .sp
1547 +.in +2
1548 +.nf
1549 +\fB/(G|D)(2[0-9][[:alpha:]]*)/\fR
1550 +.fi
1551 +.in -2
1552 +.sp
1553 +
470 1554 .LP
471 -The following example is an \fBawk\fR script that can be executed by an \fBawk
472 --f examplescript\fR style command. It prints the first two fields in opposite
473 -order:
1555 +\fBExample 4 \fRPrint any line with a substring containing a G or D, followed
1556 +by a sequence of digits and characters:
1557 +.sp
1558 +.LP
1559 +This example uses character classes \fBdigit\fR and \fBalpha\fR to match
1560 +language-independent digit and alphabetic characters, respectively.
474 1561
475 1562 .sp
476 1563 .in +2
477 1564 .nf
478 -\fB{ print $2, $1 }\fR
1565 +\fB/(G|D)([[:digit:][:alpha:]]*)/\fR
479 1566 .fi
480 1567 .in -2
481 1568 .sp
482 1569
483 1570 .LP
484 -\fBExample 3 \fRPrinting Fields in Opposite Order with the Input Fields
485 -Separated
1571 +\fBExample 5 \fRWrite any line in which the second field matches the regular
1572 +expression and the fourth field does not:
486 1573 .sp
1574 +.in +2
1575 +.nf
1576 +\fB$2 ~ /xyz/ && $4 !~ /xyz/\fR
1577 +.fi
1578 +.in -2
1579 +.sp
1580 +
487 1581 .LP
488 -The following example is an \fBawk\fR script that can be executed by an \fBawk
489 --f examplescript\fR style command. It prints the first two input fields in
490 -opposite order, separated by a comma, blanks or tabs:
1582 +\fBExample 6 \fRWrite any line in which the second field contains a backslash:
1583 +.sp
1584 +.in +2
1585 +.nf
1586 +\fB$2 ~ /\e\e/\fR
1587 +.fi
1588 +.in -2
1589 +.sp
491 1590
1591 +.LP
1592 +\fBExample 7 \fRWrite any line in which the second field contains a backslash
1593 +(alternate method):
492 1594 .sp
1595 +.LP
1596 +Notice that backslash escapes are interpreted twice, once in lexical processing
1597 +of the string and once in processing the regular expression.
1598 +
1599 +.sp
493 1600 .in +2
494 1601 .nf
495 -\fBBEGIN { FS = ",[ \et]*|[ \et]+" }
496 - { print $2, $1 }\fR
1602 +\fB$2 ~ "\e\e\e\e"\fR
497 1603 .fi
498 1604 .in -2
499 1605 .sp
500 1606
501 1607 .LP
502 -\fBExample 4 \fRAdding Up the First Column, Printing the Sum and Average
1608 +\fBExample 8 \fRWrite the second to the last and the last field in each line,
1609 +separating the fields by a colon:
503 1610 .sp
1611 +.in +2
1612 +.nf
1613 +\fB{OFS=":";print $(NF-1), $NF}\fR
1614 +.fi
1615 +.in -2
1616 +.sp
1617 +
504 1618 .LP
505 -The following example is an \fBawk\fR script that can be executed by an \fBawk
506 --f examplescript\fR style command. It adds up the first column, and prints the
507 -sum and average:
1619 +\fBExample 9 \fRWrite the line number and number of fields in each line:
1620 +.sp
1621 +.LP
1622 +The three strings representing the line number, the colon and the number of
1623 +fields are concatenated and that string is written to standard output.
508 1624
509 1625 .sp
510 1626 .in +2
511 1627 .nf
512 -\fB{ s += $1 }
513 -END { print "sum is", s, " average is", s/NR }\fR
1628 +\fB{print NR ":" NF}\fR
514 1629 .fi
515 1630 .in -2
516 1631 .sp
517 1632
518 1633 .LP
519 -\fBExample 5 \fRPrinting Fields in Reverse Order
1634 +\fBExample 10 \fRWrite lines longer than 72 characters:
520 1635 .sp
1636 +.in +2
1637 +.nf
1638 +\fBlength($0) > 72\fR
1639 +.fi
1640 +.in -2
1641 +.sp
1642 +
521 1643 .LP
522 -The following example is an \fBawk\fR script that can be executed by an \fBawk
523 --f examplescript\fR style command. It prints fields in reverse order:
1644 +\fBExample 11 \fRWrite first two fields in opposite order separated by the OFS:
1645 +.sp
1646 +.in +2
1647 +.nf
1648 +\fB{ print $2, $1 }\fR
1649 +.fi
1650 +.in -2
1651 +.sp
524 1652
1653 +.LP
1654 +\fBExample 12 \fRSame, with input fields separated by comma or space and tab
1655 +characters, or both:
525 1656 .sp
526 1657 .in +2
527 1658 .nf
528 -\fB{ for (i = NF; i > 0; \(mi\(mii) print $i }\fR
1659 +\fBBEGIN { FS = ",[ \et]*|[ \et]+" }
1660 + { print $2, $1 }\fR
529 1661 .fi
530 1662 .in -2
531 1663 .sp
532 1664
533 1665 .LP
534 -\fBExample 6 \fRPrinting All lines Between \fBstart/stop\fR Pairs
1666 +\fBExample 13 \fRAdd up first column, print sum and average:
535 1667 .sp
1668 +.in +2
1669 +.nf
1670 +\fB{s += $1 }
1671 +END {print "sum is ", s, " average is", s/NR}\fR
1672 +.fi
1673 +.in -2
1674 +.sp
1675 +
536 1676 .LP
537 -The following example is an \fBawk\fR script that can be executed by an \fBawk
538 --f examplescript\fR style command. It prints all lines between start/stop
539 -pairs.
1677 +\fBExample 14 \fRWrite fields in reverse order, one per line (many lines out
1678 +for each line in):
1679 +.sp
1680 +.in +2
1681 +.nf
1682 +\fB{ for (i = NF; i > 0; --i) print $i }\fR
1683 +.fi
1684 +.in -2
1685 +.sp
540 1686
1687 +.LP
1688 +\fBExample 15 \fRWrite all lines between occurrences of the strings "start" and
1689 +"stop":
541 1690 .sp
542 1691 .in +2
543 1692 .nf
544 1693 \fB/start/, /stop/\fR
545 1694 .fi
546 1695 .in -2
547 1696 .sp
548 1697
549 1698 .LP
550 -\fBExample 7 \fRPrinting All Lines Whose First Field is Different from the
551 -Previous One
1699 +\fBExample 16 \fRWrite all lines whose first field is different from the
1700 +previous one:
552 1701 .sp
1702 +.in +2
1703 +.nf
1704 +\fB$1 != prev { print; prev = $1 }\fR
1705 +.fi
1706 +.in -2
1707 +.sp
1708 +
553 1709 .LP
554 -The following example is an \fBawk\fR script that can be executed by an \fBawk
555 --f examplescript\fR style command. It prints all lines whose first field is
556 -different from the previous one.
1710 +\fBExample 17 \fRSimulate the echo command:
1711 +.sp
1712 +.in +2
1713 +.nf
1714 +\fBBEGIN {
1715 + for (i = 1; i < ARGC; ++i)
1716 + printf "%s%s", ARGV[i], i==ARGC-1?"\en":""
1717 + }\fR
1718 +.fi
1719 +.in -2
1720 +.sp
557 1721
1722 +.LP
1723 +\fBExample 18 \fRWrite the path prefixes contained in the PATH environment
1724 +variable, one per line:
558 1725 .sp
559 1726 .in +2
560 1727 .nf
561 -\fB$1 != prev { print; prev = $1 }\fR
1728 +\fBBEGIN {
1729 + n = split (ENVIRON["PATH"], path, ":")
1730 + for (i = 1; i <= n; ++i)
1731 + print path[i]
1732 + }\fR
562 1733 .fi
563 1734 .in -2
564 1735 .sp
565 1736
566 1737 .LP
567 -\fBExample 8 \fRPrinting a File and Filling in Page numbers
1738 +\fBExample 19 \fRPrint the file "input", filling in page numbers starting at 5:
568 1739 .sp
569 1740 .LP
570 -The following example is an \fBawk\fR script that can be executed by an \fBawk
571 --f examplescript\fR style command. It prints a file and fills in page numbers
572 -starting at 5:
1741 +If there is a file named \fBinput\fR containing page headers of the form
573 1742
574 1743 .sp
575 1744 .in +2
576 1745 .nf
577 -\fB/Page/ { $2 = n++; }
578 - { print }\fR
1746 +Page#
579 1747 .fi
580 1748 .in -2
581 -.sp
582 1749
1750 +.sp
583 1751 .LP
584 -\fBExample 9 \fRPrinting a File and Numbering Its Pages
1752 +and a file named \fBprogram\fR that contains
1753 +
585 1754 .sp
1755 +.in +2
1756 +.nf
1757 +/Page/{ $2 = n++; }
1758 +{ print }
1759 +.fi
1760 +.in -2
1761 +
1762 +.sp
586 1763 .LP
587 -Assuming this program is in a file named \fBprog\fR, the following example
588 -prints the file \fBinput\fR numbering its pages starting at \fB5\fR:
1764 +then the command line
589 1765
590 1766 .sp
591 1767 .in +2
592 1768 .nf
593 -example% \fBawk -f prog n=5 input\fR
1769 +\fBawk -f program n=5 input\fR
594 1770 .fi
595 1771 .in -2
596 1772 .sp
597 1773
598 -.SH ENVIRONMENT VARIABLES
599 1774 .sp
600 1775 .LP
1776 +prints the file \fBinput\fR, filling in page numbers starting at 5.
1777 +
1778 +.SH ENVIRONMENT VARIABLES
601 1779 See \fBenviron\fR(5) for descriptions of the following environment variables
602 -that affect the execution of \fBawk\fR: \fBLANG\fR, \fBLC_ALL\fR,
603 -\fBLC_COLLATE\fR, \fBLC_CTYPE\fR, \fBLC_MESSAGES\fR, \fBNLSPATH\fR, and
604 -\fBPATH\fR.
1780 +that affect execution: \fBLC_COLLATE\fR, \fBLC_CTYPE\fR, \fBLC_MESSAGES\fR, and
1781 +\fBNLSPATH\fR.
605 1782 .sp
606 1783 .ne 2
607 1784 .na
608 1785 \fB\fBLC_NUMERIC\fR\fR
609 1786 .ad
610 1787 .RS 14n
611 1788 Determine the radix character used when interpreting numeric input, performing
612 1789 conversions between numeric and string values and formatting numeric output.
613 1790 Regardless of locale, the period character (the decimal-point character of the
614 1791 POSIX locale) is the decimal-point character recognized in processing \fBawk\fR
615 1792 programs (including assignments in command-line arguments).
616 1793 .RE
617 1794
618 -.SH ATTRIBUTES
1795 +.SH EXIT STATUS
1796 +The following exit values are returned:
619 1797 .sp
620 -.LP
621 -See \fBattributes\fR(5) for descriptions of the following attributes:
622 -.SS "/usr/bin/awk"
623 -.sp
1798 +.ne 2
1799 +.na
1800 +\fB\fB0\fR\fR
1801 +.ad
1802 +.RS 6n
1803 +All input files were processed successfully.
1804 +.RE
624 1805
625 1806 .sp
626 -.TS
627 -box;
628 -c | c
629 -l | l .
630 -ATTRIBUTE TYPE ATTRIBUTE VALUE
631 -_
632 -CSI Not Enabled
633 -.TE
1807 +.ne 2
1808 +.na
1809 +\fB\fB>0\fR\fR
1810 +.ad
1811 +.RS 6n
1812 +An error occurred.
1813 +.RE
634 1814
635 -.SS "/usr/xpg4/bin/awk"
636 1815 .sp
1816 +.LP
1817 +The exit status can be altered within the program by using an \fBexit\fR
1818 +expression.
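A minimal sketch of setting the exit status from within a program; the value 3 is arbitrary and chosen only for illustration.

```shell
# The expression given to exit becomes the exit status of awk.
awk 'BEGIN { exit 3 }'
status=$?
echo "$status"
```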
637 1819
638 -.sp
639 -.TS
640 -box;
641 -c | c
642 -l | l .
643 -ATTRIBUTE TYPE ATTRIBUTE VALUE
644 -_
645 -CSI Enabled
646 -_
647 -Interface Stability Standard
648 -.TE
649 -
650 1820 .SH SEE ALSO
1821 +\fBed\fR(1), \fBegrep\fR(1), \fBgrep\fR(1), \fBlex\fR(1), \fBoawk\fR(1),
1822 +\fBsed\fR(1), \fBpopen\fR(3C), \fBprintf\fR(3C), \fBsystem\fR(3C),
1823 +\fBattributes\fR(5), \fBenviron\fR(5), \fBlargefile\fR(5), \fBregex\fR(5),
1824 +\fBXPG4\fR(5)
651 1825 .sp
652 1826 .LP
653 -\fBegrep\fR(1), \fBgrep\fR(1), \fBnawk\fR(1), \fBsed\fR(1), \fBprintf\fR(3C),
654 -\fBattributes\fR(5), \fBenviron\fR(5), \fBlargefile\fR(5), \fBstandards\fR(5)
655 -.SH NOTES
1827 +Aho, A. V., B. W. Kernighan, and P. J. Weinberger, \fIThe AWK Programming
1828 +Language\fR, Addison-Wesley, 1988.
1829 +
1830 +.SH DIAGNOSTICS
1831 +If any \fIfile\fR operand is specified and the named file cannot be accessed,
1832 +\fBawk\fR writes a diagnostic message to standard error and terminates without
1833 +any further action.
656 1834 .sp
657 1835 .LP
1836 +If the program specified by either the \fIprogram\fR operand or a
1837 +\fIprogfile\fR operand is not a valid \fBawk\fR program (as specified in
1838 +\fBEXTENDED DESCRIPTION\fR), the behavior is undefined.
1839 +
1840 +.SH NOTES
658 1841 Input white space is not preserved on output if fields are involved.
659 1842 .sp
660 1843 .LP
661 1844 There are no explicit conversions between numbers and strings. To force an
662 -expression to be treated as a number, add \fB0\fR to it. To force an expression
663 -to be treated as a string, concatenate the null string (\fB""\fR) to it.
1845 +expression to be treated as a number, add 0 to it; to force it to be treated as
1846 +a string, concatenate the null string (\fB""\fR) to it.
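The two forcing idioms can be sketched as follows, with hypothetical variable names. String constants compare as strings, and numbers compare numerically, unless the conversion is forced.

```shell
# Force numeric or string treatment to control how a comparison is made.
conv_out=$(awk 'BEGIN {
    s1 = "10"; s2 = "9"                 # string constants
    n1 = 10;   n2 = 9                   # numeric constants
    print (s1 < s2), (s1 + 0 < s2 + 0)  # string compare, then numeric
    print (n1 < n2), ((n1 "") < (n2 ""))  # numeric compare, then string
}')
echo "$conv_out"
```

In a string comparison "10" sorts before "9" because the comparison proceeds character by character.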