.\"
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
.\" permission to reproduce portions of its copyrighted documentation.
.\" Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\"
.\" The Institute of Electrical and Electronics Engineers and The Open
.\" Group, have given us permission to reprint portions of their
.\" documentation.
.\"
.\" In the following statement, the phrase ``this text'' refers to portions
.\" of the system documentation.
.\"
.\" Portions of this text are reprinted and reproduced in electronic form
.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
.\" Standard for Information Technology -- Portable Operating System
.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
.\" Engineers, Inc and The Open Group.  In the event of any discrepancy
.\" between these versions and the original IEEE and The Open Group
.\" Standard, the original IEEE and The Open Group Standard is the referee
.\" document.  The original Standard can be obtained online at
.\" http://www.opengroup.org/unix/online.html.
.\"
.\" This notice shall appear on any product containing this material.
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\"
.\" Copyright 1989 AT&T
.\" Portions Copyright (c) 1992, X/Open Company Limited.  All Rights Reserved.
.\" Copyright (c) 2005, Sun Microsystems, Inc.  All Rights Reserved
.\"
.TH AWK 1 "Jun 22, 2005"
.SH NAME
awk \- pattern scanning and processing language
.SH SYNOPSIS
.LP
.nf
\fB/usr/bin/awk\fR [\fB-f\fR \fIprogfile\fR] [\fB-F\fR\fIc\fR] [' \fIprog\fR '] [\fIparameters\fR]
     [\fIfilename\fR]...
.fi

.LP
.nf
\fB/usr/xpg4/bin/awk\fR [\fB-F\fR\fIcERE\fR] [\fB-v\fR \fIassignment\fR]... \fI\&'program'\fR | \fB-f\fR \fIprogfile\fR...
     [\fIargument\fR]...
.fi

.SH DESCRIPTION
.sp
.LP
The \fB/usr/xpg4/bin/awk\fR utility is described on the \fBnawk\fR(1) manual
page.
.sp
.LP
The \fB/usr/bin/awk\fR utility scans each input \fIfilename\fR for lines that
match any of a set of patterns specified in \fIprog\fR. The \fIprog\fR string
must be enclosed in single quotes (\fB\&'\fR) to protect it from the shell.
For each pattern in \fIprog\fR there can be an associated action performed when
a line of a \fIfilename\fR matches the pattern. The set of pattern-action
statements can appear literally as \fIprog\fR or in a file specified with the
\fB-f\fR\fI progfile\fR option. Input files are read in order; if there are no
files, the standard input is read. The file name \fB\&'\(mi'\fR means the
standard input.
.SH OPTIONS
.sp
.LP
The following options are supported:
.sp
.ne 2
.na
\fB\fB-f\fR\fI progfile\fR \fR
.ad
.RS 16n
\fBawk\fR uses the set of patterns it reads from \fIprogfile\fR.
.RE

.sp
.ne 2
.na
\fB\fB-F\fR\fIc\fR \fR
.ad
.RS 16n
Uses the character \fIc\fR as the field separator (FS) character.  See the
discussion of \fBFS\fR below.
.RE

.SH USAGE
.SS "Input Lines"
.sp
.LP
Each input line is matched against the pattern portion of every pattern-action
statement; the associated action is performed for each matched pattern. Any
\fIfilename\fR of the form \fIvar=value\fR is treated as an assignment, not a
filename, and is executed at the time it would have been opened if it were a
filename. \fIVariables\fR assigned in this manner are not available inside a
\fBBEGIN\fR rule, and are assigned after previously specified files have been
read.
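.sp
.LP
The timing of a \fIvar=value\fR operand can be seen in a small shell sketch.
It assumes a POSIX shell and a POSIX-conforming \fBawk\fR on the path; the
variable \fBn\fR, the value \fB5\fR, and the input text are made up for
illustration. The operand \fB-\fR stands for the standard input.
.sp
.in +2
.nf
```shell
# n=5 is an operand, so it is assigned when awk reaches it in the
# operand list: after BEGIN has run, before the standard input is read.
out=$(echo "hello" | awk 'BEGIN { print "in BEGIN: [" n "]" }
                          { print "in main:  [" n "]", $0 }' n=5 -)
echo "$out"
```
.fi
.in -2
.sp
.LP
The \fBBEGIN\fR rule sees an empty \fBn\fR, while the rule applied to the input
line sees the assigned value.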
.sp
.LP
An input line is normally made up of fields separated by white space. (This
default can be changed by using the \fBFS\fR built-in variable or the
\fB-F\fR\fIc\fR option.) The default is to ignore leading blanks and to
separate fields by blanks and/or tab characters. However, if \fBFS\fR is
assigned a value that does not include any white-space character, then leading
blanks are not ignored. The fields are denoted \fB$1\fR, \fB$2\fR,
\fB\&.\|.\|.\fR\|; \fB$0\fR refers to the entire line.
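.sp
.LP
As a hedged illustration of these splitting rules, the following shell sketch
feeds one made-up line to \fBawk\fR, first with the default separator and then
with \fB-F:\fR. The sample text and the shell variable names are assumptions
for illustration; a POSIX shell and a POSIX-conforming \fBawk\fR are assumed.
.sp
.in +2
.nf
```shell
line="  alpha beta:gamma"

# Default FS: leading blanks are ignored, fields split on blanks and tabs.
default_first=$(echo "$line" | awk '{ print $1 }')

# FS set to a colon with -F: the leading blanks now belong to $1.
colon_first=$(echo "$line" | awk -F: '{ print "[" $1 "]" }')
colon_count=$(echo "$line" | awk -F: '{ print NF }')

echo "$default_first"   # alpha
echo "$colon_first"     # [  alpha beta]
echo "$colon_count"     # 2
```
.fi
.in -2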
.SS "Pattern-action Statements"
.sp
.LP
A pattern-action statement has the form:
.sp
.in +2
.nf
\fIpattern\fR\fB { \fR\fIaction\fR\fB } \fR
.fi
.in -2
.sp

.sp
.LP
Either pattern or action can be omitted. If there is no action, the matching
line is printed. If there is no pattern, the action is performed on every input
line. Pattern-action statements are separated by newlines or semicolons.
.sp
.LP
Patterns are arbitrary Boolean combinations (\fB!\fR, \fB||\fR, \fB&&\fR, and
parentheses) of relational expressions and regular expressions. A relational
expression is one of the following:
.sp
.in +2
.nf
\fIexpression relop expression
expression matchop regular_expression\fR
.fi
.in -2

.sp
.LP
where a \fIrelop\fR is any of the six relational operators in C, and a
\fImatchop\fR is either \fB~\fR (contains) or \fB!~\fR (does not contain). An
\fIexpression\fR is an arithmetic expression, a relational expression, the
special expression
.sp
.in +2
.nf
\fIvar \fRin \fIarray\fR
.fi
.in -2

.sp
.LP
or a Boolean combination of these.
.sp
.LP
Regular expressions are as in \fBegrep\fR(1). In patterns they must be
surrounded by slashes. Isolated regular expressions in a pattern apply to the
entire line. Regular expressions can also occur in relational expressions. A
pattern can consist of two patterns separated by a comma; in this case, the
action is performed for all lines from an occurrence of the first pattern
through the next occurrence of the second pattern.
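.sp
.LP
The pattern forms described above can be combined as in the following sketch.
The data, the field meanings, and the shell variable names are hypothetical,
and a POSIX shell with a POSIX-conforming \fBawk\fR is assumed. The first
program has no action, so matching lines are printed whole; the second mixes an
isolated regular expression with a relational expression.
.sp
.in +2
.nf
```shell
data="apple 12
banana 3
avocado 7
cherry 20"

# matchop and relop joined with &&; no action, so the line is printed.
starts_a=$(echo "$data" | awk '$1 ~ /^a/ && $2 > 5')

# An isolated regular expression is matched against the entire line.
other=$(echo "$data" | awk '/an/ || $2 >= 20 { print $1 }')

echo "$starts_a"
echo "$other"
```
.fi
.in -2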
.sp
.LP
The special patterns \fBBEGIN\fR and \fBEND\fR can be used to capture control
before the first input line has been read and after the last input line has
been read respectively. These keywords do not combine with any other patterns.
.SS "Built-in Variables"
.sp
.LP
Built-in variables include:
.sp
.ne 2
.na
\fB\fBFILENAME\fR \fR
.ad
.RS 13n
name of the current input file
.RE

.sp
.ne 2
.na
\fB\fBFS\fR \fR
.ad
.RS 13n
input field separator regular expression (default blank and tab)
.RE

.sp
.ne 2
.na
\fB\fBNF\fR \fR
.ad
.RS 13n
number of fields in the current record
.RE

.sp
.ne 2
.na
\fB\fBNR\fR \fR
.ad
.RS 13n
ordinal number of the current record
.RE

.sp
.ne 2
.na
\fB\fBOFMT\fR \fR
.ad
.RS 13n
output format for numbers (default \fB%.6g\fR)
.RE

.sp
.ne 2
.na
\fB\fBOFS\fR \fR
.ad
.RS 13n
output field separator (default blank)
.RE

.sp
.ne 2
.na
\fB\fBORS\fR \fR
.ad
.RS 13n
output record separator (default new-line)
.RE

.sp
.ne 2
.na
\fB\fBRS\fR \fR
.ad
.RS 13n
input record separator (default new-line)
.RE
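.sp
.LP
A few of these variables can be observed with a short sketch. The two input
lines and the \fB%.2f\fR format are made up for illustration, and a POSIX shell
with a POSIX-conforming \fBawk\fR is assumed. Each record is printed with its
ordinal number, its field count, and its last field; the \fBEND\fR rule then
prints a non-integer value under a changed \fBOFMT\fR.
.sp
.in +2
.nf
```shell
out=$(awk '{ print NR, NF, $NF }
           END { OFMT = "%.2f"; print NR / 3 }' <<EOF
one two three
four five
EOF
)
echo "$out"
```
.fi
.in -2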

.sp
.LP
An action is a sequence of statements. A statement can be one of the following:
.sp
.in +2
.nf
if ( \fIexpression\fR ) \fIstatement\fR [ else \fIstatement\fR ]
while ( \fIexpression\fR ) \fIstatement\fR
do \fIstatement\fR while ( \fIexpression\fR )
for ( \fIexpression\fR ; \fIexpression\fR ; \fIexpression\fR ) \fIstatement\fR
for ( \fIvar\fR in \fIarray\fR ) \fIstatement\fR
break
continue
{ [ \fIstatement\fR ] .\|.\|. }
\fIexpression\fR      # commonly variable = expression
print [ \fIexpression-list\fR ] [ >\fIexpression\fR ]
printf format [ ,\fIexpression-list\fR ] [ >\fIexpression\fR ]
next            # skip remaining patterns on this input line
exit [expr]     # skip the rest of the input; exit status is expr
.fi
.in -2

.sp
.LP
Statements are terminated by semicolons, newlines, or right braces. An empty
expression-list stands for the whole input line. Expressions take on string or
numeric values as appropriate, and are built using the operators \fB+\fR,
\fB\(mi\fR, \fB*\fR, \fB/\fR, \fB%\fR, \fB^\fR and concatenation (indicated by
a blank). The operators \fB++\fR, \fB\(mi\(mi\fR, \fB+=\fR, \fB\(mi=\fR,
\fB*=\fR, \fB/=\fR, \fB%=\fR, \fB^=\fR, \fB>\fR, \fB>=\fR, \fB<\fR, \fB<=\fR,
\fB==\fR, \fB!=\fR, and \fB?:\fR are also available in expressions. Variables
can be scalars, array elements (denoted x[i]), or fields. Variables are
initialized to the null string or zero. Array subscripts can be any string, not
necessarily numeric; this allows for a form of associative memory. String
constants are quoted (\fB""\fR), with the usual C escapes recognized within.
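.sp
.LP
The associative memory mentioned above can be sketched as follows. The names
and amounts are made-up data, and a POSIX shell with a POSIX-conforming
\fBawk\fR is assumed. Array elements start out as zero, so \fB+=\fR can
accumulate a total per name without any setup. The order in which the
\fBfor\fR ( \fIvar\fR in \fIarray\fR ) loop visits subscripts is not specified,
so the output is passed through \fBsort\fR(1) here.
.sp
.in +2
.nf
```shell
out=$(awk '{ count[$1] += $2 }
           END { for (name in count) print name, count[name] }' <<EOF | sort
ann 3
bob 4
ann 5
EOF
)
echo "$out"
```
.fi
.in -2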
.sp
.LP
The \fBprint\fR statement prints its arguments on the standard output, or on a
file if \fB>\fR\fIexpression\fR is present, or on a pipe if '\fB|\fR\fIcmd\fR'
is present. Each argument is separated from the next by the current output
field separator, and the output is terminated by the output record separator.
The \fBprintf\fR statement formats its expression list according to the format
(see \fBprintf\fR(3C)).
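.sp
.LP
The following sketch shows \fBprint\fR writing to a pipe with a changed output
field separator. The input lines are made up, the choice of \fBsort\fR(1) as
the command is only an example, and a POSIX shell with a POSIX-conforming
\fBawk\fR is assumed.
.sp
.in +2
.nf
```shell
# Each record is written as "second:first" to a single sort process,
# which emits its sorted output when awk finishes.
out=$(awk 'BEGIN { OFS = ":" }
           { print $2, $1 | "sort" }' <<EOF
b pear
c apple
a fig
EOF
)
echo "$out"
```
.fi
.in -2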
.SS "Built-in Functions"
.sp
.LP
The arithmetic functions are as follows:
.sp
.ne 2
.na
\fB\fBcos\fR(\fIx\fR)\fR
.ad
.RS 11n
Return cosine of \fIx\fR, where \fIx\fR is in radians. (In
\fB/usr/xpg4/bin/awk\fR only. See \fBnawk\fR(1).)
.RE

.sp
.ne 2
.na
\fB\fBsin\fR(\fIx\fR)\fR
.ad
.RS 11n
Return sine of \fIx\fR, where \fIx\fR is in radians. (In
\fB/usr/xpg4/bin/awk\fR only. See \fBnawk\fR(1).)
.RE

.sp
.ne 2
.na
\fB\fBexp\fR(\fIx\fR)\fR
.ad
.RS 11n
Return the exponential function of \fIx\fR.
.RE

.sp
.ne 2
.na
\fB\fBlog\fR(\fIx\fR)\fR
.ad
.RS 11n
Return the natural logarithm of \fIx\fR.
.RE

.sp
.ne 2
.na
\fB\fBsqrt\fR(\fIx\fR)\fR
.ad
.RS 11n
Return the square root of \fIx\fR.
.RE

.sp
.ne 2
.na
\fB\fBint\fR(\fIx\fR)\fR
.ad
.RS 11n
Truncate \fIx\fR to an integer. Truncation is toward \fB0\fR when
\fIx\fR > \fB0\fR.
.RE

.sp
.LP
The string functions are as follows:
.sp
.ne 2
.na
\fB\fBindex(\fR\fIs\fR\fB, \fR\fIt\fR\fB)\fR\fR
.ad
.sp .6
.RS 4n
Return the position in string \fIs\fR where string \fIt\fR first occurs, or
\fB0\fR if it does not occur at all.
.RE

.sp
.ne 2
.na
\fB\fBint(\fR\fIs\fR\fB)\fR\fR
.ad
.sp .6
.RS 4n
Truncate \fIs\fR to an integer value. If \fIs\fR is not specified, \fB$0\fR is
used.
.RE

.sp
.ne 2
.na
\fB\fBlength(\fR\fIs\fR\fB)\fR\fR
.ad
.sp .6
.RS 4n
Return the length of its argument taken as a string, or of the whole line if
there is no argument.
.RE

.sp
.ne 2
.na
\fB\fBsplit(\fR\fIs\fR, \fIa\fR, \fIfs\fR\fB)\fR\fR
.ad
.sp .6
.RS 4n
Split the string \fIs\fR into array elements \fIa\fR[\fI1\fR],
\fIa\fR[\fI2\fR], \|.\|.\|. \fIa\fR[\fIn\fR], and return \fIn\fR. The
separation is done with the regular expression \fIfs\fR or with the field
separator \fBFS\fR if \fIfs\fR is not given.
.RE

.sp
.ne 2
.na
\fB\fBsprintf(\fR\fIfmt\fR, \fIexpr\fR, \fIexpr\fR,\|.\|.\|.\|\fB)\fR\fR
.ad
.sp .6
.RS 4n
Format the expressions according to the \fBprintf\fR(3C) format given by
\fIfmt\fR and return the resulting string.
.RE

.sp
.ne 2
.na
\fB\fBsubstr(\fR\fIs\fR, \fIm\fR, \fIn\fR\fB)\fR\fR
.ad
.sp .6
.RS 4n
Return the \fIn\fR-character substring of \fIs\fR that begins at position
\fIm\fR.
.RE
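.sp
.LP
As a rough illustration of the string functions, the sketch below takes apart
a made-up date string. The input value, the array name \fBpart\fR, and the
output layout are assumptions for illustration; a POSIX shell with a
POSIX-conforming \fBawk\fR is assumed. Positions are counted from 1.
.sp
.in +2
.nf
```shell
out=$(echo "2005-06-22" | awk '{
    n = split($0, part, "-")          # n is 3; part[1] is "2005"
    print n, part[1], part[3]
    print index($0, "-"), length($0)  # first "-" is at position 5
    print substr($0, 6, 2)            # 2 characters starting at position 6
    print sprintf("%s/%s", part[2], part[3])
}')
echo "$out"
```
.fi
.in -2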

.sp
.LP
The input/output function is as follows:
.sp
.ne 2
.na
\fB\fBgetline\fR\fR
.ad
.RS 11n
Set \fB$0\fR to the next input record from the current input file.
\fBgetline\fR returns \fB1\fR for successful input, \fB0\fR for end of file,
and \fB\(mi1\fR for an error.
.RE
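.sp
.LP
One possible use of \fBgetline\fR is to join input lines in pairs, as in the
sketch below. The three input lines and the variable name \fBfirst\fR are made
up, and a POSIX shell with a POSIX-conforming \fBawk\fR is assumed. The return
value is tested so that an unpaired last line is still printed.
.sp
.in +2
.nf
```shell
out=$(awk '{
    first = $0
    if ((getline) > 0)      # 1: $0 now holds the following record
        print first, $0
    else                    # 0: end of file, $0 is unchanged
        print first
}' <<EOF
a
b
c
EOF
)
echo "$out"
```
.fi
.in -2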

.SS "Large File Behavior"
.sp
.LP
See \fBlargefile\fR(5) for the description of the behavior of \fBawk\fR when
encountering files greater than or equal to 2 Gbyte (2^31 bytes).
.SH EXAMPLES
.LP
\fBExample 1 \fRPrinting Lines Longer Than 72 Characters
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints lines longer than 72 characters:

.sp
.in +2
.nf
\fBlength > 72\fR
.fi
.in -2
.sp

.LP
\fBExample 2 \fRPrinting Fields in Opposite Order
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints the first two fields in opposite
order:

.sp
.in +2
.nf
\fB{ print $2, $1 }\fR
.fi
.in -2
.sp

.LP
\fBExample 3 \fRPrinting Fields in Opposite Order with the Input Fields
Separated
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints the first two input fields in
opposite order, where the input fields are separated by a comma, blanks, or
tabs:

.sp
.in +2
.nf
\fBBEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }\fR
.fi
.in -2
.sp

.LP
\fBExample 4 \fRAdding Up the First Column, Printing the Sum and Average
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command.  It adds up the first column, and prints the
sum and average:

.sp
.in +2
.nf
\fB{ s += $1 }
END  { print "sum is", s, " average is", s/NR }\fR
.fi
.in -2
.sp

.LP
\fBExample 5 \fRPrinting Fields in Reverse Order
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints fields in reverse order:

.sp
.in +2
.nf
\fB{ for (i = NF; i > 0; \(mi\(mii) print $i }\fR
.fi
.in -2
.sp

.LP
\fBExample 6 \fRPrinting All Lines Between \fBstart/stop\fR Pairs
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints all lines between start/stop
pairs:

.sp
.in +2
.nf
\fB/start/, /stop/\fR
.fi
.in -2
.sp

.LP
\fBExample 7 \fRPrinting All Lines Whose First Field is Different from the
Previous One
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints all lines whose first field is
different from the previous one:

.sp
.in +2
.nf
\fB$1 != prev { print; prev = $1 }\fR
.fi
.in -2
.sp

.LP
\fBExample 8 \fRPrinting a File and Filling in Page Numbers
.sp
.LP
The following example is an \fBawk\fR script that can be executed by an \fBawk
-f examplescript\fR style command. It prints a file and fills in page numbers.
Numbering starts from the value of \fBn\fR, which Example 9 sets to \fB5\fR on
the command line:

.sp
.in +2
.nf
\fB/Page/ { $2 = n++; }
       { print }\fR
.fi
.in -2
.sp

.LP
\fBExample 9 \fRPrinting a File and Numbering Its Pages
.sp
.LP
Assuming the program from Example 8 is in a file named \fBprog\fR, the
following example prints the file \fBinput\fR, numbering its pages starting at
\fB5\fR:

.sp
.in +2
.nf
example% \fBawk -f prog n=5 input\fR
.fi
.in -2
.sp

.SH ENVIRONMENT VARIABLES
.sp
.LP
See \fBenviron\fR(5) for descriptions of the following environment variables
that affect the execution of \fBawk\fR: \fBLANG\fR, \fBLC_ALL\fR,
\fBLC_COLLATE\fR, \fBLC_CTYPE\fR, \fBLC_MESSAGES\fR, \fBNLSPATH\fR, and
\fBPATH\fR.
.sp
.ne 2
.na
\fB\fBLC_NUMERIC\fR\fR
.ad
.RS 14n
Determine the radix character used when interpreting numeric input, performing
conversions between numeric and string values and formatting numeric output.
Regardless of locale, the period character (the decimal-point character of the
POSIX locale) is the decimal-point character recognized in processing \fBawk\fR
programs (including assignments in command-line arguments).
.RE

.SH ATTRIBUTES
.sp
.LP
See \fBattributes\fR(5) for descriptions of the following attributes:
.SS "/usr/bin/awk"
.sp

.sp
.TS
box;
c | c
l | l .
ATTRIBUTE TYPE	ATTRIBUTE VALUE
_
CSI	Not Enabled
.TE

.SS "/usr/xpg4/bin/awk"
.sp

.sp
.TS
box;
c | c
l | l .
ATTRIBUTE TYPE	ATTRIBUTE VALUE
_
CSI	Enabled
_
Interface Stability	Standard
.TE

.SH SEE ALSO
.sp
.LP
\fBegrep\fR(1), \fBgrep\fR(1), \fBnawk\fR(1), \fBsed\fR(1), \fBprintf\fR(3C),
\fBattributes\fR(5), \fBenviron\fR(5), \fBlargefile\fR(5), \fBstandards\fR(5)
.SH NOTES
.sp
.LP
Input white space is not preserved on output if fields are involved.
.sp
.LP
There are no explicit conversions between numbers and strings. To force an
expression to be treated as a number, add \fB0\fR to it. To force an expression
to be treated as a string, concatenate the null string (\fB""\fR) to it.
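.sp
.LP
The two idioms can be compared side by side in a short sketch. The input
values \fB10\fR and \fB9\fR are made up, and a POSIX shell with a
POSIX-conforming \fBawk\fR is assumed. As numbers, 10 is not less than 9; as
strings, "10" sorts before "9" because the comparison goes character by
character.
.sp
.in +2
.nf
```shell
out=$(echo "10 9" | awk '{
    # Adding 0 forces a numeric comparison.
    if (($1 + 0) < ($2 + 0)) print "numeric: less"; else print "numeric: not less"
    # Concatenating the null string forces a string comparison.
    if (($1 "") < ($2 "")) print "string: less"; else print "string: not less"
}')
echo "$out"
```
.fi
.in -2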