40 Practical and Useful awk Commands in Linux and BSD

AWK is a powerful data-driven programming language that dates its origin back to the early days of Unix. It was initially developed for writing 'one-liner' programs but has since evolved into a full-fledged programming language. AWK gets its name from the initials of its authors - Aho, Weinberger, and Kernighan. The awk command in Linux and other Unix systems invokes the interpreter that runs AWK scripts. Several implementations of awk exist in recent systems such as gawk (GNU awk), mawk (Minimal awk), and nawk (New awk), among others. Check out the below examples if you want to master awk.

Understanding AWK Programs

Programs written in awk consist of rules, each of which pairs a pattern with an action. The action is enclosed in braces ({ }) and is triggered whenever awk finds input that matches the pattern. Although awk was developed for writing one-liners, experienced users can easily write complex scripts with it.
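As a minimal sketch of the rule structure described above, the following snippet runs three kinds of rules against a small made-up input. The sample data and the shell variable names (sample, both, pattern_only, action_only) are assumptions chosen only for illustration.

```shell
# Hypothetical sample input: one "name score" pair per line.
sample=$(printf 'alice 90\nbob 45\ncarol 72\n')

# Pattern plus action: print the name on lines whose second field exceeds 50.
both=$(printf '%s\n' "$sample" | awk '$2 > 50 { print $1 }')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$sample" | awk '$2 > 50')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

printf '%s\n' "$both" "$pattern_only" "$action_only"
```

Either half of a rule can be left out: a missing action means "print the line", and a missing pattern means "match every line".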

AWK programs are very useful for large-scale file processing. awk splits each line of input into fields using separator characters, and it also offers high-level programming constructs like arrays and loops. So writing robust programs using plain awk is very feasible.

Practical Examples of awk Command in Linux

Admins normally use awk for data extraction and reporting alongside other types of file manipulations. Below we have discussed awk in more detail. Follow the commands carefully and try them in your terminal for a complete understanding.

1. Print Specific Fields from Text Output

The most widely used Linux commands display their output using various fields. Normally, we use the Linux cut command for extracting a specific field from such data. However, the below command shows you how to do this using the awk command.

$ who | awk '{print $1}'

This command will display only the first field from the output of the who command. So, you will simply get the usernames of all currently logged-in users. Here, $1 represents the first field. You need to use $N if you want to extract the N-th field.

2. Print Multiple Fields from Text Output

The awk interpreter allows us to print any number of fields we want. The below examples show us how to extract the first two fields from the output of the who command.

$ who | awk '{print $1, $2}'

You can also control the order of the output fields. The following example prints the second column produced by the who command first, followed by the first column.

$ who | awk '{print $2, $1}'

Simply leave out the field parameters ($N) to display the entire data.
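Because the output of who differs from machine to machine, here is a small sketch of the same field selections on a fixed line. The sample line and the variable names are hypothetical and stand in for one line of who output.

```shell
# Hypothetical line in the style of `who` output.
line='alice pts/0 2024-01-05 10:15'

first=$(echo "$line" | awk '{ print $1 }')        # first field only
swapped=$(echo "$line" | awk '{ print $2, $1 }')  # second field, then first
whole=$(echo "$line" | awk '{ print }')           # no field given: whole line

printf '%s\n' "$first" "$swapped" "$whole"
```

The comma between fields in print inserts the output field separator, which is a single space by default.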

3. Use BEGIN Statements

The BEGIN statement allows users to print some known information in the output. It is usually used for formatting the output data generated by awk. The syntax for this statement is shown below.

BEGIN { Actions } { ACTION }

The actions in the BEGIN section always run once, before awk reads any input. Then awk reads the input lines one by one and checks whether any of the remaining rules apply.

$ who | awk 'BEGIN {print "User\tFrom"} {print $1, $2}'

The above command will label the two output fields extracted from the who command's output.

4. Use END Statements

You can also use the END statement to make sure that certain actions are always performed at the end of your operation. Simply place the END section after the main set of actions.

$ who | awk 'BEGIN {print "User\tFrom"} {print $1, $2} END {print "--COMPLETED--"}'

The above command will append the given string at the end of the output.
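To see the order in which the three sections run without depending on who, the sketch below feeds two made-up lines to the same kind of program. The input lines and the variable name report are assumptions for illustration.

```shell
# BEGIN runs before any input, the middle rule runs once per line,
# and END runs after the last line has been read.
report=$(printf 'alice pts/0\nbob pts/1\n' |
  awk 'BEGIN { print "User\tFrom" }
       { print $1, $2 }
       END { print "--COMPLETED--" }')

printf '%s\n' "$report"
```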

5. Search Using Patterns

A large portion of awk's workings involves pattern matching and regex. As we've already discussed, awk searches for patterns in each input line and only executes the action when a match is triggered. Our previous rules consisted of only actions. Below, we've illustrated the basics of pattern matching using the awk command in Linux.

$ who | awk '/mary/ {print}'

This command will see if the user mary is currently logged on or not. It will output the entire line if any match is found.
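One caveat worth sketching: a bare /mary/ pattern matches anywhere in the line, so it would also match a user such as rosemary. The user names below are made up, and comparing $1 directly is just one possible way to require an exact match.

```shell
# Hypothetical login list.
users=$(printf 'mary pts/0\nrosemary pts/1\nbob pts/2\n')

# Regex match anywhere in the line: matches both mary and rosemary.
loose=$(printf '%s\n' "$users" | awk '/mary/ { print $1 }')

# String comparison on the first field: matches only the user mary.
exact=$(printf '%s\n' "$users" | awk '$1 == "mary" { print $1 }')

printf '%s\n' "$loose" "$exact"
```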

6. Extract Information from Files

The awk command works very well with files and can be used for complex file processing tasks. The following command illustrates how awk handles files.

$ awk '/hello/ {print}' /usr/share/dict/american-english

This command searches for the pattern 'hello' in the american-english dictionary file. It is available on most Linux-based distributions. Thus, you can easily try awk programs on this file.

7. Read AWK Script from Source File

Although writing one-liner programs is useful, you can also write large programs using awk entirely. You will want to save them and run your program using the source file.

$ awk -f script-file
$ awk --file script-file

The -f or --file option allows us to specify the program file (the long form is a GNU awk option). You do not need to wrap the program in quotes inside the script file, since the Linux shell does not interpret the program code there.
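A minimal sketch of the script-file workflow follows. The file name first-field.awk, the temporary directory, and the sample input are all hypothetical; note that the program inside the file is written without surrounding quotes.

```shell
# Create a throwaway directory so the example leaves nothing behind.
tmpdir=$(mktemp -d)

# The awk program goes into the file as-is, with no shell quoting.
cat > "$tmpdir/first-field.awk" <<'EOF'
# Print the first field of every input line.
{ print $1 }
EOF

# Run the saved program with -f.
script_out=$(printf 'alice pts/0\nbob pts/1\n' | awk -f "$tmpdir/first-field.awk")
printf '%s\n' "$script_out"

rm -rf "$tmpdir"
```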

8. Set Input Field Separator

A field separator is a delimiter that divides the input record into fields. We can easily specify field separators to awk using the -F or --field-separator option. Check out the below commands to see how this works.

$ echo "This-is-a-simple-example" | awk -F - '{print $1}'
$ echo "This-is-a-simple-example" | awk --field-separator - '{print $1}'

It works the same when using script files rather than one-liner awk commands in Linux.
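As a sketch of the script-friendly alternative, the separator can also be set by assigning the built-in FS variable in a BEGIN block, which is how it is usually done inside a program file. The variable names dash and begin_fs are illustrative only.

```shell
# Separator given on the command line with -F.
dash=$(echo "This-is-a-simple-example" | awk -F - '{ print $1 }')

# Same separator set inside the program through the FS variable.
begin_fs=$(echo "This-is-a-simple-example" | awk 'BEGIN { FS = "-" } { print $4 }')

printf '%s\n' "$dash" "$begin_fs"
```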

9. Print Information Based On Condition

We've discussed the Linux cut command in a previous guide. Now we'll show you how to extract information using awk only when certain criteria are matched. We will be using the same test file we used in that guide. So head over there and make a copy of the test.txt file.

$ awk '$4 > 50' test.txt

This command will print out all nations from the test.txt file that have a population of more than 50 million.

10. Print Information by Comparing Regular Expressions

The following awk command checks whether the third field of any line contains the pattern 'Lira' and prints out the entire line if a match is found. We are again using the test.txt file used to illustrate the Linux cut command. So make sure you've got this file before proceeding.

$ awk '$3 ~ /Lira/' test.txt

You may choose to only print a specific portion of any match if you want.
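Since test.txt comes from another guide, the sketch below uses a stand-in data set to show both a numeric condition and a regex condition that prints only part of the matching line. The place names and numbers are placeholders, not real data, and the column layout (name, capital, currency, population) is an assumption.

```shell
# Placeholder records: name, capital, currency, population (made-up values).
data=$(printf 'Aland Acity Lira 84\nBland Bcity Yen 125\nCland Ccity Euro 59\n')

# Regex condition on the third field; print only the first two fields.
lira=$(printf '%s\n' "$data" | awk '$3 ~ /Lira/ { print $1, $2 }')

# Numeric condition on the fourth field; print only the name.
big=$(printf '%s\n' "$data" | awk '$4 > 60 { print $1 }')

printf '%s\n' "$lira" "$big"
```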

11. Count the Total Number of Lines in Input

The awk command has many special-purpose variables that allow us to do many advanced things easily. One such variable is NR, which contains the current line number.

$ awk 'END {print NR}' test.txt

This command will output how many lines there are in our test.txt file. It first iterates over each line, and once it has reached END, it will print the value of NR - which contains the total number of lines in this case.
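The following sketch contrasts NR inside the main rule, where it is the number of the line being processed, with NR inside END, where it holds the total. The three-line input and the variable names are made up for illustration.

```shell
# NR in the main rule: the running line number.
numbered=$(printf 'a\nb\nc\n' | awk '{ print NR ": " $0 }')

# NR in END: the number of lines read in total.
total=$(printf 'a\nb\nc\n' | awk 'END { print NR }')

printf '%s\n' "$numbered" "$total"
```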

12. Set Output Field Separator

Earlier, we showed how to select input field separators using the -F or --field-separator option. The awk command also allows us to specify the output field separator through the OFS variable, as the below example demonstrates.

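$ date | awk 'BEGIN {OFS="-"} {print $2, $3, $6}'

This command prints the month, day, and year fields of the date output joined by hyphens, for example Mar-5-2024 when date prints something like Tue Mar 5 10:00:00 UTC 2024. The field positions depend on your locale's date format, so run the date program without awk to see what the default output looks like.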
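Because the output of date changes with time and locale, here is a reproducible sketch that applies the same OFS idea to a fixed string. The sample date line is an assumption that mimics the default format in a C locale.

```shell
# Fixed stand-in for one line of `date` output.
sample_date='Tue Mar 5 10:00:00 UTC 2024'

# OFS replaces the default single space that print puts between its arguments.
joined=$(echo "$sample_date" | awk 'BEGIN { OFS = "-" } { print $2, $3, $6 }')

printf '%s\n' "$joined"
```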

13. Using the If Construct

Like other popular programming languages, awk also provides users with the if-else constructs. The if statement in awk has the below syntax.

if (expression) { first_action; second_action }

The corresponding actions are only performed if the conditional expression is true. The below example demonstrates this using our reference file test.txt.

$ awk '{ if ($4>100) print }' test.txt

You do not need to maintain the indentation strictly.

14. Using If-Else Constructs

You can construct useful if-else ladders using the below syntax. They are useful when devising complex awk scripts that deal with dynamic data.

if (expression) first_action else second_action

$ awk '{ if ($4>100) print; else print }' test.txt

The above command will print the entire reference file, since both branches print the current line whether or not the fourth field is greater than 100.
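To make the two branches distinguishable, the sketch below prints a different label in each one. The input lines, the threshold of 100, and the labels "large" and "small" are all hypothetical.

```shell
# Each branch prints the name plus a label, so the chosen branch is visible.
labels=$(printf 'x 150\ny 80\n' |
  awk '{ if ($2 > 100) print $1, "large"; else print $1, "small" }')

printf '%s\n' "$labels"
```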

15. Set the Field Width

Sometimes the input data is quite messy, and users might find it difficult to visualize it in their reports. Fortunately, GNU awk (gawk) provides a powerful built-in variable called FIELDWIDTHS that allows us to define a whitespace-separated list of field widths.

$ echo 5675784464657 | awk 'BEGIN {FIELDWIDTHS="3 4 5"} {print $1, $2, $3}'

It is very useful when parsing fixed-width data, since we can control exactly how wide each input field is.

16. Set the Record Separator

The RS or Record Separator is another in-built variable that allows us to specify how records are separated. Let us first create a file that will demonstrate the workings of this awk variable.

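$ cat new.txt
Melinda James
23 New Hampshire
(222) 466-1234

Daniel James
99 Phonenix Road
(322) 677-3412

$ awk 'BEGIN {FS="\n"; RS=""} {print $1, $3}' new.txt

This command treats each block separated by a blank line as one record and each line inside it as a field. With the layout shown above, it prints the first and third lines of each record, that is, the name and the phone number of the two persons.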
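The same idea can be sketched without creating a file. Setting RS to the empty string makes awk treat blank-line-separated blocks as records, and FS="\n" makes every line of a block a field. The names, addresses, and phone numbers below are invented placeholders.

```shell
# Two hypothetical multi-line records separated by one blank line.
records=$(printf 'Ann Lee\n1 Main St\n555-0100\n\nBob Ray\n2 Oak Ave\n555-0101\n' |
  awk 'BEGIN { FS = "\n"; RS = "" } { print $1 ", " $3 }')

printf '%s\n' "$records"
```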

17. Print Environment Variables

The awk command in Linux allows us to print environment variables easily using the variable ENVIRON. The below command demonstrates how to use this for printing out the contents of the PATH variable.

$ awk 'BEGIN { print ENVIRON["PATH"] }'

You can print the contents of any environment variable by changing the name used to index the ENVIRON array. The below command prints the value of the environment variable HOME.

$ awk 'BEGIN { print ENVIRON["HOME"] }'

18. Omit Some Fields from Output

The awk command allows us to omit specific fields from our output. The following command will demonstrate this using our reference file test.txt.

$ awk -F":" '{$2=""; print}' test.txt

This command will omit the second column of our file, which contains the name of the capital for each country. You can also omit more than one field, as shown in the next command.

$ awk -F":" '{$2=""; $3=""; print}' test.txt
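One detail worth sketching: assigning to a field makes awk rebuild the line using the output field separator, so the colons are replaced by spaces unless OFS is set as well. The input string a:b:c is a made-up example.

```shell
# Blanking a field rebuilds the line with the default OFS (a space).
spaced=$(echo "a:b:c" | awk -F: '{ $2 = ""; print }')

# Setting OFS to ":" keeps the original delimiter in the rebuilt line.
coloned=$(echo "a:b:c" | awk -F: 'BEGIN { OFS = ":" } { $2 = ""; print }')

printf '%s\n' "$spaced" "$coloned"
```

Note that the emptied field still occupies its position, which is why two separators appear next to each other.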

19. Remove Empty Lines

Sometimes data may contain too many blank lines. You can use the awk command to remove empty lines pretty easily. Check out the next command to see how this works in practice.

$ awk '/^[ \t]*$/ {next} {print}' new.txt

We have removed all empty lines from the file new.txt using a simple regular expression and an awk built-in called next.
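A shorter variant, shown here only as an alternative sketch, uses NF as the pattern: a line that is empty or contains only whitespace has zero fields, so the pattern is false and the line is skipped. The four-line input is made up.

```shell
# Lines 2 and 3 of the input are empty and whitespace-only respectively.
kept=$(printf 'one\n\n  \t\ntwo\n' | awk 'NF')

printf '%s\n' "$kept"
```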

20. Remove Trailing Whitespaces

The output of many Linux commands contains trailing whitespaces. We can use the awk command in Linux to remove such whitespaces like spaces and tabs. Check out the below command to see how to tackle such problems using awk.

$ awk '{sub(/[ \t]*$/, ""); print}' new.txt test.txt

Add some trailing whitespaces to our reference files and verify whether awk removed them successfully or not. It did this successfully on my machine.

21. Check the Number of Fields in Each Line

We can easily check how many fields there are in a line using a simple awk one-liner. There are many ways to do this, but we will use some of awk's in-built variables for this task. The NR variable gives us the line number, and the NF variable provides the number of fields.

$ awk '{print NR,"-->",NF}' test.txt

Now we can confirm how many fields there are per line in our test.txt document. Since each line of this file contains 5 fields, we are assured that the command is working as expected.

22. Verify Current Filename

The awk variable FILENAME holds the name of the current input file. We are demonstrating how this works using a simple example. It can be useful in situations where the filename is not known explicitly, or there is more than one input file.

$ awk '{print FILENAME}' test.txt
$ awk '{print FILENAME}' test.txt new.txt

The above commands print out the filename awk is working on each time it processes a new line of the input files.
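If one line of output per file is enough, FILENAME can be combined with FNR, the line number within the current file. The sketch below assumes two throwaway files, one.txt and two.txt, created in a temporary directory purely for the demonstration.

```shell
# Create two small hypothetical input files.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n' > "$tmpdir/two.txt"

# FNR restarts at 1 for every file, so this prints each file name once.
names=$(cd "$tmpdir" && awk 'FNR == 1 { print FILENAME }' one.txt two.txt)
printf '%s\n' "$names"

rm -rf "$tmpdir"
```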

23. Verify Number of Processed Records

The following example will showcase how we can verify the number of records processed by the awk command. Since a large number of Linux system admins use awk for generating reports, it is very useful for them.

$ awk '{print "Processing Record - ",NR;} END {print "\nTotal Records Processed:", NR;}' test.txt

I often use this awk snippet for having a clear overview of my actions. You can easily tweak it to accommodate new ideas or actions.

24. Print the Total Number of Characters in a Record

The awk language provides a handy function called length() that tells us how many characters are present in a record. It is very useful in a number of scenarios. Take a quick look at the following example to see how this works.

$ echo "A random text string… " | awk '{ print length($0); }'

$ awk '{ print length($0); }' /etc/passwd

The above commands will print the total number of characters present in each line of the input string or file.

25. Print all Lines Longer than a Specified Length

We can add in some conditionals to the above command and make it only print those lines that are greater than a predefined length. It is useful when you already have an idea about the length of a specific record.

$ echo "A random text string… " | awk 'length($0) > 10'

$ awk ' length($0) > 5; ' /etc/passwd

You can throw in more options and/or arguments to tweak the command based on your requirements.

26. Print the Number of Lines, Characters, and Words

The following awk command in Linux prints the number of lines, characters, and words in a given input. It utilizes the NR variable as well as some basic arithmetic for doing this operation.

$ echo "This is a input line..." | awk '{ w += NF; c += length + 1 } END { print NR, w, c }'

It shows that there is 1 line, 5 words, and exactly 24 characters in the input, counting the trailing newline.

27. Calculate the Frequency of Words

We can combine associative arrays and the for loop in awk to calculate the word frequency of a document. The following command may seem a little complex, but it is fairly simple once you understand the basic constructs clearly.

$ awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) words[tolower($i)]++ } END { for (i in words) print i, words[i] }' test.txt

If you're having trouble with the one-liner snippet, copy the following code into a new file and run it from that source file.

$ cat > frequency.awk
BEGIN {
    FS="[^a-zA-Z]+"
}
{
    for (i=1; i<=NF; i++)
        words[tolower($i)]++
}
END {
    for (i in words)
        print i, words[i]
}

Then run it using the -f option.

$ awk -f frequency.awk test.txt
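Here is a small self-contained sketch of the same counting technique on a one-line made-up sentence. It differs from the snippet above in two labeled ways: it skips empty fields, which appear when a line starts or ends with a non-letter, and it pipes the result through sort because the order of a for-in loop over an awk array is not defined.

```shell
# Count words case-insensitively in a hypothetical sentence.
freq=$(printf 'The cat and the hat.\n' |
  awk 'BEGIN { FS = "[^a-zA-Z]+" }
       { for (i = 1; i <= NF; i++) if ($i != "") words[tolower($i)]++ }
       END { for (w in words) print w, words[w] }' |
  sort)

printf '%s\n' "$freq"
```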

28. Rename Files using AWK

The awk command can be used for renaming all files matching certain criteria. The following command illustrates how to use awk for renaming all .MP3 files in a directory to .mp3 files.

$ touch {a,b,c,d,e}.MP3
$ ls *.MP3 | awk '{ printf("mv \"%s\" \"%s\"\n", $0, tolower($0)) }'
$ ls *.MP3 | awk '{ printf("mv \"%s\" \"%s\"\n", $0, tolower($0)) }' | sh

First, we created some demo files with the .MP3 extension. The second command only previews the mv commands that awk generates, without running them. Finally, the last command performs the rename by piping those mv commands to the shell.
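The preview-then-run workflow can be tried safely in a scratch directory, as in the sketch below. The directory and the file names a.MP3 and b.MP3 are hypothetical. Note that tolower($0) lowercases the whole file name, not just the extension, so this approach only suits names whose base part is already lowercase.

```shell
# Work in a throwaway directory with two demo files.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.MP3" "$tmpdir/b.MP3"

# Step 1: generate the mv commands and inspect them before running anything.
plan=$(cd "$tmpdir" && ls *.MP3 |
  awk '{ printf("mv \"%s\" \"%s\"\n", $0, tolower($0)) }')
printf '%s\n' "$plan"

# Step 2: run the generated commands, then list the result.
(cd "$tmpdir" && printf '%s\n' "$plan" | sh)
renamed=$(cd "$tmpdir" && ls)
printf '%s\n' "$renamed"

rm -rf "$tmpdir"
```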

29. Print the Square Root of a Number

AWK offers several in-built functions for working with numbers. One of them is the sqrt() function. It is a C-like function that returns the square root of a given number. Take a quick look at the next example to see how this works in general.

$ awk 'BEGIN print sqrt(36)