How to Use awk to Group by and Sum Column Values


Often you may want to use awk to sum the values in one column of a file, grouped by the values in another column.

You can use the following syntax to do so:

awk '
 NR == 1 { print; next }
 { a[$1] += $2 }
 END {
   for (i in a) {
     printf "%-15s\t%s\n", i, a[i] | "sort -rnk2";
   }
 }
' player_data.txt

This particular example will calculate the sum of values in column 2, grouped by the values in column 1 of the file named player_data.txt.

Note that we also pipe the results to sort so that they appear in descending order of the summed values from column 2. This step is optional.
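Because the sort step is optional, the grouping can be sketched on its own. The following is a minimal sketch, assuming a few hypothetical rows fed to awk on standard input (the keys A and B and their values are made up for illustration and are not from player_data.txt). awk does not define the order in which for (i in a) visits keys, so without sorting the rows may appear in any order.

```shell
# Hypothetical sample rows (not from player_data.txt): a header, then key/value pairs.
data='key value
A 10
B 5
A 7
B 1'

# Group by column 1 and sum column 2, with no sort step.
unsorted=$(printf '%s\n' "$data" | awk '
 NR == 1 { next }                     # skip the header row
 { a[$1] += $2 }                      # add column 2 to the running total for this key
 END { for (i in a) print i, a[i] }   # iteration order is unspecified
')
printf '%s\n' "$unsorted"
```

Each key is printed once with its total, here 17 for A and 6 for B, in whatever order awk happens to iterate.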

The following example shows how to use this syntax in practice.

Example: How to Use awk to Group by and Sum Column Values

Suppose we have a file named player_data.txt that contains information about points scored by basketball players on various teams.

We can use the cat command to view the contents of this file:

cat player_data.txt

The first column contains the team name for each player and the second column contains the points scored by the player.

Suppose that we would like to calculate the sum of points scored, grouped by team.

We can use the following syntax to do so:

awk 'NR == 1 { print; next } { a[$1] += $2 } END { for (i in a) printf "%-15s\t%s\n", i, a[i] | "sort -rnk2" }' player_data.txt

The following screenshot shows how to use this syntax in practice:

[Screenshot: awk group by sum]

The output displays the sum of points scored by players on each team, sorted by the sum of points in descending order.

From the output we can see:

  • Players on the Kings scored a total of 62 points.
  • Players on the Mavs scored a total of 52 points.
  • Players on the Celtics scored a total of 32 points.
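The totals above can be reproduced with a small self-contained sketch. The contents of player_data.txt are not shown in this text, so the rows below are hypothetical, chosen only so that the per-team sums match the totals listed above. The data is passed on standard input rather than read from a file.

```shell
# Hypothetical rows, chosen only so the team totals come out to 62, 52 and 32.
data='team points
Mavs 20
Kings 30
Celtics 12
Mavs 32
Kings 32
Celtics 20'

# Same awk program as shown earlier, reading from standard input instead of a file.
totals=$(printf '%s\n' "$data" | awk '
 NR == 1 { print; next }   # print the header row unchanged
 { a[$1] += $2 }           # sum points per team
 END {
   for (i in a) {
     printf "%-15s\t%s\n", i, a[i] | "sort -rnk2"   # sort by total, largest first
   }
 }
')
printf '%s\n' "$totals"
```

The header row is printed first, followed by one line per team with the Kings, Mavs and Celtics totals in that order.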

Note: We used the printf statement to print the team name left-justified in a 15-character field, followed by a tab ( \t ) and the sum of points. You can change the tab to a space or a different separator if you would like.
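As a sketch of changing the separator, the variant below prints a comma between the team name and the total. The sample rows are hypothetical. Sorting is done here by a separate sort call after awk, with -t, telling sort that fields are comma-separated; that arrangement is a choice made for this sketch, not part of the original command.

```shell
# Hypothetical rows used only for illustration.
data='team points
Mavs 20
Kings 30
Celtics 12
Mavs 32
Kings 32
Celtics 20'

# Print "team,total", then sort numerically on the second comma-separated field, largest first.
csv=$(printf '%s\n' "$data" | awk '
 NR == 1 { next }                                  # skip the header row
 { a[$1] += $2 }                                   # sum points per team
 END { for (i in a) printf "%s,%s\n", i, a[i] }    # comma instead of tab
' | sort -t, -k2,2nr)
printf '%s\n' "$csv"
```

Keeping the sort outside awk makes it easy to adjust the field separator and sort key in one place when the output format changes.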

Related Tutorials

The following tutorials explain how to perform other common tasks in awk:

How to Use awk to Print All Columns After Specific Number
How to Use awk to Print Rows Where Column Equals Value
How to Use awk to Print Last Line of a File
How to Use awk to Print a Range of Columns
