Bash: How to Calculate Frequency of Each Word in File


Often you may want to use Bash to calculate the frequency of each word in a given file.

You can use the following syntax to do so:

tr ' ' '\n' < athlete_data.txt | sort | uniq -c | sort -rn

This particular example will return the frequency of each unique word in the file named athlete_data.txt.

Here is how this syntax works:

  • First, we use tr ' ' '\n' to replace each space with a newline. This places each word on its own line.
  • We then use sort to sort the lines so that identical words appear on adjacent lines.
  • We then use uniq -c to count the number of occurrences of each unique word. (Note that uniq only collapses adjacent duplicate lines, which is why the input must be sorted first.)
  • We then use sort -rn to sort the results by frequency in descending order.

Note that you could instead use sort -n to sort the results by frequency in ascending order.
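As a minimal sketch, you can test the pipeline on a small inline sample before running it on a real file (the words below are hypothetical stand-ins):

```shell
# Hypothetical inline sample: two lines of space-separated words
printf 'Guard Forward Guard\nCenter Guard\n' |
  tr ' ' '\n' |   # put each word on its own line
  sort |          # group identical words on adjacent lines
  uniq -c |       # count each unique word
  sort -rn        # sort counts in descending order
```

The first line of output here is `3 Guard`, since Guard appears three times in the sample. One caveat worth noting: if a line contains runs of multiple spaces, tr ' ' '\n' turns each extra space into an empty line, which uniq -c will then count; using tr -s ' ' '\n' squeezes repeated spaces and avoids that.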

The following example shows how to use this syntax in practice.

Example: Use Bash to Calculate Frequency of Each Word in File

Suppose that we have a file named athlete_data.txt that contains the team name, position and conference for various basketball players.

We can use the cat command to view the contents of this file:
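The actual file contents appear in a screenshot in the original article. As a stand-in, a hypothetical file with the same structure (team name, position, conference on each line) could be created and viewed like this:

```shell
# Create a hypothetical athlete_data.txt (stand-in for the file in the article)
cat > athlete_data.txt <<'EOF'
Celtics Guard East
Lakers Forward West
Celtics Center East
EOF

# View the contents of the file
cat athlete_data.txt
```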

Now suppose that we would like to calculate the frequency of each word in this file.

We can use the following syntax to do so:

tr ' ' '\n' < athlete_data.txt | sort | uniq -c | sort -rn

The following screenshot shows how to use this syntax in practice:

[Screenshot: Bash frequency of each word in file]

Notice that this returns the frequency of each word in the file.

For example, we can see:

  • The word Guard occurs 5 times in the file.
  • The word East occurs 4 times in the file.
  • The word Celtics occurs 4 times in the file.
  • The word West occurs 3 times in the file.

And so on.
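If you only care about the most frequent words, one common variation is to append head to the end of the pipeline (the inline sample words below are hypothetical):

```shell
# Show only the top 2 most frequent words from an inline sample
printf 'East West East Guard East\n' |
  tr ' ' '\n' | sort | uniq -c | sort -rn |
  head -n 2   # keep only the two lines with the highest counts
```

Here the first line of output is `3 East`, followed by one of the words that occurs once.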

Related Tutorials

The following tutorials explain how to perform other common tasks in Bash:

Bash: How to Count Number of Unique Lines in File
Bash: How to Count Number of Characters in String
Bash: How to Count Number of Columns in File
