Bash: How to Extract Text Between Two Strings


You can use the following basic syntax to extract text between two strings using Bash:

echo "$original_string" | grep -oP '(?<=string1).*(?=string2)'

This particular example extracts the text between string1 and string2 within the string variable named original_string.

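Note that we used the grep command with two options: -o returns only the matching part of the string, and -P enables Perl-compatible regular expressions (available in GNU grep), which support the lookbehind (?<=string1) and the lookahead (?=string2). As a result, the match starts after string1 (not including string1) and ends before string2 (not including string2).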
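To make the effect of the -o option concrete, here is a minimal sketch that runs the same pattern with and without it. The sample string and the variable names whole_line and matched_part are made up for illustration, and the sketch assumes a grep build that supports -P:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, chosen only for illustration.
original_string="start[middle]end"

# Without -o, grep prints the entire line that contains a match.
whole_line=$(printf '%s\n' "$original_string" | grep -P '(?<=\[).*(?=\])')

# With -o, grep prints only the text matched by the pattern.
# The lookbehind (?<=\[) and lookahead (?=\]) are checked but not returned.
matched_part=$(printf '%s\n' "$original_string" | grep -oP '(?<=\[).*(?=\])')

printf 'whole line:   %s\n' "$whole_line"
printf 'matched part: %s\n' "$matched_part"
```

Here printf is used instead of echo only so that the string is passed through unchanged; echo with a quoted variable behaves the same way for this input.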

The following example shows how to use this syntax in practice.

Example: How to Extract Text Between Two Strings in Bash

Suppose that we have the following string:

  • His ID Number:004593Employee

And suppose that we would like to extract only the text between the colon ( : ) and the string Employee to get the following value:

  • 004593

We can use the following syntax to do so:

original_string="His ID Number:004593Employee"

echo "$original_string" | grep -oP '(?<=:).*(?=Employee)'

The following screenshot shows how to use this syntax in practice:

[Screenshot: Bash extract text between two strings]

Notice that we’re able to extract only the 004593 from the string, which represents all of the text between the colon ( : ) and Employee.
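If grep -P is not available on your system, one possible alternative (not the grep approach used in this tutorial) is Bash parameter expansion. The following is a rough sketch under that assumption; the variable names after_colon and between are hypothetical:

```shell
#!/usr/bin/env bash
original_string="His ID Number:004593Employee"

# Remove the shortest prefix that ends with the first colon.
# after_colon now holds "004593Employee".
after_colon=${original_string#*:}

# Remove the longest suffix that starts with "Employee".
# between now holds the text before the first "Employee".
between=${after_colon%%Employee*}

printf '%s\n' "$between"
```

Note that this sketch stops at the first occurrence of Employee after the colon, so it can give a different result than the greedy .* pattern when Employee appears more than once in the string.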

Occasionally you may want to extract every occurrence of the text between two specific strings.

For example, suppose that we have the following string:

  • The ID Values are #004593Employee and #002321Employee

And suppose that we would like to extract only the number for each employee ID to get the following:

  • 004593
  • 002321

We can use the following syntax to do so:

original_string="The ID Values are #004593Employee and #002321Employee"

echo "$original_string" | grep -oP '(?<=#).*?(?=Employee)'

The following screenshot shows how to use this syntax in practice:

[Screenshot: Bash extract multiple occurrences of text between two strings]

Notice that we’re able to extract only the number for each employee ID to get the following:

  • 004593
  • 002321

By specifying .*? between the two strings, we perform a non-greedy match: each match ends at the first occurrence of Employee that follows a # symbol rather than at the last one in the string. This allows us to extract every occurrence of text between the # symbol and the string Employee.
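To see why the non-greedy form matters here, the following sketch runs both patterns on the same string and then loops over the non-greedy results. The variable names greedy and lazy are hypothetical, and the sketch assumes a grep build that supports -P:

```shell
#!/usr/bin/env bash
original_string="The ID Values are #004593Employee and #002321Employee"

# Greedy .* runs to the LAST "Employee", so we get one long match:
# 004593Employee and #002321
greedy=$(printf '%s\n' "$original_string" | grep -oP '(?<=#).*(?=Employee)')

# Non-greedy .*? stops at the FIRST "Employee" after each "#",
# so grep -o prints one match per line.
lazy=$(printf '%s\n' "$original_string" | grep -oP '(?<=#).*?(?=Employee)')

printf 'greedy: %s\n' "$greedy"

# Process each extracted ID separately.
printf '%s\n' "$lazy" | while IFS= read -r id; do
  printf 'ID: %s\n' "$id"
done
```

Because grep -o prints each match on its own line, the results can be read line by line, as in the loop above.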

Related Tutorials

The following tutorials explain how to perform other common tasks in Bash:

Bash: How to Replace Dash with Underscore
Bash: How to Replace Multiple Characters in String
Bash: How to Replace All Occurrences of String in File
