Regular Expressions

Match to a character class


The BstYI restriction enzyme cuts at the consensus sequence rGATCy: A or G (r, a purine) in the first position, then GATC, and then T or C (y, a pyrimidine) in the last position.

To find out whether a sequence contains a restriction site for BstYI, write:

if ($sequence =~ /[AG]GATC[TC]/) {...};

# This matches any of AGATCT, GGATCT, AGATCC, GGATCC.
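The test above only reports whether a site is present. As a minimal sketch of how the same character-class pattern could also report where each site starts, the script below loops over the matches with the /g modifier. The helper name find_bstyi_sites and the test sequence are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns the 0-based start
# position of every BstYI site (rGATCy) found in the sequence.
sub find_bstyi_sites {
    my ($sequence) = @_;
    my @positions;
    # With /g in scalar context, each pass of the loop finds the next match.
    while ($sequence =~ /[AG]GATC[TC]/g) {
        push @positions, $-[0];   # $-[0] holds the start offset of the last match
    }
    return @positions;
}

my $sequence = 'TTAGATCTGGGGATCCAA';   # made-up sequence with two sites
my @sites = find_bstyi_sites($sequence);
print "BstYI sites start at: @sites\n";   # prints: BstYI sites start at: 2 10
```

Positions are counted from 0, so the first site (AGATCT) begins at the third character of the string.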


When a list of characters is enclosed in square brackets [], the pattern matches exactly one character at the corresponding position of the string, and that character must be one of those listed.

You may specify a range of characters using a hyphen -.

A caret ^ at the front of the list negates the character class, so that it matches any single character that is not in the list.


if ($string =~ /[AGTC]/) {...};           # true if $string contains at least one of A, G, T, C
if ($string =~ /[a-z]/) {...};            # true if $string contains at least one lowercase letter a to z
if ($string =~ /chromosome[1-6]/) {...};  # matches chromosome1, chromosome2 ... chromosome6
if ($string =~ /[^xyzXYZ]/) {...};        # true if $string contains at least one character other than x, X, y, Y, z, Z
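Because these patterns succeed as soon as one character anywhere in the string fits the class, a negated class is often combined with the !~ operator to check that a string contains nothing else. The following is a minimal sketch of that idea. The helper name is_pure_dna and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): true if the string is
# non-empty and consists only of the uppercase letters A, C, G, T.
sub is_pure_dna {
    my ($string) = @_;
    # [^ACGT] matches any single character outside the class, so the
    # string is pure DNA only when that negated class does NOT match.
    return $string ne '' && $string !~ /[^ACGT]/;
}

for my $candidate ('GATTACA', 'GATNACA', 'gattaca') {
    my $verdict = is_pure_dna($candidate)
        ? 'only A, C, G, T'
        : 'contains other characters';
    print "$candidate: $verdict\n";
}

# A character class matches wherever it fits in the string, so the range
# pattern also succeeds on a longer name that starts the same way.
print "chromosome12 matches /chromosome[1-6]/\n"
    if 'chromosome12' =~ /chromosome[1-6]/;
```

Only GATTACA passes the check: GATNACA contains an N, and gattaca is lowercase, which the class [ACGT] does not cover.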

