Pattern matching algorithms

Pattern matching is one of the most common tasks we perform on a day-to-day basis. PHP has built-in support for regular expressions, and we mostly rely on regular expressions and the built-in string functions to solve such problems. PHP has a ready-made function named strpos, which returns the position of the first occurrence of a string in a text. Since it only returns the position of the first occurrence, we can try to write a function that will return all possible positions. We will explore the brute-force approach first, where, at every possible starting position, we compare the characters of the text with the characters of the pattern one by one. Here is the function that will do the job for us:

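function strFindAll(string $pattern, string $txt): array {
    $M = strlen($pattern);
    $N = strlen($txt);
    $positions = [];

    // Try every starting index at which the whole pattern still fits.
    for ($i = 0; $i <= $N - $M; $i++) {
        // Compare character by character; stop at the first mismatch.
        for ($j = 0; $j < $M; $j++) {
            if ($txt[$i + $j] !== $pattern[$j]) {
                break;
            }
        }

        // $j reaches $M only if every character matched.
        if ($j === $M) {
            $positions[] = $i;
        }
    }

    return $positions;
}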

The approach is very straightforward. We start from position 0 of the text and keep going until position $N - $M, where $N is the length of the text and $M is the length of the pattern we are looking for. We never need to start a comparison beyond that position, even in the worst case where there is no match, because fewer than $M characters remain and the pattern can no longer fit. Now, let's call the function with some arguments:

$txt = "AABAACAADAABABBBAABAA";
$pattern = "AABA";
$matches = strFindAll($pattern, $txt);

if ($matches) {
    foreach ($matches as $pos) {
        echo "Pattern found at index : " . $pos . "\n";
    }
}

This will produce the following output:

Pattern found at index : 0
Pattern found at index : 9
Pattern found at index : 16
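
As a cross-check on the output above, the same positions can be recovered with the built-in strpos by calling it repeatedly and moving its offset argument forward. The following is only a minimal sketch: the helper name strposAll is hypothetical and not part of PHP, and returning an empty array for an empty pattern is an assumption made here for illustration.

```php
<?php
// Hypothetical helper (not from the original text): collects every match
// position by calling strpos repeatedly with an increasing offset.
function strposAll(string $pattern, string $txt): array {
    $positions = [];

    // Assumption: an empty pattern is treated as having no matches.
    if ($pattern === '') {
        return $positions;
    }

    $offset = 0;
    // strpos returns false once there is no further occurrence.
    while (($pos = strpos($txt, $pattern, $offset)) !== false) {
        $positions[] = $pos;
        // Advance by one character so overlapping matches are kept.
        $offset = $pos + 1;
    }

    return $positions;
}

print_r(strposAll("AABA", "AABAACAADAABABBBAABAA")); // positions 0, 9 and 16
```

Advancing the offset by one character, rather than by the length of the pattern, keeps overlapping matches, which is also what the brute-force function reports.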

If we look at our $txt string, we can find that there are three occurrences of our pattern AABA. The first one is at the beginning, the second one is near the center, and the third one is close to the end of the string. The algorithm we have written has a worst-case time complexity of O((N - M + 1) * M), usually simplified to O(N * M), where N is the length of the text and M is the length of the pattern we are searching for. If we want, we can improve the efficiency of such matching using a popular algorithm known as the Knuth-Morris-Pratt (KMP) string-matching algorithm.
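
To make the cost of the brute-force scan more concrete, we can count the character comparisons it performs. The sketch below uses a hypothetical instrumented variant named countComparisons (it is not part of the original function) and an input chosen so that every window matches until its last character. With N = 20 and M = 4 there are N - M + 1 = 17 starting positions and 4 comparisons at each of them.

```php
<?php
// Hypothetical instrumented variant (not from the original text): the same
// brute-force scan, but it returns the number of character comparisons
// instead of the match positions.
function countComparisons(string $pattern, string $txt): int {
    $M = strlen($pattern);
    $N = strlen($txt);
    $comparisons = 0;

    for ($i = 0; $i <= $N - $M; $i++) {
        for ($j = 0; $j < $M; $j++) {
            $comparisons++; // one character comparison
            if ($txt[$i + $j] !== $pattern[$j]) {
                break;
            }
        }
    }

    return $comparisons;
}

// Every window of the text matches "AAA" and then fails on the final "B".
$txt = str_repeat("A", 20); // N = 20
$pattern = "AAAB";          // M = 4

echo countComparisons($pattern, $txt), "\n"; // 17 * 4 = 68
```

Inputs like this one, where long partial matches are repeatedly thrown away, are the case in which the brute-force approach does the most redundant work.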
