Implementing Knuth-Morris-Pratt algorithm

Knuth-Morris-Pratt (KMP) string-matching algorithm is very similar to the naive algorithm we just implemented. The basic difference is that the KMP algorithm uses information from the partial matches and takes a decision to stop matching on any mismatch. It can also precompute the locations where the pattern can exist so that we can reduce the number of repeating comparison or false checks. The KMP algorithm pre-computes a table that helps during the search operation and increases efficiency. While implementing KMP algorithm, we need to computer the Longest Proper Prefix Suffix (LPS). Let's check the function to generate the LPS part:

function ComputeLPS(string $pattern, array &$lps) { 
$len = 0;
$i = 1;
$M = strlen($pattern);

$lps[0] = 0;

while ($i < $M) {
if ($pattern[$i] == $pattern[$len]) {
$len++;
$lps[$i] = $len;
$i++;
} else {
if ($len != 0) {
$len = $lps[$len - 1];
} else {
$lps[$i] = 0;
$i++;
}
}
}
}

For our pattern from the previous example AABA, the LPS will be [0,1,0,1]; now, let's write the KMP implementation for our string/pattern search problem:

function KMPStringMatching(string $str, string $pattern): array { 
$matches = [];
$M = strlen($pattern);
$N = strlen($str);
$i = $j = 0;
$lps = [];

ComputeLPS($pattern, $lps);

while ($i < $N) {
if ($pattern[$j] == $str[$i]) {
$j++;
$i++;
}

if ($j == $M) {
array_push($matches, $i - $j);
$j = $lps[$j - 1];
} else if ($i < $N && $pattern[$j] != $str[$i]) {
if ($j != 0)
$j = $lps[$j - 1];
else
$i = $i + 1;
}
}
return $matches;
}

The preceding code is the implementation of the KMP algorithm. Now, let's run the following example with our implemented algorithm:

$txt = "AABAACAADAABABBBAABAA"; 
$pattern = "AABA";
$matches = KMPStringMatching($txt, $pattern);

if ($matches) {
foreach ($matches as $pos) {
echo "Pattern found at index : " . $pos . " ";
}
}

This will produce the following output:

Pattern found at index : 0
Pattern found at index : 9
Pattern found at index : 16

The complexity of the KMP algorithm is O(N + M), which is much better than regular pattern matching. Here, O (M) is for computing LPS and O (N) for KMP algorithm itself.

There are many detailed descriptions of the KMP algorithm that can be found online.

 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset