Knuth-Morris-Pratt Algorithm

January 31, 2026

cs

The Knuth-Morris-Pratt (KMP) algorithm is an efficient string searching algorithm that finds occurrences of a “pattern” string within a “text” string.

The algorithm preprocesses the pattern to create a longest prefix-suffix (LPS) array, which allows the algorithm to skip unnecessary comparisons when a mismatch occurs. This array tells us, for each index in the pattern, the length of the longest proper prefix which is also a suffix.

For example, consider the pattern “ABABCABAB”. The LPS array for this pattern would be [0, 0, 1, 2, 0, 1, 2, 3, 4]. You can build the LPS array using the following code:

def build_lps(pattern: str):
    lps = [0] * len(pattern)
    j = 0  # length of current matched prefix
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
            lps[i] = j
    return lps

KMP uses LPS to decide how far to “fall back” in the pattern after a mismatch. If we’ve matched j characters and text[i] != pattern[j], we set j = lps[j-1] (instead of restarting at 0), keeping the longest prefix that could still match at this position.

def kmp_search(text: str, pattern: str):
    if pattern == "":
        # empty pattern matches everywhere
        return list(range(len(text) + 1)) 

    lps = build_lps(pattern)

    matches = []
    j = 0  # index in pattern
    for i in range(len(text)):
        while j > 0 and text[i] != pattern[j]:
            j = lps[j - 1]
        if text[i] == pattern[j]:
            j += 1
            if j == len(pattern):
                matches.append(i - len(pattern) + 1)
                j = lps[j - 1]
    return matches

The time complexity of the KMP algorithm is O(n+m)O(n + m), where nn is the length of the text and mm is the length of the pattern. The space complexity is O(m)O(m) due to the LPS array. This makes KMP significantly more efficient than the naive O(nm)O(n * m) approach, especially for large texts and patterns.