Longest Common Subsequence

Longest Common Subsequence

Hello and welcome to day 6 of the daily Leetcode series! I'm surprised I've been able to keep this up for almost a week.

Today, we're going to solve yet another dynamic programming problem (Leetcode needs to calm down with these). Finding the longest common subsequence is one of the classic dynamic programming problems on strings.

Let's take a look at the problem now.

Understanding The Problem

Here's the problem description as given on Leetcode:

Given two strings text1 and text2, return the length of their longest common subsequence*. If there is no *common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

  • For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both strings.

The description is pretty clear as to what we have to compute. As always, make sure you read the problem at least two to three times before proceeding to solve it.

As always, look for the two clues that indicate that a given problem has to be solved using dynamic programming.

  1. We need to find the maximum length of the common subsequence. That's the first clue.

  2. Secondly, we get a choice for each character in the string, whether to include it or not.

Both these clues point to a dynamic programming solution to this problem. I'll be discussing three solutions to this problem:

  1. Recursion

  2. Recursion with memoization

  3. Top-Down approach

Recursion

The first step to writing a recursive solution is to figure out the base condition, that is, the condition that, if true, will cause the recursive function to return early.

Here, the recursive function will have the two input strings and two integers representing the size of each string as parameters. The function definition would look like this:

int lcs(string x,string y,int n,int m);

Now that we know the parameters, can you think about what the base condition might be?

If you're ever confused about how to find the base condition for any recursive function, here's a little trick that might help. Think about the smallest valid input that could be given to the problem. That itself becomes the base condition.

For example, in this problem, the smallest valid input would be an empty string. Say I give you two strings:

string a = "abcde";

string b = ""

What would be the length of the longest common subsequence of a and b?

If you're thinking 0, congratulations you've discovered the base condition for this problem.

Next, let's think about how to decide whether or not to include a particular character in the subsequence. There are two possibilities for each character of the two strings, either they match, or they don't. Now, if they do match, we need to include these two characters in the subsequence, otherwise, we don't.

Keep in mind the difference between a subsequence and a substring. A subsequence, unlike a substring, doesn't need to be continuous.

So, when we find two matching characters, we include them in our subsequence by adding one to our answer. What would the second term of this return statement be? Once we find two matching characters at say indices i and j respectively, we need to call the same function but with the size of the arrays reduced by one.

The return statement for the character match condition will look something like this:

if(text1[n-1] == text2[m-1]){
    return 1 + lcs(text1,text2,n-1,m-1);
}

So, we start from the end of both strings, comparing each character. We've seen what happens when the characters match, what about when they don't match?

In this case, we need to consider two possibilities. We either consider the next character in text1 or text2 . Look at the following code snippet and see if it makes sense.

else{
    return max(lcs(text1,text2,n,m-1),lcs(text1,text2,n-1,m));
}

Since we need the maximum length of the common subsequence, we choose the maximum of these two values.

Putting everything together, the final code looks like this:

int lcs(string text1, string text2, int n,int m){
        if(n == 0 or m == 0) return 0;

        if(text1[n-1] == text2[m-1]){
            return 1 + lcs(text1,text2,n-1,m-1);
        }
        else{
            return max(lcs(text1,text2,n,m-1),lcs(text1,text2,n-1,m));
        }
}
int longestCommonSubsequence(string text1, string text2) {
    int n = text1.size(), m = text2.size();    
    return lcs(text1,text2,n,m);
}

That's it for the recursive solution! But, this problem gives rise to the overlapping subproblems condition. This recursive function will end up re-calculating the answer for the same input multiple times leading to increased time complexity.

To solve this problem, we store the results of each input in a 2D matrix, that is, we memoize the solution.

Recursion with Memoization

Adding just a few lines of code can reduce the time complexity of the recursive solution considerably.

Go through the following code:

    vector<vector<int>> dp;
    int lcs(string x,string y,int n,int m){
        if(n == 0 or m == 0) return 0;
        if(dp[n][m] != -1) return dp[n][m];

        if(x[n-1] == y[m-1])
            return dp[n][m] = lcs(x,y,n-1,m-1) + 1;
        else
            return dp[n][m] = max(lcs(x,y,n-1,m),lcs(x,y,n,m-1));
    }
    int longestCommonSubsequence(string text1, string text2) {
        int n = text1.size(), m = text2.size();
        dp.resize(n+1,vector<int>(m+1,-1));
        return lcs(text1,text2,n,m);
    }

There's not much that's different here from the previous approach. We use a 2D matrix to store the values returned by the recursive function. The (i,j)th cell of the matrix represents the longest common subsequence for the two strings with sizes i and j respectively.

Top-Down Approach

We can solve this problem without recursion as well. We'll still need to use a 2D matrix to store the intermediate results but instead of recursion, we will be using two loops to fill up the matrix. The length of the longest common subsequence will be stored in the last cell of the matrix.

The intuition behind the problem remains the same. We use the same code used in the recursive function, but just inside two nested loops. Take a look at the code below:

    int longestCommonSubsequence(string x,string y){
        int n = x.size(),m = y.size();
        vector<vector<int>> dp(n+1,vector<int>(m+1,-1));

        for(int i = 0;i < n+1; i++){
            for(int j = 0; j < m+1; j++){
                if(i == 0 or j == 0) 
                {
                    dp[i][j] = 0;
                    continue;
                }
                if(x[i-1] == y[j-1]){
                    dp[i][j] = dp[i-1][j-1] + 1;
                }
                else{
                    dp[i][j] = max(dp[i-1][j],dp[i][j-1]);
                }
            }
        }
        return dp[n][m];
    }

That's it for today! Hope you understood the problem and had fun solving it. Be sure to come back tomorrow for the solution to the next problem!

Cheers!