Solving Recurrence Relations

Posted on May 9, 2024 by John Mount in Data science | 0 Comments

This article was first published on python – Win Vector LLC , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Introduction

A neat bit of “engineering mathematics” is solving recurrence relations. The solution method falls out of the notation itself, and harkens back to a time where formal sums were often used in place of vector subscript notation.

Unfortunately the variety of such solutions is small enough to allow teaching by memorization. In this note I try to avoid memorization, and motivate the methodology. I feel this is facilitated by separating a number of often smeared together representations (formulas, sequences, vectors, linear checks, characteristic polynomial, and polynomial check families) into distinct realizations. We are also going to emphasize calculating and confirming claims in Python.

The problem

A simple form of the recurrence problem is to write down a general solution for a subscripted family of linear equations such as the following

Fn+2 = Fn+1 + Fn

where n is a subscript varying over all positive integers.

Such a relation or equation can arise in number of situations or applications:

Time series analysis.
Estimating run times of algorithms.
Combinatorics.

The above example is in fact the Fibonacci sequence.

The question is: if we are given initial conditions F1 = 1 and F2 = 1, what is Fn (for general non-negative integer n)?

In this case the Wikipedia solution is Fn = (r1n - r2n)/sqrt(5) where:

r1 = (1 + sqrt(5))/2
r2 = (1 - sqrt(5))/2

Natural questions include:

Why is the solution in this form?
How do you find r1 and r2?

We will set up some notation and then solve a few examples.

The Solution

Vector space notation

Let’s formalize our notation a bit.

First let’s settle on working over the vector space of all infinite sequences of real numbers with non-negative subscripts. This is just saying we consider the infinite sequence F = (F1, F2, F3, ...) we are solving for as one of many possible sequences. We can use “R[Z+]” as the group-ring style naming of this vector space. Now consider any such vector v that obeys the recurrence equations:

vn+2 = vn+1 + vn for n = 1, 2, ...

Let “S” denote the subset of R[Z+] that obey all of the above linear recurrence checks. We claim a few things about S (the set of solutions to our current example system):

The all zeroes vector is in S. So S is non-empty.
If v obeys all of the above constraints then so does c v for any constant c.
If u and v obey all of the above constraints then so does u + v.
By induction, all vn are completely determined by the values v1 and v2.

The first three observations tell us S is a vector subspace of R[Z+]. The fourth observation tells us this vector subspace is 2 dimensional, so any solution can be written as the linear combination of two basis solutions.

For notational convenience we will associate functions with vectors in R[Z+]. For a function f() let [f(n) | n = 1, 2, ...] denote the graph or vector (f(1), f(2), f(3), ...). In particular we are interested in the vectors [r1n | n = 1, 2, ...] = (r1, r12, r13, ...) and [r2n | n = 1, 2, ...] = (r2, r22, r23, ...) (for the previously defined r1 and r2). We claim [r1n | n = 1, 2, ...] and [r2n | n = 1, 2, ...] both obey the above linear check conditions.

Inspecting the claimed Fibonacci solution

Let’s confirm both [r1n | n = 1, 2, ...] and [r2n | n = 1, 2, ...] both obey the linear check conditions. In my opinion one doesn’t truly learn the math without working at least a few concrete examples.

In[1]:

# define [r1**n | n = 1, 2, ...] = f_r_1
import numpy as np

r_1 = (1 + np.sqrt(5))/2

f_r_1 = np.asarray([r_1**n for n in range(1, 11)])

f_r_1

Out[1]:

array([  1.61803399,   2.61803399,   4.23606798,   6.85410197,
        11.09016994,  17.94427191,  29.03444185,  46.97871376,
        76.01315562, 122.99186938])

In[2]:

# check f_r_1 obeys specified  Fibonacci relations
assert np.all([
    np.abs(f_r_1[n+2] - (f_r_1[n+1] + f_r_1[n])) < 1e-8
    for n in range(2, len(f_r_1) - 2)
])

In[3]:

# define [r2**n | n = 1, 2, ...] = f_r_2

r_2 = (1 - np.sqrt(5))/2

f_r_2 = np.asarray([r_2**n for n in range(1, 11)])

f_r_2

Out[3]:

array([-0.61803399,  0.38196601, -0.23606798,  0.14589803, -0.09016994,
        0.05572809, -0.03444185,  0.02128624, -0.01315562,  0.00813062])

In[4]:

# check f_r_2 obeys specified  Fibonacci relations
assert np.all([
    np.abs(f_r_2[n+2] - (f_r_2[n+1] + f_r_2[n])) < 1e-8
    for n in range(2, len(f_r_2) - 2)
])

In[5]:

# generate a bit of the Fibonacci sequence using 
# the definitional recurrence F[n+2] = F[n+1] + F[n]
f = [1, 1]
for i in range(8):
    n = len(f) - 2
    f.append(f[n + 1] + f[n])

f

Out[5]:

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

In[6]:

# Wikipedia claimed solution
prediction = (f_r_1 - f_r_2) / np.sqrt(5)

prediction

Out[6]:

array([ 1.,  1.,  2.,  3.,  5.,  8., 13., 21., 34., 55.])

In[7]:

# confirm claimed combination matches Fibonacci sequence
assert np.max(np.abs(
        np.asarray(f) - prediction
    )) < 1e-8

In[8]:

# we can also solve for the mixture coefficients by
# treating our sequences as column vectors and asking
# a solver how to write the vector f as a linear
# combination of the vectors f_r_1 and f_r_2.
#
# As a linear system in the first 2 terms this looks like:
#
# f[0] = soln[0] * f_r_1[0]  +  soln[1] * f_r_1[0]
# f[1] = soln[0] * f_r_1[1]  +  soln[1] * f_r_1[1]
#
# And such a linear system is translated into 
# matrix notation as follows:
soln = np.linalg.solve(
    np.asarray([
        [f_r_1[0], f_r_2[0]],
        [f_r_1[1], f_r_2[1]],
    ]),
    [
        f[0],
        f[1],
    ]
)

soln

Out[8]:

array([ 0.4472136, -0.4472136])

In[9]:

# our (identical) derived solution
prediction2 = (
    soln[0] * f_r_1 
    + soln[1] * f_r_2
)

prediction2

Out[9]:

array([ 1.,  1.,  2.,  3.,  5.,  8., 13., 21., 34., 55.])

In[10]:

# confirm claimed combination matches Fibonacci sequence
assert np.max(np.abs(
        np.asarray(f) - prediction2
    )) < 1e-8

The neat trick

The core of the solution follows from a neat trick: replace the subscripts with superscripts. This is very powerful trick. Let’s see that in action.

We gamble and hope some of the solutions are of the following simple form: [rn | n = 1, 2, ...] = (r, r2, r3, ...), where r is a to be determined (possibly complex) number.

Our claim is: if the number r is a solution to the polynomial equation (in x)

x2 = x + 1

then v = [rn | n = 1, 2, ...] satisfies

vn+2 = vn+1 + vn for n = 1, 2, ...

.
The polynomial is called “the characteristic polynomial” of the linear recurrence checks.

The “trick” to this is: if x = r is a root of, or satisfies, x2 = x + 1, then it also satisfies xn+2 = xn+1 + xn (which is equivalent to xn x2 = xn x + xn 1) for all n ≥ 0 . So rn+2 = rn+1 + rn for all n ≥ 0 which is exactly the claim v = [rn | n = 1, 2, ...] is a solution to all of the subscripted v-checks. It pays to think of the characteristic polynomial p(x) as shorthand for the family of “check polynomials” xn p(x) for n = 0, 1, 2, .... However, for some problems we will need to appeal directly to the family of check polynomials.

In our case, the roots, or solutions, to this polynomial equation are the roots x2 - x - 1 = 0. By the quadratic formula:

r1 = (1 + sqrt(5))/2
r2 = (1 - sqrt(5))/2

This gives us two linearly independent solutions to the recurrence check equations: [r1n | n = 1, 2, ...] and [r2n | n = 1, 2, ...]. These solutions are enough to form a basis for the solution space S, so we know F is some linear combination of [r1n | n = 1, 2, ...] and [r2n | n = 1, 2, ...]. And we have already showed how to find the linear combination by setting up linear equations to match the first two entries of the vector.

A harder example

Suppose we want to solve the recurrence:

Wn+2 = 6 Wn+1 - 9 Wn for n = 1, 2, ...

The above are the “W checks” we now want to satisfy. A solution to these is a specific vector of values we can substitute in for the W‘s, such that none of the equations are false.

We will try the earlier solution strategy. We are then interested in roots of the corresponding characteristic polynomial:

x2 - 6 x + 9 = 0

The above polynomial factors into (x - 3)2. So r1 = r2 = 3. So we know [3n | n = 1, 2, ...] is a solution to the W checks.

The space of solutions is again 2 dimensional. So to parameterize over all the possible solutions, we need a second linear independent solution. The question then is: how do we find a second linear independent solution?

Dealing with repeated roots

When our characteristic polynomial p(x) has repeated roots, the characteristic polynomial is not expressive enough to represent some solutions. However, the corresponding family of check polynomials xn p(x) for n = 0, 1, ... are rich enough to find the extra solutions. We will use that when a polynomial p(x) has a repeated root (that is: for some number r, the fact that (x-r)2 divides into p(x) with no remainder), then p(x) shares that root with p'(x) (where p'(x) is the derivative of p(x) with respect to x).

Take the earlier “W check” polynomials p(x) = xn+2 - 6 xn+1 + 9 xn. Define the polynomial y(x) = x p'(x) = (n+2) xn+2 - 6 (n+1) xn+1 + 9 n xn. y(x) itself is (by inspection) the check polynomials corresponding to the following (new) linear recurrence checks:

(n+2) Yn+2 = 6 (n+1) Yn+1 - 9 (n) Yn for n = 1, 2, ...

As y(3) = 0 (true because 3 is a double root of p(x)) we know [3n | n = 1, 2, ...] is a solution to the new Y linear recurrence checks. Substitute this valid Y solution [3n | n = 1, 2, ...] into the Y checks to derive the following family of (true or valid) equations.

((n+2) 3n+2) = 6 ((n+1) 3n+1) - 9 ((n) 3n) for n = 1, 2, ...

Notice the above is saying [n 3n | n = 1, 2, ...] obeys the earlier W linear recurrence problem. We have our desired additional solution.

Confirming the last claim

Frankly, this sort of argument doesn’t fully sink in until one confirms some examples and calculations. The derivation is a bit of “proof by change of notation”, which is never very satisfying.

So: let’s confirm some calculations claimed in the example to try to build some familiarity with the items discussed.

In[11]:

# show down p(x) zeros out at x=3
r = 3
evals_p = [
    r**(n+2) - 6 * r**(n+1) + 9 * r**n 
    for n in range(10)]

evals_p

Out[11]:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In[12]:

assert np.all(np.abs(evals_p)) < 1e-8

In[13]:

# show down y(x) = x p'(x) zeros out at x=3
r = 3
evals_y = [
    r * ((n+2) * r**(n+1) - 6 * (n+1) * r**(n) + 9 * n * r**(n-1)) 
    for n in range(10)]

evals_y

Out[13]:

[0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In[14]:

assert np.all(np.abs(evals_y)) < 1e-8

In[15]:

# write down the first p = [3**n | n = 1, 2, ...] solution
p = np.asarray([
    r**n for n in range(1, 11)
    ])

p

Out[15]:

array([    3,     9,    27,    81,   243,   729,  2187,  6561, 19683,
       59049])

In[16]:

# confirm W checks on p'
assert np.all([
    p[n+2] - 6 * p[n+1] + 9 * p[n] == 0 
    for n in range(1, len(p) - 2)])

In[17]:

# confirm Y checks on p
# this works as W is the "Y" checks for p'
assert np.all([
    (n+2) * p[n+2] - 6 * (n+1) * p[n+1] + 9 * n * p[n] == 0 
    for n in range(1, len(p) - 2)])

Back to the example

As we now have the required number of linearly independent solutions (2 solutions: [3n | n = 1, 2, ...] and [n 3n | n = 1, 2, ...]), we can solve for any specified initial conditions (as demonstrated earlier).

Believe it or not, we are done.

In general

The general solution strategy is as follows.

For a general homogeneous linear recurrence of the form:

vn+k = ck-1 vn+k-1 + ... + c0 vn

Move to the characteristic polynomial equation:

xk = ck-1 xk-1 + ... + c0

We can generate a basis for all solutions as v = [ nk rn | n = 1, 2, ...] where r is a root of the characteristic polynomial, and k is a non-negative integer smaller than the degree of multiplicity of the root r. Once we have enough linear independent solutions, we can write any other solution as a linear combination of what we have.

We call all of the above the “swap subscripts (general and powerful) to powers (specific and weak)” strategy (though there are obviously a few more steps and details to the method).

Conclusion

Our solution strategy was exchanging powerful subscripts (which can implement any integer keyed lookup table) with less powerful superscripts (that can only express powers). We can lift any solution found in the weaker world (power series) to the more general one (subscripted sequences). We don’t always find enough power series solutions, and in that case we transform the problem to find more solutions to modified polynomials. The trick is to track the details of how the transforms operate on both our vector solutions and check polynomials.

The above system is general, it can solve a lot of problems. We concentrated on calculating over vector values. Related methods include the umbral calculus, the study of shift operators, and the Laplace transform.

(The source code for this worksheet can be found here)

To leave a comment for the author, please follow the link and comment on their blog: python – Win Vector LLC .

Want to share your content on python-bloggers? click here.

Python-bloggers

Data science news and tutorials - contributed by Python bloggers