---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.13.6
kernelspec:
  display_name: Python 3 (math-583)
  language: python
  name: math-583
---

```{code-cell}
:tags: [hide-cell]

import mmf_setup; mmf_setup.nbinit()
import os
from pathlib import Path
import logging
logging.getLogger("matplotlib").setLevel(logging.CRITICAL)
logging.getLogger('numba').setLevel(logging.CRITICAL)
logging.getLogger('OMP').setLevel(logging.CRITICAL)
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
```

(sec:Solution9)=
# Solution 9: Newton's Method

Here is one way to try to find an attractive cycle.  Let's start with the original cubic
polynomial $g(z) = z^3 - 1$ and move the root at $r_1 = 1$ to a new root at $c$, which
will be our "parameter":
:::{margin}
Note that $g''(z)$ has a root at the geometric mean $z_0$.  This is important for
Fatou's theorem that states that, if $f(z)$ has an attracting cycle, then at least one
solution to $f'(z) = 0$ will fall into it.
:::
\begin{gather*}
   g(z) = \frac{(z^3 - 1)(z-c)}{z-1} = \underbrace{(z^2 + z + 1)}_{(z-q)(z-\bar{q})}(z-c),\\
   g'(z) = 3z^2 + (2z + 1)(1-c), \\
   g''(z) = 6z + 2(1-c) = 6\Bigl(z - \underbrace{\frac{c-1}{3}}_{z_0}\Bigr).
\end{gather*}
Our Newton iteration is
\begin{gather*}
  f(z) = \frac{2z^3 + z^2(1-c) + c}
              {3z^2 + (2z + 1)(1-c)},\qquad
  z_0 = \frac{q + q^* + c}{3} = \frac{c-1}{3},\\
  f'(z) = \frac{\overbrace{(z^2 + z + 1)(z - c)}^{g(z)}\overbrace{2(3z + 1 - c)}^{g''(z)}}
               {(3z^2 + (2z+1)(1-c))^2}
\end{gather*}

We first simply start from $z_0$ and iterate a bunch of times, plotting the final
point.  If this state is attracted to one of the roots, then after a large number of
iterations, $g(z_n) \approx 0$.  But if $z_0$ is attracted to a periodic cycle, $z_n$
will not be a root, and so we should see some deviations:

```{code-cell}
:tags: [hide-input]
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

def g(z, c, d=0):
    if d == 0:
        return (z**2 + z + 1)*(z-c)
    else:
        return (2*z + 1)*(z-c) + z**2 + z + 1

def f(z, c):
    return z - g(z, c=c)/g(z, c, d=1)

def h(c, N=1000):
    z0 = (c-1)/3
    z = z0
    for n in range(N):
        z = f(z, c=c)
    return z
    
N = 1000
c = np.linspace(-3, 4, 1000)
fig, ax = plt.subplots(figsize=(7, 4))
with np.errstate(divide='ignore', invalid='ignore'):
    # Suppress some floating point errors.
    ax.plot(c, g(h(c, N=N), c))
ax.set(ylim=(-10, 1), xlabel="$c = r_0$", ylabel=f'$g(z_{{{N}}})$');

ax1 = ax.inset_axes([0.15, 0.1, 0.4, 0.3], xlim=(3.3, 3.6), ylim=(-9, -5))
c = np.linspace(3.3, 3.6, 1000)
with np.errstate(divide='ignore', invalid='ignore'):
    # Suppress some floating point errors.
    ax1.plot(c, g(h(c, N=N), c))
ax.indicate_inset_zoom(ax1, edgecolor="green");

ax2 = ax.inset_axes([0.15, 0.55, 0.4, 0.3], xlim=(2.03, 2.095), ylim=(-6.2, -5.2))
c = np.linspace(2.03, 2.095, 1000)
with np.errstate(divide='ignore', invalid='ignore'):
    # Suppress some floating point errors.
    ax2.plot(c, g(h(c, N=N), c))
ax.indicate_inset_zoom(ax2, edgecolor="red");
```
We see that there are several places where we appear to have an attractive cycle.  We
have zoomed in on two such places near $r_0=2.06$ and $r_0=3.5$.  If you look carefully
at these insets, you might notice that it appears we are tracing part of some
{ref}`sec:BifurcationDiagrams`.  This is a pretty good hint that there is a
Mandelbrot-like set hidden somewhere nearby.

Here we try to find it

```{code-cell}
:tags: [hide-input]

%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
import numba

# Since we want to pass in the roots as an array, we must generalize and use
# guvectorize, which means we must write our own loop.  Note also that even scalar
# arguments must be arrays
@numba.guvectorize(
    ["(complex128[:,:], complex128[:], float64[:], int_[:], int32[:,:])"],
    "(m,n),(i),(),()->(m,n)",
    cache=True,
    target="parallel",
)
def _convergence_time(c, roots, eps, maxiter, n):
    for x in range(c.shape[0]):
        for y in range(c.shape[1]):
            roots[0] = c[x, y]
            z = roots.mean()
            for n[x, y] in range(maxiter[0] + 1):
                g = 1 + 0j
                tmp = 0j
                for r_i in roots:
                    g *= (z-r_i)
                    tmp += 1/(z-r_i)
                z -= 1/tmp
                if g.real**2 + g.imag**2 <= eps[0]:
                    break

def convergence_time(c, roots, eps=1e-8, maxiter=100):
    """Return the "convergence time" of initial point `z0` under Newton's method.

    Arguments
    ---------
    c : complex
        Location in the Mandelbrot set.
    roots : complex array
        Roots of the polynomial.
    eps : float
        Bailout radius.
    maxiter : int
        Maximum number of iteration.  If the iteration does not terminate before this,
        then this value is returned.  It can be used as an approximation of the interior
        of the sets.
    """
    return _convergence_time(c, roots, eps, maxiter)


# Try changing the ranges here
N = 256

roots = np.exp(2j*np.pi * (np.arange(3)/3.0))
c, R = 1.578 + 0j, 0.01
c, R = 1.1395 + 0j, 0.001
c, R = 3.4 + 0j, 0.2

x = np.linspace(c.real - R, c.real + R, N)
y = np.linspace(c.imag - R, c.imag + R, N)
z = x[:, None] + 1j*y[None, :]

# Try changing maxiter - especially if you get deep.
%time n = convergence_time(z, roots, maxiter=1000)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x, y, np.log10(n).T, shading='auto')
ax.plot(roots.real, roots.imag, 'ow')
ax.plot([c.real], [c.imag], 'xw')

fig.colorbar(mesh, label=r"Convergence time $n$")
ax.set(xlabel=r"$x = \Re z$", ylabel=r"$y = \Im z$", 
       xlim=(x.min(), x.max()), ylim=(y.min(), y.max()),
       title=f"Newton convergence time for $g(z)=z^3-1$.",  aspect=1);
```

Here we zoom out.  Note: this is a rotated and scaled version of the set shown in the
[3Blue1Brown video](https://youtu.be/LqbZpur38nw&t=1156s).  They look at the region on
the left side of this figure.

```{code-cell}
:tags: [hide-input]

# Try changing the ranges here
N = 1024

roots = np.exp(2j*np.pi * (np.arange(3)/3.0))
c, R = 1.578 + 0j, 0.01
c, R = 1.1395 + 0j, 0.001
c, R = 3.4 + 0j, 0.2
c, R = 1 + 0j, 3

x = np.linspace(c.real - R, c.real + R, N)
y = np.linspace(c.imag - R, c.imag + R, N)
z = x[:, None] + 1j*y[None, :]

# Try changing maxiter - especially if you get deep.
%time n = convergence_time(z, roots, maxiter=1000)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x, y, np.log10(n).T, shading='auto')
ax.plot(roots.real, roots.imag, 'ow')
ax.plot([c.real], [c.imag], 'xw')

fig.colorbar(mesh, label=r"Convergence time $n$")
ax.set(xlabel=r"$x = \Re z$", ylabel=r"$y = \Im z$", 
       xlim=(x.min(), x.max()), ylim=(y.min(), y.max()),
       title=f"\"Mandelbrot set\" for $g(z)=(z^2+z+1)(z-c)$.",  aspect=1);
```

## Alternative Approach

:::{margin}
For further discussions and additional references, see Fig. 45 of `cite`{Peitgen:1986} Eq. (6.16).
:::
The qualitative picture will not change if we perform an affine transformation of the
plane.  I.e rotations, translations, and scalings.  Thus, given an arbitrary cubic
polynomial, we can always translate, rotate, and scale so that the mean of the roots
$z_0 = 0$ lies at the origin, and so that one of the roots lies at $r_0 = 1$.  The two
other roots must satisfy $r_1 + r_2 = -1$.  Thus, we have
\begin{gather*}
  g(z) = (z-1)(z-r_1)(z-r_2) 
       = z^3 - \underbrace{(1+r_1+r_2)}_{0}z^2
         + (\underbrace{r_1r_2}_{\lambda} + \underbrace{r_1 + r_2}_{-1})z 
         - \underbrace{r_1r_2}_{\lambda}\\
       = z^3 + (\lambda - 1)z - \lambda,\\
  g'(z) = 3z^2 + \lambda - 1, \qquad
  g''(z) = 6z,\qquad
  f(z) = \frac{2z^3 + \lambda}{3z^2 + \lambda - 1}.
\end{gather*}

```{code-cell}
:tags: [hide-input]
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
import numba
plt.rcParams['figure.figsize'] = (4, 3)

@numba.vectorize(
    ["int32(complex128, complex128, float64, int_)"],
    cache=True,
    target="parallel",
)
def _convergence_time3(z0, lam, eps, maxiter):
    z = z0
    for n in range(maxiter + 1):
        g = z**3 + (lam-1)*z - lam
        z = (2*z**3 + lam)/(3*z**2 + lam - 1)
        if g.real**2 + g.imag**2 <= eps:
            break
    return n

def convergence_time3(z0, lam, eps=1e-8, maxiter=100):
    return _convergence_time3(z0, lam, eps, maxiter)

# Try changing the ranges here
N = 1024
lam0 = -0.5 + 0j
R = 2
x = np.linspace(lam0.real - R, lam0.real + R, N)
y = np.linspace(lam0.imag - R, lam0.imag + R, N)
z = x[:, None] + 1j*y[None, :]

# Try changing maxiter - especially if you get deep.
%time n = convergence_time3(z0=0, lam=z, maxiter=100)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x, y, np.log10(n).T, shading='auto')
fig.colorbar(mesh, label=r"Convergence time $\log_{10}n$")
ax.set(xlabel=r"$x = \Re z$", ylabel=r"$y = \Im z$", 
       title=f"\"Mandelbrot set\" for $g(z)=z^3 + (\lambda - 1) - \lambda$.",  aspect=1);
```

```{code-cell}
:tags: [hide-input, margin]

w = np.linspace(-4, 2, 500)
z = w*np.exp(w)
fig, ax = plt.subplots()
ax.plot(z, w, label="$W_{-1}(z)$")
w = np.linspace(-1, 2, 500)
z = w*np.exp(w)
ax.plot(z, w, lw=4, label="$W_{0}(z)$")
ax.legend()
ax.grid()
ax.set(ylim=(-4, 2), xlim=(-1, 6),
       xlabel="$z=we^w$", ylabel="$w=W_{k}(z)$", title="Lambert $W$ function");
```

## Lambert $W$ Function

Here we explore trying to find a globally convergent method for computing the Lambert
$W_0(z)$:
\begin{gather*}
  z = we^w, \qquad W_0(z) = w, \qquad z\geq -\frac{1}{e}, \qquad w\geq -1.
\end{gather*}
Let's start with the following Newton iteration
\begin{gather*}
  g_z(w) = we^{w} - z, \qquad
  g_z'(w) = (1+w)e^{w}, \\
  w\mapsto f_z(w) = w - \frac{g_z(w)}{g_z'(w)}
                  = \frac{w^2 + ze^{-w}}{1+w}.
\end{gather*}
```{code-cell}
:tags: [hide-input]

def g(w, z, d=0):
    if d == 0:
        return w*np.exp(w) - z
    elif d == 1:
        return (1+w)*np.exp(w)
    
def f(w, z):
    return (w**2 + z*np.exp(-w))/(1+w)
```

```{code-cell}
:tags: [hide-input, margin]

fig, ax = plt.subplots()

for z in [-np.exp(-1), -0.1, 0, 1, 2]:
    w = np.linspace(-4, 4, 200)
    label = "$-e^{-1}$" if np.allclose(z, -np.exp(-1)) else f"${z=}$"
    ax.plot(w, f(w, z), label=label)
ax.plot(w, w, 'k')
ax.grid('on')
ax.legend()
ax.set(xlim=(-4, 4), ylim=(-20, 5), xlabel="$w$", ylabel=f"$f_{{{z=}}}(w)$");
```
We show what $w \mapsto f_z(w)$ looks like on the right.  Due to the pole at $w=-1$, we
expect that we must start with $w_0 \geq -1$ if we want to converge to the $k=0$
branch.  

Now lets see what the region of convergence is for $z = 1$ (lower figure on the right).
Modulo some floating point issues close to $w_0=-1$ (where the denominator vanishes) and
$z=-e^{-1}$ (where we rely on a subtle cancellation to get $w=-1$), we see that all
initial guesses converge to the desired root.

```{code-cell}
:tags: [hide-input, margin]

z = 1
w0 = np.linspace(-1, 2)
N = 100

fig, ax = plt.subplots()

for z in [-np.exp(-1)+1e-8, -0.1, 0, 1, 2]:
    label = "$-e^{-1}$" if np.allclose(z, -np.exp(-1)) else f"${z=}$"
    w = w0
    for n in range(N):
        w = f(w, z=z)

    ax.plot(w0, w, label=label)
    ax.legend()

ax.set(xlabel="$w_0$", ylabel=f"$w_{{{N=}}}$");
```

We will need to tidy up the region near $w=-1$, but away from this, we can now see how
long it takes for Newton's method to converge to a desired accuracy.

```{code-cell}
@numba.vectorize("int32(float64, float64, int_, float_)", cache=True, target="parallel")
def _escape_time(z, w, maxiter, eps):
    for n in range(maxiter):
        w = (w**2 + z*np.exp(-w))/(1+w)
        g = w*np.exp(w) - z
        if abs(g) < eps:
            break
    return n

def escape_time(z, w0, maxiter=100, eps=1e-12):
    return _escape_time(z, w0, maxiter, eps)
    
z = np.linspace(-np.exp(-1), 3)[1:]
w0 = np.linspace(-1, 2, 101)[1:]

n = escape_time(z[:, None], w0[None, :], 10)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(z, w0, n.T)
ax.set(xlabel="$z$", ylabel="$w_{0}$");
fig.colorbar(mesh, label=r"Convergence time $n$");
```

We see that if we can get close-enough to the solution for an initial guess, then we can
converge in a few iterations.  Thus an initial guess of $w_0 = 1$ will probably work
quite well for $z\in [0, 5]$.  For larger and smaller $z$ we will need to do something
else.

How to proceed?  One can try to make series approximations or look at the asymptotic
behavior of the function to get an initial guess.  For example, we might expand about $w
= -1 + \epsilon$:
\begin{gather*}
    z = (\epsilon -1)e^{\epsilon - 1} 
      = \frac{\epsilon - 1}{e}\left(1 + \epsilon + \frac{\epsilon^2}{2} + \cdots\right).
\end{gather*}
Keeping terms to quadratic order, we have
\begin{gather*}
    z \approx \frac{\epsilon^2 - 1}{e}, \qquad
    \epsilon \approx \sqrt{1 + ez}, \qquad
    w \approx \sqrt{1 + ez} - 1.
\end{gather*}
Let's see how well using $w_0 = \sqrt{1 + e z} - 1$ converges.

```{code-cell}
@np.vectorize
def nW1(z, maxiter=12, eps=1e-12):
    w = np.sqrt(1 + np.exp(1)*z) - 1
    for n in range(maxiter):
        w = (w**2 + z*np.exp(-w))/(1+w)
        g = w*np.exp(w) - z
        if abs(g) < eps:
            break
    return n

@np.vectorize
def nW2(z, maxiter=12, eps=1e-12):
    w = 1
    for n in range(maxiter):
        w = (w**2 + z*np.exp(-w))/(1+w)
        g = w*np.exp(w) - z
        if abs(g) < eps:
            break
    return n

z = np.linspace(-np.exp(-1), 5, 1000)[1:]

fig, ax = plt.subplots()
ax.plot(z, nW1(z), label=r"$w_0 = \sqrt{1+ez}-1$")
ax.plot(z, nW2(z), label=r"$w_0 = 1$")
ax.legend()
ax.set(xlabel="$z$", ylabel="convergence time $n$");
```

This tells us that a reasonable strategy might be to switch guesses at $z=1$.  We still
have an issue for larger $z$.  Some ideas for improving this.

1. Asymptotic expansions.  Try a series expansion in $1/z$ or $1/w$.  Unfortunately, in
   this case, the asymptotic behaviour is defined by the Lambert function, which is one
   of the reasons it is useful to have a solution.  So this does not help here.
2. Choose different $w_0$ for larger $z$.  Perhaps one can guess a reasonable form?
   This is likely not going to help for very large $z$, but might get a practical
   solution if you know you will only need solutions for e.g. $z<100$.
3. We directly applied Newton's method to $g(z) = we^w - z$, but it can be applied to
   many different forms of this equation.  Some of these might converge better for
   large $z$.  For example:
   \begin{gather*}
     g(w) = e^w - z/w,\\
     g(w) = w + \ln w - \ln z,\\
     \dots.
   \end{gather*}


## References

[double delta-function potential]: <https://en.wikipedia.org/wiki/Delta_potential#Double_delta_potential>
[Lambert $W$ function]: <https://en.wikipedia.org/wiki/Lambert_W_function>
[Newton's method]: <https://en.wikipedia.org/wiki/Newton's_method>