.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "basic_usage/example_eigenvalues.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_basic_usage_example_eigenvalues.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_basic_usage_example_eigenvalues.py:

Eigenvalues
===============

This example demonstrates how to compute a subset of eigenvalues of a linear
operator, using :func:`scipy.sparse.linalg.eigsh`. Concretely, we will compute
leading eigenvalues of the Hessian.

As always, imports go first.

.. GENERATED FROM PYTHON SOURCE LINES 10-27

.. code-block:: Python


    from contextlib import redirect_stderr
    from io import StringIO

    import numpy
    import scipy
    import torch
    from torch import nn

    from curvlinops import HessianLinearOperator
    from curvlinops.examples.functorch import functorch_hessian
    from curvlinops.utils import allclose_report

    # make deterministic
    torch.manual_seed(0)
    numpy.random.seed(0)


.. GENERATED FROM PYTHON SOURCE LINES 28-33

Setup
-----

We will use synthetic data, consisting of two mini-batches, a small MLP, and
mean-squared error as loss function.

.. GENERATED FROM PYTHON SOURCE LINES 34-57

.. code-block:: Python


    N = 20
    D_in = 7
    D_hidden = 5
    D_out = 3

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    X1, y1 = torch.rand(N, D_in).to(DEVICE), torch.rand(N, D_out).to(DEVICE)
    X2, y2 = torch.rand(N, D_in).to(DEVICE), torch.rand(N, D_out).to(DEVICE)

    model = nn.Sequential(
        nn.Linear(D_in, D_hidden),
        nn.ReLU(),
        nn.Linear(D_hidden, D_hidden),
        nn.Sigmoid(),
        nn.Linear(D_hidden, D_out),
    ).to(DEVICE)
    params = {n: p for n, p in model.named_parameters() if p.requires_grad}

    loss_function = nn.MSELoss(reduction="mean").to(DEVICE)


.. GENERATED FROM PYTHON SOURCE LINES 58-62

Linear operator
------------------

We are ready to setup the linear operator. In this example, we will use the Hessian.

.. GENERATED FROM PYTHON SOURCE LINES 63-67

.. code-block:: Python


    data = [(X1, y1), (X2, y2)]
    H = HessianLinearOperator(model, loss_function, params, data).to_scipy()


.. GENERATED FROM PYTHON SOURCE LINES 68-73

Leading eigenvalues
-------------------

Through :func:`scipy.sparse.linalg.eigsh`, we can obtain the leading
:math:`k=3` eigenvalues.

.. GENERATED FROM PYTHON SOURCE LINES 74-81

.. code-block:: Python


    k = 3
    which = "LM"  # largest magnitude
    top_k_evals, _ = scipy.sparse.linalg.eigsh(H, k=k, which=which)

    print(f"Leading {k} Hessian eigenvalues: {top_k_evals}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Leading 3 Hessian eigenvalues: [1.41091264 1.42817086 1.49430477]


.. GENERATED FROM PYTHON SOURCE LINES 82-88

Verifying results
-----------------

To double-check this result, let's compute the Hessian with
:code:`functorch`, compute all its eigenvalues with
:func:`scipy.linalg.eigh`, then extract the top :math:`k`.

.. GENERATED FROM PYTHON SOURCE LINES 89-96

.. code-block:: Python


    H_functorch = functorch_hessian(model, loss_function, params, data).detach()
    evals_functorch, _ = torch.linalg.eigh(H_functorch)
    top_k_evals_functorch = evals_functorch[-k:]

    print(f"Leading {k} Hessian eigenvalues (functorch): {top_k_evals_functorch}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Leading 3 Hessian eigenvalues (functorch): tensor([1.4109, 1.4282, 1.4943])


.. GENERATED FROM PYTHON SOURCE LINES 97-98

Both results should match.

.. GENERATED FROM PYTHON SOURCE LINES 99-103

.. code-block:: Python


    print(f"Comparing leading {k} Hessian eigenvalues (linear operator vs. functorch).")
    assert allclose_report(top_k_evals, top_k_evals_functorch.double(), rtol=1e-4)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Comparing leading 3 Hessian eigenvalues (linear operator vs. functorch).


.. GENERATED FROM PYTHON SOURCE LINES 104-107

:func:`scipy.sparse.linalg.eigsh` can also compute other subsets of
eigenvalues, and also their associated eigenvectors. Check out its
documentation for more!

.. GENERATED FROM PYTHON SOURCE LINES 111-119

Power iteration versus ``eigsh``
--------------------------------

Here, we compare the query efficiency of :func:`scipy.sparse.linalg.eigsh` with the
`power iteration <https://en.wikipedia.org/wiki/Power_iteration>`_ method, a simple
method to compute the leading eigenvalues (in terms of magnitude). We re-use the im-
plementation from the `PyHessian library <https://github.com/amirgholami/PyHessian>`_
and adapt it to work with SciPy arrays rather than PyTorch tensors:

.. GENERATED FROM PYTHON SOURCE LINES 120-185

.. code-block:: Python


    def power_method(
        A: scipy.sparse.linalg.LinearOperator,
        max_iterations: int = 100,
        tol: float = 1e-3,
        k: int = 1,
    ) -> tuple[numpy.ndarray, numpy.ndarray]:
        """Compute the top-k eigenpairs of a linear operator using power iteration.

        Code modified from PyHessian, see
        https://github.com/amirgholami/PyHessian/blob/72e5f0a0d06142387fccdab2226b4c6bae088202/pyhessian/hessian.py#L111-L156

        Args:
            A: Linear operator of dimension ``D`` whose top eigenpairs will be computed.
            max_iterations: Maximum number of iterations. Defaults to ``100``.
            tol: Relative tolerance between two consecutive iterations that has to be
                reached for convergence. Defaults to ``1e-3``.
            k: Number of eigenpairs to compute. Defaults to ``1``.

        Returns:
            The eigenvalues as array of shape ``[k]`` in descending order, and their
            corresponding eigenvectors as array of shape ``[D, k]``.
        """
        eigenvalues = []
        eigenvectors = []

        def normalize(v: numpy.ndarray) -> numpy.ndarray:
            return v / numpy.linalg.norm(v)

        def orthonormalize(v: numpy.ndarray, basis: list[numpy.ndarray]) -> numpy.ndarray:
            for basis_vector in basis:
                v -= numpy.dot(v, basis_vector) * basis_vector
            return normalize(v)

        computed_dim = 0
        while computed_dim < k:
            eigenvalue = None
            v = normalize(numpy.random.randn(A.shape[0]))

            for _ in range(max_iterations):
                v = orthonormalize(v, eigenvectors)
                Av = A @ v

                tmp_eigenvalue = v.dot(Av)
                v = normalize(Av)

                if eigenvalue is None:
                    eigenvalue = tmp_eigenvalue
                elif abs(eigenvalue - tmp_eigenvalue) / (abs(eigenvalue) + 1e-6) < tol:
                    break
                else:
                    eigenvalue = tmp_eigenvalue

            eigenvalues.append(eigenvalue)
            eigenvectors.append(v)
            computed_dim += 1

        # sort in ascending order and convert into arrays
        eigenvalues = numpy.array(eigenvalues[::-1])
        eigenvectors = numpy.array(eigenvectors[::-1])

        return eigenvalues, eigenvectors


.. GENERATED FROM PYTHON SOURCE LINES 186-190

Let's compute the top-3 eigenvalues via power iteration and verify they roughly match.
Note that we are using a smaller :code:`tol` value than the PyHessian default value
here to get better convergence, and we have to use relatively large tolerances for the
comparison (which we didn't do when comparing :code:`eigsh` with :code:`eigh`).

.. GENERATED FROM PYTHON SOURCE LINES 191-198

.. code-block:: Python


    top_k_evals_power, _ = power_method(H, tol=1e-4, k=k)
    print(f"Comparing leading {k} Hessian eigenvalues (eigsh vs. power).")
    assert allclose_report(
        top_k_evals_functorch.double(), top_k_evals_power, rtol=2e-2, atol=1e-6
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Comparing leading 3 Hessian eigenvalues (eigsh vs. power).


.. GENERATED FROM PYTHON SOURCE LINES 199-203

This indicates that the power method achieves poorer accuracy than :code:`eigsh`. But
does it therefore require fewer matrix-vector products? To answer this, let's turn on
the linear operator's progress bar, which allows us to count the number of
matrix-vector products invoked by both eigen-solvers:

.. GENERATED FROM PYTHON SOURCE LINES 204-227

.. code-block:: Python


    H = HessianLinearOperator(
        model, loss_function, params, data, progressbar=True
    ).to_scipy()

    # determine number of matrix-vector products used by `eigsh`
    with StringIO() as buf, redirect_stderr(buf):
        top_k_evals, _ = scipy.sparse.linalg.eigsh(H, k=k, which=which)
        # The tqdm progressbar will print "matmat" for each batch in a matrix-vector
        # product. Therefore, we need to divide by the number of batches
        queries_eigsh = buf.getvalue().count("matmat") // len(data)
    print(f"eigsh used {queries_eigsh} matrix-vector products.")

    # determine number of matrix-vector products used by power iteration
    with StringIO() as buf, redirect_stderr(buf):
        top_k_evals_power, _ = power_method(H, k=k, tol=1e-4)
        # The tqdm progressbar will print "matmat" for each batch in a matrix-vector
        # product. Therefore, we need to divide by the number of batches
        queries_power = buf.getvalue().count("matmat") // len(data)
    print(f"Power iteration used {queries_power} matrix-vector products.")

    assert queries_power > queries_eigsh


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    HessianLinearOperator.data_statistics (on cpu):   0%|          | 0/2 [00:00<?, ?it/s]    HessianLinearOperator.data_statistics (on cpu): 100%|██████████| 2/2 [00:00<00:00, 29330.80it/s]
    HessianLinearOperator.batch_prediction_loss_gradient (on cpu):   0%|          | 0/2 [00:00<?, ?it/s]
    HessianLinearOperator.batch_prediction_loss_gradient (on cpu):   0%|          | 0/2 [00:00<?, ?it/s]    HessianLinearOperator.batch_prediction_loss_gradient (on cpu): 100%|██████████| 2/2 [00:00<00:00, 508.31it/s]
    HessianLinearOperator.batch_prediction_loss_gradient (on cpu):  50%|█████     | 1/2 [00:00<00:00, 336.11it/s]
    HessianLinearOperator._matmat (on cpu):   0%|          | 0/2 [00:00<?, ?it/s]    HessianLinearOperator._matmat (on cpu): 100%|██████████| 2/2 [00:00<00:00, 221.12it/s]
    HessianLinearOperator._matmat (on cpu):   0%|          | 0/2 [00:00<?, ?it/s]    HessianLinearOperator._matmat (on cpu): 100%|██████████| 2/2 [00:00<00:00, 221.82it/s]
    eigsh used 21 matrix-vector products.
    Power iteration used 72 matrix-vector products.


.. GENERATED FROM PYTHON SOURCE LINES 228-235

Sadly, the power iteration also does not offer computational benefits, consuming
more matrix-vector products than :code:`eigsh`. While it is elegant and simple,
it cannot compete with :code:`eigsh`, at least in the comparison provided here.

Therefore, we recommend using :code:`eigsh` for computing eigenvalues. This method
becomes accessible because :code:`curvlinops` interfaces with SciPy's linear
operators.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.806 seconds)


.. _sphx_glr_download_basic_usage_example_eigenvalues.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_eigenvalues.ipynb <example_eigenvalues.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_eigenvalues.py <example_eigenvalues.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: example_eigenvalues.zip <example_eigenvalues.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_