.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "basic_usage/example_submatrices.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_basic_usage_example_submatrices.py: Sub-matrices of linear operators ================================ This tutorial explains how to create linear operators that correspond to a sub-matrix of another linear operator. Specifically, given the linear operator :code:`A`, we are interested in constructing the linear operator that corresponds to its sub-matrix :code:`A[row_idxs, :][:, col_idxs]`, where :code:`row_idxs` contains the sub-matrix's row indices, and :code:`col_idxs` contains the sub-matrix's column indices. First, the imports. .. GENERATED FROM PYTHON SOURCE LINES 15-32 .. code-block:: Python from time import time from typing import List import numpy import torch from torch import nn from curvlinops import HessianLinearOperator from curvlinops.examples.functorch import functorch_hessian from curvlinops.examples.utils import report_nonclose from curvlinops.submatrix import SubmatrixLinearOperator # make deterministic torch.manual_seed(0) numpy.random.seed(0) .. GENERATED FROM PYTHON SOURCE LINES 33-37 Setup ----- Let's create some toy data, a small MLP, and use mean-squared error as loss function. .. GENERATED FROM PYTHON SOURCE LINES 38-61 .. code-block:: Python N = 4 D_in = 7 D_hidden = 5 D_out = 3 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu") X1, y1 = torch.rand(N, D_in).to(DEVICE), torch.rand(N, D_out).to(DEVICE) X2, y2 = torch.rand(N, D_in).to(DEVICE), torch.rand(N, D_out).to(DEVICE) data = [(X1, y1), (X2, y2)] model = nn.Sequential( nn.Linear(D_in, D_hidden), nn.ReLU(), nn.Linear(D_hidden, D_hidden), nn.Sigmoid(), nn.Linear(D_hidden, D_out), ).to(DEVICE) params = [p for p in model.parameters() if p.requires_grad] loss_function = nn.MSELoss(reduction="mean").to(DEVICE) .. GENERATED FROM PYTHON SOURCE LINES 62-64 We will investigate the Hessian. To make sure our results are correct, let's keep a Hessian matrix computed via :mod:`functorch` around. .. GENERATED FROM PYTHON SOURCE LINES 65-70 .. code-block:: Python H_functorch = ( functorch_hessian(model, loss_function, params, data).detach().cpu().numpy() ) .. GENERATED FROM PYTHON SOURCE LINES 71-74 Here is the corresponding linear operator and a quick check that builds up its matrix representation through multiplication with the identity matrix, followed by comparison to the Hessian matrix computed via :mod:`functorch`. .. GENERATED FROM PYTHON SOURCE LINES 75-80 .. code-block:: Python H = HessianLinearOperator(model, loss_function, params, data) report_nonclose(H_functorch, H @ numpy.eye(H.shape[1])) .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/curvlinops/envs/1.2.0/lib/python3.8/site-packages/curvlinops/_base.py:259: UserWarning: Input vector is float64, while linear operator is float32. Converting to float32. warn( Compared arrays match. .. GENERATED FROM PYTHON SOURCE LINES 81-88 Diagonal blocks --------------- The Hessian consists of blocks :code:`(i, j)` that contain the second-order derivatives of the loss w.r.t. the parameters in :code:`(params[i], params[j])`. Let's define a function to extract these blocks from the Hessian: .. GENERATED FROM PYTHON SOURCE LINES 89-112 .. code-block:: Python def extract_block( mat: numpy.ndarray, params: List[torch.Tensor], i: int, j: int ) -> numpy.ndarray: """Extract the Hessian block from parameters ``i`` and ``j``. Args: mat: The matrix with block structure. params: The parameters defining the blocks. i: Row index of the block to be extracted. j: Column index of the block to be extracted. Returns: Block ``(i, j)``. Has shape ``[params[i].numel(), params[j].numel()]``. """ param_dims = [p.numel() for p in params] row_start, row_end = sum(param_dims[:i]), sum(param_dims[: i + 1]) col_start, col_end = sum(param_dims[:j]), sum(param_dims[: j + 1]) return mat[row_start:row_end, :][:, col_start:col_end] .. GENERATED FROM PYTHON SOURCE LINES 113-115 As an example, let's extract the block that corresponds to the Hessian w.r.t. the first layer's weights in our model. .. GENERATED FROM PYTHON SOURCE LINES 116-120 .. code-block:: Python i, j = 0, 0 H_param0_functorch = extract_block(H_functorch, params, i, j) .. GENERATED FROM PYTHON SOURCE LINES 121-123 We can build a linear operator for this sub-Hessian by only providing the first layer's weight as parameter: .. GENERATED FROM PYTHON SOURCE LINES 124-127 .. code-block:: Python H_param0 = HessianLinearOperator(model, loss_function, [params[i]], data) .. GENERATED FROM PYTHON SOURCE LINES 128-133 Like this we can get blocks from the diagonal. Let's check that this linear operator works as expected by multiplying it onto the identity matrix and comparing the result to the block we extracted from our ground truth: .. GENERATED FROM PYTHON SOURCE LINES 134-137 .. code-block:: Python report_nonclose(H_param0_functorch, H_param0 @ numpy.eye(params[i].numel())) .. rst-class:: sphx-glr-script-out .. code-block:: none Compared arrays match. .. GENERATED FROM PYTHON SOURCE LINES 138-154 Now you might be wondering if we can also build up linear operators for off-diagonal blocks. These blocks contain mixed second-order derivatives and are not Hessians anymore. For instance, such a block is rectangular in general, and thus non-symmetric. Since we are not asking for a Hessian anymore, we cannot use the interface of :class:`HessianLinearOperator`. Luckily, there is a different way to achieve this. Off-diagonal blocks ------------------- As an example,let's try to extract the Hessian block from the first and second parameters in our network (i.e. the weights and biases in the first layer). For that we need to slice the Hessian differently along its rows and columns. We can use the :class:`curvlinops.SubmatrixLinearOperator` class for that: .. GENERATED FROM PYTHON SOURCE LINES 155-166 .. code-block:: Python param_dims = [p.numel() for p in params] i, j = 0, 1 row_start, row_end = sum(param_dims[:i]), sum(param_dims[: i + 1]) col_start, col_end = sum(param_dims[:j]), sum(param_dims[: j + 1]) row_idxs = list(range(row_start, row_end)) # keep the following row indices col_idxs = list(range(col_start, col_end)) # keep the following column indices H_param0_param1 = SubmatrixLinearOperator(H, row_idxs, col_idxs) .. GENERATED FROM PYTHON SOURCE LINES 167-169 As the following test shows, this linear operator indeed represents the desired rectangular Hessian block: .. GENERATED FROM PYTHON SOURCE LINES 170-178 .. code-block:: Python H_param0_param1_functorch = extract_block(H_functorch, params, i, j) report_nonclose( H_param0_param1_functorch, H_param0_param1_functorch @ numpy.eye(param_dims[j]), ) .. rst-class:: sphx-glr-script-out .. code-block:: none Compared arrays match. .. GENERATED FROM PYTHON SOURCE LINES 179-188 Arbitrary sub-matrices ---------------------- So far, we were constrained to blocks spanned by parameter tensors rather than arbitrary elements. As the name :class:`SubmatrixLinearOperator` suggests, we can use it to create arbitrary sub-matrices. As an example, let's say we want to keep rows :code:`[0, 13, 42]` of the Hessian, and columns :code:`[1, 2, 3]`. This works as follows: .. GENERATED FROM PYTHON SOURCE LINES 189-196 .. code-block:: Python row_idxs = [0, 13, 42] # keep the following row indices col_idxs = [1, 2, 3] # keep the following column indices H_sub = SubmatrixLinearOperator(H, row_idxs, col_idxs) H_sub_functorch = H_functorch[row_idxs, :][:, col_idxs] .. GENERATED FROM PYTHON SOURCE LINES 197-198 Quick check to see if it worked: .. GENERATED FROM PYTHON SOURCE LINES 199-202 .. code-block:: Python report_nonclose(H_sub_functorch, H_sub @ numpy.eye(len(col_idxs))) .. rst-class:: sphx-glr-script-out .. code-block:: none Compared arrays match. .. GENERATED FROM PYTHON SOURCE LINES 203-210 Looks good. Performance remarks ---------------------- By the way, using this interface, we could have also constructed the first parameter's Hessian as follows: .. GENERATED FROM PYTHON SOURCE LINES 211-223 .. code-block:: Python i, j = 0, 0 row_start, row_end = sum(param_dims[:i]), sum(param_dims[: i + 1]) col_start, col_end = sum(param_dims[:j]), sum(param_dims[: j + 1]) row_idxs = list(range(row_start, row_end)) col_idxs = list(range(col_start, col_end)) H_param0_alternative = SubmatrixLinearOperator(H, row_idxs, col_idxs) report_nonclose(H_param0_functorch, H_param0_alternative @ numpy.eye(param_dims[0])) .. rst-class:: sphx-glr-script-out .. code-block:: none Compared arrays match. .. GENERATED FROM PYTHON SOURCE LINES 224-232 In general though, it is a good idea to first reduce the linear operator's size as much as possible (in our case, by restricting the parameters to the necessary ones using the :code:`params` argument in :class:`HessianLinearOperator`) and apply slicing afterwards to save computations. In our example, the matrix-vector product of :code:`H_param0` should therefore be faster than that of :code:`H_param0_alternative`: .. GENERATED FROM PYTHON SOURCE LINES 233-248 .. code-block:: Python x = numpy.random.rand(param_dims[0]) # less computations start = time() H_param0 @ x end = time() print(f"H_param0.matvec: {end - start:.2e} s") # more computations start = time() H_param0_alternative @ x end = time() print(f"H_param0_alternative.matvec: {end - start:.2e} s") .. rst-class:: sphx-glr-script-out .. code-block:: none H_param0.matvec: 1.08e-03 s H_param0_alternative.matvec: 1.93e-03 s .. GENERATED FROM PYTHON SOURCE LINES 249-250 That's all for now. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.297 seconds) .. _sphx_glr_download_basic_usage_example_submatrices.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: example_submatrices.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: example_submatrices.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: example_submatrices.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_