Examples
========
This section provides practical examples for using torch-sla.
.. raw:: html
Quick Navigation
- Visualization:
spy() for sparsity pattern analysis
- I/O Operations: Matrix Market & SafeTensors format support
- Linear Solve: Direct & iterative solvers with gradients
- Matrix Decompositions: SVD, Eigenvalue, LU factorization
- Advanced: Nonlinear solve, distributed computing
----
Visualization
-------------
Spy Plot (Sparsity Pattern)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Visualize the sparsity pattern of a sparse matrix using the ``.spy()`` method.
**Code:**
.. code-block:: python
import torch
from torch_sla import SparseTensor
# Create a 2D Poisson matrix (5-point stencil)
n = 50
val, row, col = [], [], []
for i in range(n):
for j in range(n):
idx = i * n + j
val.append(4.0); row.append(idx); col.append(idx)
if j > 0: val.append(-1.0); row.append(idx); col.append(idx-1)
if j < n-1: val.append(-1.0); row.append(idx); col.append(idx+1)
if i > 0: val.append(-1.0); row.append(idx); col.append(idx-n)
if i < n-1: val.append(-1.0); row.append(idx); col.append(idx+n)
A = SparseTensor(torch.tensor(val), torch.tensor(row), torch.tensor(col), (n*n, n*n))
# Visualize sparsity pattern
A.spy(title="2D Poisson (5-point stencil)")
**Output Examples:**
.. list-table::
:widths: 50 50
:header-rows: 0
* - .. figure:: ../../assets/examples/spy_poisson_10x10.png
:width: 100%
:align: center
**2D Poisson (10×10)** - 100 DOF, 5-point stencil with grid lines
- .. figure:: ../../assets/examples/spy_poisson_50x50.png
:width: 100%
:align: center
**2D Poisson (50×50)** - 2,500 DOF, band structure visible
* - .. figure:: ../../assets/examples/spy_tridiag_30x30.png
:width: 100%
:align: center
**Tridiagonal (30×30)** - Classic 1D Poisson pattern
- .. figure:: ../../assets/examples/spy_random_100x100.png
:width: 100%
:align: center
**Random Sparse (100×100)** - 800 random non-zeros
Each non-zero element is rendered as a colored pixel with intensity proportional to its absolute value. Zero elements are white.
----
I/O Operations
--------------
Matrix Market Format
~~~~~~~~~~~~~~~~~~~~
Save and load sparse matrices in the standard Matrix Market (.mtx) format.
**Code:**
.. code-block:: python
from torch_sla import SparseTensor, save_matrix_market, load_matrix_market
# Create a sparse matrix
A = SparseTensor(val, row, col, (100, 100))
# Save to Matrix Market format
save_matrix_market(A, "matrix.mtx", comment="My sparse matrix")
# Load from Matrix Market format
B = load_matrix_market("matrix.mtx", device="cuda")
# Verify
assert torch.allclose(A.to_dense(), B.to_dense())
**File format (.mtx):**
::
%%MatrixMarket matrix coordinate real general
% My sparse matrix
100 100 500
1 1 4.0
1 2 -1.0
...
----
SafeTensors Format
~~~~~~~~~~~~~~~~~~
Save and load using the efficient safetensors format.
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, shape)
# Save
A.save("matrix.safetensors")
# Load
B = SparseTensor.load("matrix.safetensors", device="cuda")
# Save distributed (for multi-GPU)
A.save_distributed("matrix_dist/", num_partitions=4)
----
Basic Usage
-----------
Basic Sparse Linear Solve
~~~~~~~~~~~~~~~~~~~~~~~~~
Solve a sparse linear system :math:`Ax = b` using ``SparseTensor``.
**Linear System:**
Given a sparse matrix :math:`A \in \mathbb{R}^{n \times n}` and right-hand side :math:`b \in \mathbb{R}^n`, find :math:`x \in \mathbb{R}^n` such that:
.. math::
Ax = b \quad \Leftrightarrow \quad x = A^{-1} b
**Solver Methods:**
- **Direct solvers** (LU, Cholesky): Exact solution, :math:`O(n^{1.5})` for sparse
- **Iterative solvers** (CG, BiCGStab): Approximate solution, :math:`O(k \cdot \text{nnz})` where :math:`k` is iterations
**Problem:**
.. math::
A = \begin{pmatrix}
4 & -1 & 0 \\
-1 & 4 & -1 \\
0 & -1 & 4
\end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
This is a 3×3 symmetric positive definite (SPD) tridiagonal matrix from 1D Poisson discretization.
**COO Format:**
.. list-table::
:header-rows: 1
:widths: 15 15 15 15
* - Index
- Row
- Col
- Value
* - 0
- 0
- 0
- 4.0
* - 1
- 0
- 1
- -1.0
* - 2
- 1
- 0
- -1.0
* - 3
- 1
- 1
- 4.0
* - 4
- 1
- 2
- -1.0
* - 5
- 2
- 1
- -1.0
* - 6
- 2
- 2
- 4.0
**Solution:**
.. math::
x = A^{-1}b = \begin{pmatrix} 0.4643 \\ 0.8571 \\ 0.9643 \end{pmatrix}
**Code:**
.. code-block:: python
import torch
from torch_sla import SparseTensor
# Create sparse matrix from dense (easier to read for small matrices)
dense = torch.tensor([[4.0, -1.0, 0.0],
[-1.0, 4.0, -1.0],
[ 0.0, -1.0, 4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
x = A.solve(b)
----
Property Detection
~~~~~~~~~~~~~~~~~~
Detect matrix properties for optimal solver selection.
**Symmetry:** :math:`A = A^T`
**Positive Definiteness:** All eigenvalues :math:`\lambda_i > 0`
For the tridiagonal matrix: :math:`\lambda_1 \approx 2.59, \lambda_2 = 4.0, \lambda_3 \approx 5.41` (all positive → SPD)
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
# Using the tridiagonal matrix from above
dense = torch.tensor([[4.0, -1.0, 0.0],
[-1.0, 4.0, -1.0],
[ 0.0, -1.0, 4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)
is_sym = A.is_symmetric() # tensor(True)
is_pd = A.is_positive_definite() # tensor(True)
----
Gradient Computation
~~~~~~~~~~~~~~~~~~~~
Compute gradients through sparse solve using implicit differentiation.
**Implicit Differentiation:**
Given :math:`x = A^{-1} b`, for a loss function :math:`L(x)`, we want :math:`\frac{\partial L}{\partial A}` and :math:`\frac{\partial L}{\partial b}`.
From :math:`Ax = b`, differentiate both sides:
.. math::
dA \cdot x + A \cdot dx = db
Solving for :math:`dx`:
.. math::
dx = A^{-1}(db - dA \cdot x)
**Adjoint Method:**
Define adjoint variable :math:`\lambda = A^{-T} \frac{\partial L}{\partial x}`, then:
.. math::
\frac{\partial L}{\partial A_{ij}} = -\lambda_i \cdot x_j, \quad
\frac{\partial L}{\partial b} = \lambda
**Gradient formulas (summary):**
.. math::
\frac{\partial L}{\partial A} = -\lambda x^T, \quad
\frac{\partial L}{\partial b} = A^{-T} \frac{\partial L}{\partial x}
**Code:**
.. code-block:: python
import torch
from torch_sla import spsolve
val = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],
dtype=torch.float64, requires_grad=True)
row = torch.tensor([0, 0, 1, 1, 1, 2, 2])
col = torch.tensor([0, 1, 0, 1, 2, 1, 2])
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
x = spsolve(val, row, col, (3, 3), b)
loss = x.sum()
loss.backward()
# val.grad, b.grad now contain gradients
----
Specify Backend and Method
~~~~~~~~~~~~~~~~~~~~~~~~~~
Choose solver backend and method explicitly.
**Available Options:**
.. list-table::
:header-rows: 1
:widths: 15 15 40
* - Backend
- Device
- Methods
* - ``scipy``
- CPU
- ``superlu``, ``umfpack``, ``cg``, ``bicgstab``, ``gmres``
* - ``eigen``
- CPU
- ``cg``, ``bicgstab``
* - ``cusolver``
- CUDA
- ``qr``, ``cholesky``, ``lu``
* - ``cudss``
- CUDA
- ``lu``, ``cholesky``, ``ldlt``
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, (n, n))
b = torch.randn(n, dtype=torch.float64)
x1 = A.solve(b, backend='scipy', method='superlu') # Direct
x2 = A.solve(b, backend='scipy', method='cg') # Iterative (SPD)
x3 = A.solve(b, backend='scipy', method='bicgstab') # Iterative (general)
----
Matrix Operations
~~~~~~~~~~~~~~~~~
Compute norms, determinants, and eigenvalues.
**Frobenius Norm:**
.. math::
\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} = \sqrt{52} \approx 7.21
**Determinant:**
.. math::
\det(A) = \text{product of eigenvalues}
For the tridiagonal matrix: :math:`\det(A) = 56`
**Gradient Formula:**
.. math::
\frac{\partial \det(A)}{\partial A_{ij}} = \det(A) \cdot (A^{-1})_{ji}
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
# Using the tridiagonal matrix from above
dense = torch.tensor([[4.0, -1.0, 0.0],
[-1.0, 4.0, -1.0],
[ 0.0, -1.0, 4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)
norm = A.norm('fro') # ≈ 7.21
det = A.det() # 56.0 (with gradient support)
eigenvalues, eigenvectors = A.eigsh(k=2, which='LM') # Top-2 eigenvalues
----
Batched Solve
-------------
Batched SparseTensor
~~~~~~~~~~~~~~~~~~~~
Solve multiple systems with same sparsity pattern but different values.
**Problem:** Solve 4 systems with scaled matrices
.. math::
A^{(0)} = A, \quad A^{(1)} = 1.1A, \quad A^{(2)} = 1.2A, \quad A^{(3)} = 1.3A
**Code:**
.. code-block:: python
import torch
from torch_sla import SparseTensor
val = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0], dtype=torch.float64)
row = torch.tensor([0, 0, 1, 1, 1, 2, 2])
col = torch.tensor([0, 1, 0, 1, 2, 1, 2])
batch_size = 4
val_batch = val.unsqueeze(0).expand(batch_size, -1).clone()
for i in range(batch_size):
val_batch[i] = val * (1.0 + 0.1 * i)
A = SparseTensor(val_batch, row, col, (batch_size, 3, 3))
b = torch.randn(batch_size, 3, dtype=torch.float64)
x = A.solve(b) # x.shape: [4, 3]
----
Multi-Dimensional Batch
~~~~~~~~~~~~~~~~~~~~~~~
Handle shapes like ``[B1, B2, M, N]``.
**Example:** 2 materials × 3 temperatures = 6 systems
**Code:**
.. code-block:: python
import torch
from torch_sla import SparseTensor
B1, B2, n = 2, 3, 8
val_batch = val.unsqueeze(0).unsqueeze(0).expand(B1, B2, -1).clone()
A = SparseTensor(val_batch, row, col, (B1, B2, n, n))
b = torch.randn(B1, B2, n, dtype=torch.float64)
x = A.solve(b) # x.shape: [2, 3, 8]
----
solve_batch for Repeated Solves
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Efficient batch solve with same structure but different values.
**Use Case:** Time-stepping with fixed sparsity pattern
**LU Decomposition:** :math:`A = LU`, then solve :math:`Ly = b`, :math:`Ux = y`
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, shape)
val_batch = torch.stack([val * (1.0 + 0.01 * t) for t in range(100)]) # [100, nnz]
b_batch = torch.randn(100, n, dtype=torch.float64)
x_batch = A.solve_batch(val_batch, b_batch) # [100, n]
----
SparseTensorList
~~~~~~~~~~~~~~~~
Handle matrices with different sparsity patterns.
**Use Case:** FEM meshes with different element counts
**Code:**
.. code-block:: python
from torch_sla import SparseTensor, SparseTensorList
A1 = SparseTensor(val1, row1, col1, (5, 5))
A2 = SparseTensor(val2, row2, col2, (10, 10))
A3 = SparseTensor(val3, row3, col3, (15, 15))
matrices = SparseTensorList([A1, A2, A3])
b_list = [torch.randn(5), torch.randn(10), torch.randn(15)]
x_list = matrices.solve(b_list)
----
Distributed Solve
-----------------
Basic DSparseTensor
~~~~~~~~~~~~~~~~~~~
Create distributed sparse tensor with domain decomposition.
**Domain Decomposition:** Split 16-node grid into 2 domains
.. math::
\text{Domain 0: } \{0,...,7\}, \quad \text{Domain 1: } \{8,...,15\}
Each domain has **owned nodes** and **halo/ghost nodes** from neighbors.
**Code:**
.. code-block:: python
from torch_sla import DSparseTensor
D = DSparseTensor(val, row, col, (16, 16), num_partitions=2)
for i in range(D.num_partitions):
p = D[i]
# p.num_owned, p.num_halo, p.num_local
----
2D Poisson Example
~~~~~~~~~~~~~~~~~~
Create 2D Poisson matrix with 5-point stencil.
**Stencil:**
.. math::
\frac{1}{h^2} \begin{pmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{pmatrix}
**Code:**
.. code-block:: python
import torch
from torch_sla import DSparseTensor
def create_2d_poisson(nx, ny):
N = nx * ny
rows, cols, vals = [], [], []
for i in range(ny):
for j in range(nx):
idx = i * nx + j
rows.append(idx); cols.append(idx); vals.append(4.0)
if j > 0:
rows.append(idx); cols.append(idx - 1); vals.append(-1.0)
if j < nx - 1:
rows.append(idx); cols.append(idx + 1); vals.append(-1.0)
if i > 0:
rows.append(idx); cols.append(idx - nx); vals.append(-1.0)
if i < ny - 1:
rows.append(idx); cols.append(idx + nx); vals.append(-1.0)
return (torch.tensor(vals), torch.tensor(rows), torch.tensor(cols), (N, N))
val, row, col, shape = create_2d_poisson(4, 4)
D = DSparseTensor(val, row, col, shape, num_partitions=2)
----
Scatter and Gather
~~~~~~~~~~~~~~~~~~
Distribute global vectors to partitions and gather back.
**Diagram:**
::
Global: [x0, x1, x2, x3, x4, x5, x6, x7]
↓ scatter
Local: [x0, x1, x2, x3, x6] (P0 + halo)
[x4, x5, x6, x7, x3] (P1 + halo)
↓ gather
Global: [x0, x1, x2, x3, x4, x5, x6, x7]
**Code:**
.. code-block:: python
from torch_sla import DSparseTensor
D = DSparseTensor(val, row, col, shape, num_partitions=2)
x_global = torch.arange(16, dtype=torch.float64)
x_local = D.scatter_local(x_global)
x_gathered = D.gather_global(x_local)
----
Halo Exchange
~~~~~~~~~~~~~
Exchange ghost node values between neighboring partitions.
**References:**
- `Domain Decomposition Methods - Wikipedia `_
- `Stencil Code - Wikipedia `_
- `Halo Exchange Lecture - UIUC CS598 `_
**1D Decomposition Diagram:**
::
Partition 0: Owned [0,1,2,3], Halo [4] ← from P1
Partition 1: Owned [4,5,6,7], Halo [3] ← from P0
**Exchange Process:**
::
Before: P0=[x0,x1,x2,x3,?], P1=[x4,x5,x6,x7,?]
↓ halo_exchange_local()
After: P0=[x0,x1,x2,x3,x4], P1=[x4,x5,x6,x7,x3]
**Why needed:** For :math:`y_3 = \sum_j A_{3,j} x_j`, node 3 needs :math:`x_4` from P1.
**Code:**
.. code-block:: python
from torch_sla import DSparseTensor
D = DSparseTensor(val, row, col, shape, num_partitions=4)
x_list = [torch.randn(D[i].num_local) for i in range(D.num_partitions)]
D.halo_exchange_local(x_list)
----
Iterative Solvers
-----------------
PyTorch CG Solver
~~~~~~~~~~~~~~~~~
For large-scale problems (> 100K DOF), iterative methods are much faster than direct solvers.
**Conjugate Gradient (CG) Algorithm:**
For symmetric positive definite (SPD) matrix :math:`A`, CG minimizes:
.. math::
\phi(x) = \frac{1}{2} x^T A x - b^T x
The minimum is achieved at :math:`x^* = A^{-1} b`.
**CG Iteration:**
Starting from :math:`x_0`, with residual :math:`r_0 = b - Ax_0` and search direction :math:`p_0 = r_0`:
.. math::
\alpha_k &= \frac{r_k^T r_k}{p_k^T A p_k} \\
x_{k+1} &= x_k + \alpha_k p_k \\
r_{k+1} &= r_k - \alpha_k A p_k \\
\beta_k &= \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k} \\
p_{k+1} &= r_{k+1} + \beta_k p_k
**Convergence:**
CG converges in at most :math:`n` iterations (exact arithmetic). With condition number :math:`\kappa = \lambda_{\max}/\lambda_{\min}`:
.. math::
\|x_k - x^*\|_A \leq 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \|x_0 - x^*\|_A
**Preconditioning:**
Preconditioned CG (PCG) solves :math:`M^{-1} A x = M^{-1} b` where :math:`M \approx A`:
- Jacobi: :math:`M = \text{diag}(A)` — simple, effective
- Incomplete Cholesky: :math:`M = \tilde{L} \tilde{L}^T` — better for ill-conditioned
**Convergence Example:**
.. figure:: ../../assets/examples/cg_convergence.png
:width: 100%
:align: center
CG convergence for 2D Poisson problems of various sizes. Larger problems require more iterations due to worse conditioning.
**Performance Comparison (1M DOF, NVIDIA H200):**
.. list-table::
:header-rows: 1
:widths: 20 15 15 20
* - Method
- Time
- Memory
- Best For
* - ``pytorch+cg``
- **0.5s** ✅
- ~500 MB
- > 100K DOF, SPD
* - ``cudss+cholesky``
- 7.8s
- ~300 MB
- < 100K DOF, high precision
**Code:**
.. code-block:: python
from torch_sla import spsolve
# For large SPD systems, use PyTorch CG
x = spsolve(val, row, col, shape, b,
backend='pytorch',
method='cg',
preconditioner='jacobi')
----
Preconditioners
~~~~~~~~~~~~~~~
Available preconditioners for iterative solvers:
.. list-table::
:header-rows: 1
:widths: 15 40 20
* - Name
- Description
- Best For
* - ``jacobi``
- Diagonal scaling (default)
- General use, fastest
* - ``ssor``
- Symmetric SOR (ω=1.5)
- Slow convergence problems
* - ``polynomial``
- Neumann series (degree=2)
- GPU-parallel
* - ``ic0``
- Incomplete Cholesky
- Very ill-conditioned
* - ``amg``
- Algebraic Multigrid
- Float32, Poisson-like
**Recommendation:**
- **Float64**: Use ``jacobi`` (simplest, fastest due to fewer iterations)
- **Float32**: Use ``amg`` (reduces iterations, compensates for precision loss)
**Code:**
.. code-block:: python
# Jacobi (default, recommended for float64)
x = spsolve(val, row, col, shape, b,
backend='pytorch', preconditioner='jacobi')
# AMG (recommended for float32)
x = spsolve(val.float(), row, col, shape, b.float(),
backend='pytorch', preconditioner='amg')
----
Mixed Precision
~~~~~~~~~~~~~~~
For memory-constrained scenarios, use mixed precision:
- Matrix stored in Float32 (memory efficient)
- Accumulation in Float64 (high precision)
**Code:**
.. code-block:: python
x = spsolve(val_f32, row, col, shape, b_f32,
backend='pytorch',
method='cg',
mixed_precision=True) # Returns float64 solution
----
CUDA Usage
----------
Move to CUDA
~~~~~~~~~~~~
Transfer to GPU for CUDA-accelerated solving.
**Performance:** cuDSS/cuSOLVER can be 10-100× faster for large systems.
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, shape)
A_cuda = A.cuda()
x = A_cuda.solve(b.cuda())
----
Backend Selection on CUDA
~~~~~~~~~~~~~~~~~~~~~~~~~
**Auto Selection:** cuDSS (preferred) → cuSOLVER (fallback)
**Code:**
.. code-block:: python
x = A_cuda.solve(b_cuda, backend='cudss', method='lu')
x = A_cuda.solve(b_cuda, backend='cudss', method='cholesky') # For SPD
x = A_cuda.solve(b_cuda, backend='cusolver', method='qr')
----
Advanced Examples
-----------------
Nonlinear Solve with Adjoint Gradients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solve nonlinear equations :math:`F(u, \theta) = 0` with automatic differentiation using the adjoint method.
**Problem Formulation:**
Given a nonlinear residual function :math:`F: \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^n`, find :math:`u^*` such that:
.. math::
F(u^*, \theta) = 0
where :math:`\theta` are parameters (e.g., forcing term, material properties).
**Newton-Raphson Method:**
Starting from initial guess :math:`u_0`, iterate:
.. math::
u_{k+1} = u_k - J_F^{-1}(u_k) F(u_k)
where :math:`J_F = \frac{\partial F}{\partial u}` is the Jacobian matrix.
**Adjoint Method for Gradients:**
For a loss function :math:`L(u^*)`, the gradient w.r.t. parameters is:
.. math::
\frac{\partial L}{\partial \theta} = -\lambda^T \frac{\partial F}{\partial \theta}
where the adjoint variable :math:`\lambda` satisfies:
.. math::
J_F^T \lambda = \frac{\partial L}{\partial u}
This is memory-efficient: O(1) instead of O(iterations) graph nodes.
**Example:** Nonlinear diffusion :math:`Au + u^2 = f`
.. code-block:: python
import torch
from torch_sla import SparseTensor
# Create stiffness matrix
A = SparseTensor(val, row, col, (n, n))
# Define nonlinear residual: F(u) = Au + u² - f
def residual(u, A, f):
return A @ u + u**2 - f
# Parameters with gradients
f = torch.randn(n, requires_grad=True)
u0 = torch.zeros(n)
# Solve with Newton-Raphson
u = A.nonlinear_solve(residual, u0, f, method='newton')
# Gradients via adjoint method (memory efficient)
loss = u.sum()
loss.backward()
print(f.grad) # ∂L/∂f
**Methods:**
.. list-table::
:header-rows: 1
:widths: 15 25 30 30
* - Method
- Update Rule
- Convergence
- Best For
* - ``newton``
- :math:`u_{k+1} = u_k - J^{-1} F(u_k)`
- Quadratic (fast)
- General nonlinear
* - ``picard``
- :math:`u_{k+1} = G(u_k)` (fixed-point)
- Linear (slow)
- Mildly nonlinear
* - ``anderson``
- Accelerated fixed-point with history
- Superlinear
- Memory-constrained
----
Determinant with Gradient Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compute determinants of sparse matrices with automatic differentiation.
**Determinant Definition:**
For a square matrix :math:`A \in \mathbb{R}^{n \times n}`, the determinant is a scalar value that encodes important matrix properties:
.. math::
\det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}
**Properties:**
- :math:`\det(AB) = \det(A) \det(B)`
- :math:`\det(A^T) = \det(A)`
- :math:`\det(A^{-1}) = 1/\det(A)`
- Matrix is singular ⟺ :math:`\det(A) = 0`
**Gradient Formula (Jacobi's Formula):**
For a differentiable loss :math:`L(\det(A))`:
.. math::
\frac{\partial \det(A)}{\partial A_{ij}} = \det(A) \cdot (A^{-1})_{ji}
This is computed efficiently using the adjoint method with :math:`O(1)` graph nodes.
**Implementation:**
- **CPU**: LU decomposition via SciPy SuperLU
- **CUDA**: Dense conversion + ``torch.linalg.det``
- **Gradient**: Adjoint method (solve :math:`A \mathbf{x} = \mathbf{e}_i` for needed columns of :math:`A^{-1}`)
**Example 1: Basic Determinant**
.. code-block:: python
import torch
from torch_sla import SparseTensor
# 3x3 tridiagonal matrix from dense
dense = torch.tensor([[4.0, -1.0, 0.0],
[-1.0, 4.0, -1.0],
[ 0.0, -1.0, 4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)
det = A.det() # 56.0
**Example 2: Gradient Computation**
.. code-block:: python
# Matrix with gradient tracking
dense = torch.tensor([[2.0, 1.0],
[1.0, 3.0]], dtype=torch.float64, requires_grad=True)
A = SparseTensor.from_dense(dense)
det = A.det() # 5.0
# Compute gradient
det.backward()
print(dense.grad) # [[3.0, -1.0], [-1.0, 2.0]]
**Example 3: CUDA Support**
.. code-block:: python
# Move to CUDA
A_cuda = A.cuda()
det_cuda = A_cuda.det() # Automatically uses CUDA backend
**Example 4: Batched Determinants**
.. code-block:: python
# Multiple matrices with same structure
val_batch = torch.tensor([
[2.0, 0.0, 0.0, 3.0], # det = 6
[1.0, 0.5, 0.5, 1.0], # det = 0.75
], dtype=torch.float64)
A_batch = SparseTensor(val_batch, row, col, (2, 2, 2))
det_batch = A_batch.det() # [6.0, 0.75]
**Example 5: Optimization with Determinant Constraint**
.. code-block:: python
# Optimize matrix to achieve target determinant
val = torch.tensor([1.0, 0.5, 0.5, 1.0], requires_grad=True)
target_det = torch.tensor(2.0)
optimizer = torch.optim.Adam([val], lr=0.1)
for _ in range(50):
optimizer.zero_grad()
A = SparseTensor(val, row, col, (2, 2))
loss = (A.det() - target_det) ** 2
loss.backward()
optimizer.step()
**Example 6: Distributed Matrices**
.. code-block:: python
from torch_sla import DSparseTensor
# Create distributed sparse tensor
D = DSparseTensor(val, row, col, (n, n), num_partitions=4)
# Compute determinant (gathers all partitions)
det = D.det() # Warning: requires data gather
**Numerical Considerations:**
- Determinants can overflow/underflow for large matrices
- For numerical stability, consider using log-determinant
- Singular matrices (det ≈ 0) may cause LU decomposition to fail
- Use ``torch.float64`` for better numerical precision
**Performance:**
.. list-table::
:header-rows: 1
:widths: 15 15 15 15 40
* - Matrix Size
- CPU (Sparse)
- CUDA (Dense)
- CPU-for-CUDA
- Notes
* - 10×10
- 0.3 ms
- 1.0 ms
- 0.5 ms
- CUDA 3x slower (dense overhead)
* - 100×100
- 0.3 ms
- 0.3 ms
- 0.5 ms
- Similar performance
* - 1000×1000
- 0.7 ms
- 2.5 ms
- 1.2 ms
- CUDA 3.6x slower (O(n³) vs O(n^1.5))
**⚠️ Important Performance Note:**
CUDA is **slower** than CPU for sparse determinants! This is because:
- CPU uses sparse LU decomposition: O(nnz^1.5) time, O(nnz) memory
- CUDA requires dense conversion: O(n³) time, O(n²) memory
- cuSOLVER/cuDSS don't expose sparse determinant computation
**Recommendation:** For CUDA tensors, use ``.cpu().det()`` instead of ``.det()``
----
Eigenvalue Decomposition
~~~~~~~~~~~~~~~~~~~~~~~~
Compute eigenvalues and eigenvectors of sparse matrices.
**Eigenvalue Problem:**
For a matrix :math:`A \in \mathbb{R}^{n \times n}`, find eigenvalues :math:`\lambda_i` and eigenvectors :math:`v_i` such that:
.. math::
A v_i = \lambda_i v_i
**Symmetric Case (eigsh):**
For symmetric matrices :math:`A = A^T`, eigenvalues are real and eigenvectors are orthonormal:
.. math::
A = V \Lambda V^T, \quad V^T V = I
where :math:`\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)`.
**Algorithms:**
- **ARPACK/LOBPCG**: Iterative methods for sparse matrices, compute top-k eigenvalues
- **Shift-invert**: For interior eigenvalues
**Gradient Formula:**
For a simple eigenvalue :math:`\lambda_i` with eigenvector :math:`v_i`:
.. math::
\frac{\partial \lambda_i}{\partial A_{jk}} = v_i[j] \cdot v_i[k]
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, (n, n))
# Largest eigenvalues (ARPACK/LOBPCG)
eigenvalues, eigenvectors = A.eigsh(k=6, which='LM')
# Smallest eigenvalues
eigenvalues, eigenvectors = A.eigsh(k=6, which='SM')
# For non-symmetric matrices
eigenvalues, eigenvectors = A.eigs(k=6)
**Example Output:**
.. figure:: ../../assets/examples/eigenvalue_spectrum.png
:width: 100%
:align: center
Eigenvalue spectrum of 1D Laplacian (n=50). Red points show the 6 smallest eigenvalues computed by ``eigsh()``.
**Gradient support:** Eigenvalue decomposition is differentiable!
.. code-block:: python
val = val.requires_grad_(True)
A = SparseTensor(val, row, col, shape)
eigenvalues, _ = A.eigsh(k=3)
loss = eigenvalues.sum()
loss.backward() # Gradients flow to val
----
SVD (Singular Value Decomposition)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compute truncated SVD for sparse matrices.
**SVD Definition:**
For a matrix :math:`A \in \mathbb{R}^{m \times n}`, the SVD is:
.. math::
A = U \Sigma V^T
where:
- :math:`U \in \mathbb{R}^{m \times r}`: Left singular vectors (orthonormal columns)
- :math:`\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)`: Singular values (:math:`\sigma_1 \geq \sigma_2 \geq \ldots \geq 0`)
- :math:`V \in \mathbb{R}^{n \times r}`: Right singular vectors (orthonormal columns)
**Truncated SVD (rank-k approximation):**
.. math::
A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T
This is the best rank-k approximation in Frobenius norm (Eckart-Young theorem):
.. math::
\|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
**Relation to Eigenvalues:**
- :math:`\sigma_i^2` are eigenvalues of :math:`A^T A` (or :math:`A A^T`)
- :math:`v_i` are eigenvectors of :math:`A^T A`
- :math:`u_i` are eigenvectors of :math:`A A^T`
**Applications:**
- **Dimensionality reduction**: PCA via SVD
- **Low-rank approximation**: Matrix compression
- **Pseudoinverse**: :math:`A^+ = V \Sigma^{-1} U^T`
**Example Output:**
.. figure:: ../../assets/examples/svd_lowrank.png
:width: 100%
:align: center
Left: Singular value spectrum showing rapid decay after true rank. Right: Approximation error decreases as rank increases.
**Code:**
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, (m, n))
# Compute top-k singular values/vectors
U, S, Vt = A.svd(k=10)
# Low-rank approximation
A_approx = U @ torch.diag(S) @ Vt
# Relative approximation error
error = (A.to_dense() - A_approx).norm() / A.norm('fro')
----
LU Factorization for Repeated Solves
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cache LU factorization for efficient repeated solves with the same matrix.
**LU Decomposition:**
For a matrix :math:`A`, find lower triangular :math:`L` and upper triangular :math:`U` such that:
.. math::
PA = LU
where :math:`P` is a permutation matrix (for numerical stability).
**Solving with LU:**
To solve :math:`Ax = b`:
1. Factorize once: :math:`PA = LU` — Cost: :math:`O(n^3)` or :math:`O(\text{nnz}^{1.5})` for sparse
2. Forward substitution: :math:`Ly = Pb` — Cost: :math:`O(n^2)` or :math:`O(\text{nnz})` for sparse
3. Back substitution: :math:`Ux = y` — Cost: :math:`O(n^2)` or :math:`O(\text{nnz})` for sparse
**Complexity Savings:**
For :math:`k` solves with same matrix:
- Without caching: :math:`O(k \cdot n^{1.5})` (sparse direct)
- With LU caching: :math:`O(n^{1.5} + k \cdot n)` — up to :math:`\sqrt{n}` faster!
**Use Case:** Time-stepping with fixed stiffness matrix
.. code-block:: python
from torch_sla import SparseTensor
A = SparseTensor(val, row, col, shape)
# Factorize once (expensive)
lu = A.lu()
# Solve multiple RHS efficiently (cheap)
for t in range(100):
b_t = compute_rhs(t)
x_t = lu.solve(b_t) # Fast solve using cached LU
----
Graph Neural Network Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use torch-sla for graph Laplacian operations in GNNs.
**Code:**
.. code-block:: python
import torch
from torch_sla import SparseTensor
# Create adjacency matrix from edge list
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
edge_weight = torch.ones(4)
A = SparseTensor(edge_weight, edge_index[0], edge_index[1], (3, 3))
# Compute degree matrix
D = A.sum(dim=1)
# Normalized Laplacian: L = I - D^(-1/2) A D^(-1/2)
D_inv_sqrt = D.pow(-0.5)
A_norm = A * D_inv_sqrt.unsqueeze(1) * D_inv_sqrt.unsqueeze(0)
L = SparseTensor.eye(3) - A_norm
# Solve Laplacian system
x = L.solve(b)
----
Jupyter Notebook Examples
-------------------------
Interactive examples are available as Jupyter notebooks in the ``examples/`` directory:
.. list-table::
:widths: 30 70
:header-rows: 1
* - Notebook
- Description
* - ``basic_usage.ipynb``
- Basic solve, property detection, visualization
* - ``batched_solve.ipynb``
- Batched operations and SparseTensorList
* - ``determinant.py``
- Determinant computation with gradient support (CPU & CUDA)
* - ``gcn_example.ipynb``
- Graph neural network with sparse Laplacian
* - ``nonlinear_solve.ipynb``
- Nonlinear equations with adjoint gradients
* - ``visualization.ipynb``
- Spy plots and sparsity visualization
* - ``persistence.ipynb``
- Save/load with safetensors and Matrix Market
* - ``suitesparse_demo.ipynb``
- Loading matrices from `SuiteSparse Collection `_
* - ``distributed/``
- Distributed computing examples (matvec, solve, eigsh)