.. torch-sla documentation master file
.. meta::
:description: torch-sla: PyTorch Sparse Linear Algebra library. GPU-accelerated sparse solvers with autograd. Works with torch.sparse tensors. pip install torch-sla.
:keywords: torch sparse, torch sparse matrix, torch sparse tensor, pytorch sparse, pytorch sparse matrix, pytorch sparse solver, sparse linear algebra, torch.sparse, GPU sparse solver, CUDA sparse, cuSOLVER, cuDSS, differentiable sparse, autograd sparse, scipy sparse pytorch, spsolve pytorch, FEM pytorch
:robots: index, follow
.. image:: _static/logo.jpg
:alt: torch-sla - PyTorch Sparse Linear Algebra with GPU Acceleration
:align: center
:width: 300px
torch-sla: PyTorch Sparse Linear Algebra
========================================
torch-sla (Torch Sparse Linear Algebra) is a memory-efficient, differentiable sparse linear equation solver library for PyTorch with multiple backends. Perfect for scientific computing, FEM, CFD, and machine learning applications requiring sparse matrix operations with automatic differentiation.
Why torch-sla?
--------------
- 🚀 High Performance: CUDA-accelerated solvers via cuSOLVER and cuDSS
- 💾 Memory Efficient: Store only non-zero elements, enabling solving of systems with millions of unknowns
- 🔄 Differentiable: Full gradient support through torch.autograd
- 📦 Batch Processing: Solve thousands of systems in parallel
- 🌐 Distributed: Domain decomposition with halo exchange for large-scale problems
- 🔧 Flexible: Multiple backends and solver methods
Key Features
------------
- Memory efficient: stores only non-zero elements; a 1M×1M matrix with ~10 non-zeros per row needs ~80 MB of values instead of ~8 TB dense
- Full gradient support via torch.autograd for end-to-end differentiable pipelines
- Multiple backends: SciPy, Eigen, cuSOLVER, cuDSS
- Batch solving: Same-layout and different-layout sparse matrices
- Distributed solving: Domain decomposition with halo exchange
- 169M+ DOF tested: Scales to very large problems with near-linear complexity
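The memory claim above is easy to verify back-of-envelope, assuming float64 values and roughly 10 non-zeros per row (a typical FEM stencil; the figures are illustrative, not a measurement of torch-sla itself):

```python
# Rough memory comparison for a 1M x 1M float64 matrix.
n = 1_000_000
dense_bytes = n * n * 8      # every entry stored densely
nnz = 10 * n                 # ~10 non-zeros per row (typical FEM stencil)
sparse_bytes = nnz * 8       # values only; COO row/col indices add ~2x more
print(dense_bytes / 1e12)    # 8.0  (TB, dense)
print(sparse_bytes / 1e6)    # 80.0 (MB, sparse values)
```

Even tripling the sparse figure to account for index storage, the sparse representation fits comfortably in GPU memory where the dense one is physically impossible.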
Quick Start
-----------
Installation
~~~~~~~~~~~~
.. code-block:: bash
pip install torch-sla
Basic Usage
~~~~~~~~~~~
.. code-block:: python
import torch
from torch_sla import SparseTensor
# Create a sparse matrix from dense (easier to read for small matrices)
dense = torch.tensor([[4.0, -1.0, 0.0],
[-1.0, 4.0, -1.0],
[ 0.0, -1.0, 4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)
# Solve Ax = b
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
x = A.solve(b)
CUDA Acceleration
~~~~~~~~~~~~~~~~~
.. code-block:: python
# Move to GPU for CUDA-accelerated solving
A_cuda = A.cuda()
b_cuda = b.cuda()
x = A_cuda.solve(b_cuda) # Uses cuDSS or cuSOLVER automatically
Use Cases
---------
torch-sla is ideal for:
- **Finite Element Method (FEM)**: Solve large sparse systems from FEM discretization
- **Computational Fluid Dynamics (CFD)**: Efficient sparse solvers for Navier-Stokes
- **Physics-Informed Neural Networks (PINNs)**: Differentiable sparse operations for physics constraints
- **Graph Neural Networks**: Sparse message passing and Laplacian operations
- **Optimization**: Gradient-based optimization involving sparse linear systems
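To make the FEM use case concrete, here is a minimal 1D Poisson problem assembled and solved with SciPy (a sketch only, not torch-sla API; torch-sla would consume the same assembled sparse system on CPU or GPU):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# 1D Poisson problem -u'' = 1 on (0, 1) with u(0) = u(1) = 0,
# discretized with linear finite elements on a uniform mesh.
n = 99                            # interior nodes
h = 1.0 / (n + 1)
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h   # stiffness matrix
f = np.full(n, h)                 # load vector for f(x) = 1

u = spsolve(A.tocsr(), f)

# Exact solution is u(x) = x(1 - x) / 2; linear FEM is nodally exact here,
# so the midpoint value matches u(0.5) = 0.125 to machine precision.
assert abs(u[n // 2] - 0.125) < 1e-8
```

The same tridiagonal pattern scales to millions of nodes, which is where a sparse, GPU-capable solver pays off.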
.. toctree::
:maxdepth: 1
:hidden:
introduction
installation
torch_sla
examples
benchmarks
----
Frequently Asked Questions (FAQ)
================================
What is torch-sla?
------------------
torch-sla (Torch Sparse Linear Algebra) is a Python library that provides differentiable sparse linear equation solvers for PyTorch. It solves systems of the form Ax = b where A is a sparse matrix, with full support for automatic differentiation (autograd) and GPU acceleration via CUDA.
How do I solve a sparse linear system in PyTorch?
-------------------------------------------------
Use torch-sla's ``SparseTensor`` class:
.. code-block:: python
from torch_sla import SparseTensor
# Create sparse matrix from COO format (values, row indices, column indices)
A = SparseTensor(values, row, col, shape)
# Solve Ax = b
x = A.solve(b)
This works on both CPU and GPU, and supports gradient computation.
What sparse solvers does torch-sla support?
-------------------------------------------
torch-sla supports multiple backends:
- **CPU**: SciPy (SuperLU, UMFPACK, CG, BiCGStab, GMRES), Eigen (CG, BiCGStab)
- **GPU**: cuSOLVER (QR, Cholesky, LU), cuDSS (LU, Cholesky, LDLT)
The library automatically selects the best solver based on your hardware and matrix properties.
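For reference, the two CPU solver families behave like this through SciPy's own interface (which the SciPy backend wraps): a direct factorization versus an iterative Krylov method. This is plain SciPy, not torch-sla's selection logic:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve, cg

A = csr_matrix(np.array([[4.0, -1.0, 0.0],
                         [-1.0, 4.0, -1.0],
                         [0.0, -1.0, 4.0]]))
b = np.array([1.0, 2.0, 3.0])

x_direct = spsolve(A, b)             # direct: SuperLU factorization
x_iter, info = cg(A, b, atol=1e-12)  # iterative: Conjugate Gradient (SPD only)

assert info == 0                     # 0 means CG converged
assert np.allclose(x_direct, x_iter, atol=1e-4)
```

Direct methods are robust and reusable across many right-hand sides; iterative methods use far less memory on very large systems but depend on conditioning.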
Can I compute gradients through sparse solve?
---------------------------------------------
Yes. torch-sla fully supports PyTorch autograd:
.. code-block:: python
val = torch.tensor([...], requires_grad=True)
x = spsolve(val, row, col, shape, b)
loss = x.sum()
loss.backward() # Computes gradients w.r.t. val and b
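Gradients of a linear solve are computed with the standard adjoint method, one extra solve with the transposed matrix, rather than by differentiating through solver iterations. A NumPy sketch of that rule (the math, not torch-sla's internal code):

```python
import numpy as np

# Adjoint rule for x = solve(A, b) with scalar loss L:
#   grad_b = solve(A.T, dL/dx)      (one extra solve)
#   grad_A = -outer(grad_b, x)      (restricted to A's sparsity pattern)
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = np.linalg.solve(A, b)

g = np.ones(3)                      # dL/dx for L = x.sum()
grad_b = np.linalg.solve(A.T, g)    # adjoint solve
grad_A = -np.outer(grad_b, x)

# Sanity-check grad_b[0] against a finite difference
eps = 1e-6
b_pert = b.copy(); b_pert[0] += eps
fd = (np.linalg.solve(A, b_pert).sum() - x.sum()) / eps
assert abs(fd - grad_b[0]) < 1e-5
```

Because the backward pass is itself just a solve, the autograd graph stays small no matter how many iterations the forward solver needed.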
How do I solve batched sparse systems?
--------------------------------------
torch-sla supports batched solving for matrices with the same sparsity pattern:
.. code-block:: python
# Batched values: [batch_size, nnz]
A = SparseTensor(val_batch, row, col, (batch_size, M, N))
x = A.solve(b_batch) # Solves all systems in parallel
For matrices with different sparsity patterns, use ``SparseTensorList``; see the batched solve examples in the documentation.
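For intuition, same-pattern batching mirrors NumPy's stacked dense solve; the dense sketch below is illustrative only (torch-sla does the analogous thing while storing each matrix sparsely):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 4, 3

# A stack of diagonally dominant matrices: one "pattern", batched values
A = np.tile(4.0 * np.eye(n), (batch, 1, 1))
A += rng.uniform(-1.0, 0.0, (batch, n, n)) * (1.0 - np.eye(n))
b = rng.standard_normal((batch, n, 1))

x = np.linalg.solve(A, b)            # all systems solved in one call
assert x.shape == (batch, n, 1)
assert np.allclose(A @ x, b)
```

One batched call amortizes setup cost and keeps the hardware busy, which is where the parallel speedup over a Python loop comes from.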
How do I use torch-sla on GPU?
------------------------------
Simply move your tensors to CUDA:
.. code-block:: python
A_cuda = A.cuda()
x = A_cuda.solve(b.cuda()) # Uses cuDSS or cuSOLVER
What is the difference between SparseTensor and DSparseTensor?
--------------------------------------------------------------
- ``SparseTensor``: Single sparse matrix (optionally batched), for standard solving
- ``DSparseTensor``: Distributed sparse tensor with domain decomposition, for large-scale parallel computing with halo exchange
Comparison with Alternatives
============================
torch-sla vs scipy.sparse.linalg
--------------------------------
.. list-table::
:widths: 30 35 35
:header-rows: 1
:class: comparison-table
* - Feature
- **torch-sla** ✅
- scipy.sparse.linalg
* - PyTorch Integration
- ✅ **Native tensors**
- ❌ Requires numpy copy
* - GPU Acceleration
- ✅ **CUDA (cuDSS, cuSOLVER)**
- ❌ CPU only
* - Autograd Gradients
- ✅ **Full support (adjoint)**
- ❌ No gradients
* - Batched Solve
- ✅ **Parallel batch solve**
- ❌ Loop required
* - Large Scale (>2M DOF)
- ✅ **169M DOF tested**
- ⚠️ Memory limited
* - Distributed Computing
- ✅ **DSparseTensor**
- ❌ Not supported
* - Eigenvalue/SVD
- ✅ **Differentiable**
- ⚠️ No gradients
* - Nonlinear Solve
- ✅ **Newton/Anderson**
- ❌ Not included
torch-sla vs torch.linalg.solve
-------------------------------
.. list-table::
:widths: 30 35 35
:header-rows: 1
:class: comparison-table
* - Feature
- **torch-sla** ✅
- torch.linalg.solve
* - Matrix Type
- ✅ **Sparse (COO/CSR)**
- ❌ Dense only
   * - Memory (1M×1M, ~10 nnz/row)
- ✅ **~80 MB**
- ❌ ~8 TB (impossible)
* - Max Problem Size
- ✅ **500M+ DOF** (multi-GPU, scalable)
- ❌ ~50K (GPU memory)
* - Specialized Solvers
- ✅ **LU, Cholesky, CG, BiCGStab**
- ⚠️ Dense LU only
* - Batched Operations
- ✅ **Same/different patterns**
- ⚠️ Same shape only
* - GPU Support
- ✅ **cuDSS, cuSOLVER, PyTorch**
- ✅ Yes
* - Autograd
- ✅ **O(1) graph nodes**
- ✅ Yes
torch-sla vs NVIDIA AmgX
------------------------
.. list-table::
:widths: 30 35 35
:header-rows: 1
:class: comparison-table
* - Feature
- **torch-sla** ✅
- NVIDIA AmgX
* - Installation
- ✅ **pip install torch-sla**
- ❌ Complex build process
* - PyTorch Integration
- ✅ **Native**
- ❌ Requires wrapper
* - Autograd Support
- ✅ **Full gradient flow**
- ❌ No gradients
* - Python API
- ✅ **Pythonic**
- ⚠️ C++ focused
* - Multigrid (AMG)
- ❌ Not yet
- ✅ **Core feature**
* - Preconditioners
- ⚠️ Jacobi
- ✅ **ILU, AMG, etc.**
* - Documentation
- ✅ **Comprehensive**
- ⚠️ Limited examples
torch-sla vs PETSc
------------------
.. list-table::
:widths: 30 35 35
:header-rows: 1
:class: comparison-table
* - Feature
- **torch-sla** ✅
- PETSc
* - Installation
- ✅ **pip install**
- ❌ Complex (MPI, compilers)
* - Learning Curve
- ✅ **Simple Python API**
- ❌ Steep (C/Fortran heritage)
* - PyTorch Integration
- ✅ **Native tensors**
- ❌ Requires petsc4py + copies
* - Autograd
- ✅ **Full support**
- ❌ No gradients
* - Solver Variety
- ⚠️ Core methods
- ✅ **Extensive (KSP, SNES)**
* - Distributed
- ✅ **DSparseTensor multi-GPU**
- ✅ **Full MPI support**
* - Production Scale
- ✅ **500M+ DOF** (multi-GPU)
- ✅ **Exascale proven**
Summary: When to Use torch-sla
------------------------------
.. list-table::
:widths: 50 50
:header-rows: 1
* - Use torch-sla When
- Consider Alternatives When
* - ✅ You need **PyTorch integration**
- You're not using PyTorch
* - ✅ You need **gradient flow** through solve
- Gradients not needed
* - ✅ Problem size up to **500M+ DOF** (multi-GPU)
- Exascale problems (use PETSc)
* - ✅ You want **simple pip install**
- You need AMG preconditioners (AmgX)
* - ✅ **Batched** sparse systems
- Complex preconditioning (PETSc)
* - ✅ **GPU acceleration** with minimal setup
- Full MPI distributed (PETSc)
Indices and Search
==================
* :ref:`genindex`
* :ref:`search`
License
-------
torch-sla is released under the MIT License. See the LICENSE file in the repository for details.
Contact
-------
| **Author**: Walker Chi
| **Email**: ``x@y`` where ``x = walker.chi.000`` and ``y = gmail.com``
Citation
--------
If you use torch-sla in your research, please cite our paper:
.. code-block:: bibtex
@article{chi2026torchsla,
title={torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch},
author={Chi, Mingyuan},
journal={arXiv preprint arXiv:2601.13994},
year={2026},
url={https://arxiv.org/abs/2601.13994}
}
**Paper**: `arXiv:2601.13994 <https://arxiv.org/abs/2601.13994>`_ - Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch