torch-sla: PyTorch Sparse Linear Algebra
torch-sla (Torch Sparse Linear Algebra) is a memory-efficient, differentiable sparse linear equation solver library for PyTorch with multiple backends. Perfect for scientific computing, FEM, CFD, and machine learning applications requiring sparse matrix operations with automatic differentiation.
Why torch-sla?

- High Performance: CUDA-accelerated solvers via cuSOLVER and cuDSS
- Memory Efficient: store only non-zero elements, enabling solution of systems with millions of unknowns
- Differentiable: full gradient support through torch.autograd
- Batch Processing: solve thousands of systems in parallel
- Distributed: domain decomposition with halo exchange for large-scale problems
- Flexible: multiple backends and solver methods
Key Features

- Memory efficient: only stores non-zero elements; a 1M×1M matrix with ~10 non-zeros per row needs ~80 MB for its values instead of ~8 TB dense
- Full gradient support via torch.autograd for end-to-end differentiable pipelines
- Multiple backends: SciPy, Eigen, cuSOLVER, cuDSS
- Batch solving: same-layout and different-layout sparse matrices
- Distributed solving: domain decomposition with halo exchange
- 169M+ DOF tested: scales to very large problems with near-linear complexity
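The arithmetic behind the memory claim can be sketched directly. This is a back-of-envelope estimate, not torch-sla's internal accounting: the ~80 MB figure counts the float64 values alone, and COO row/column index arrays roughly triple it.

```python
# Back-of-envelope storage estimate for a 1M x 1M float64 system.
def coo_memory_bytes(nnz: int) -> int:
    """Approximate COO footprint: float64 values plus two int64 index arrays."""
    return nnz * (8 + 8 + 8)

n = 1_000_000
nnz = 10 * n                       # ~10 non-zeros per row, typical of FEM stencils
dense_bytes = n * n * 8            # dense float64: 8e12 bytes, i.e. ~8 TB
values_bytes = nnz * 8             # values alone: 8e7 bytes, i.e. ~80 MB
coo_bytes = coo_memory_bytes(nnz)  # values + indices: ~240 MB
```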
Quick Start

Installation

```bash
pip install torch-sla
```
Basic Usage

```python
import torch
from torch_sla import SparseTensor

# Create a sparse matrix from dense (easier to read for small matrices)
dense = torch.tensor([[ 4.0, -1.0,  0.0],
                      [-1.0,  4.0, -1.0],
                      [ 0.0, -1.0,  4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)

# Solve Ax = b
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
x = A.solve(b)
```
CUDA Acceleration

```python
# Move to GPU for CUDA-accelerated solving
A_cuda = A.cuda()
b_cuda = b.cuda()
x = A_cuda.solve(b_cuda)  # Uses cuDSS or cuSOLVER automatically
```
Use Casesยถ
torch-sla is ideal for:
Finite Element Method (FEM): Solve large sparse systems from FEM discretization
Computational Fluid Dynamics (CFD): Efficient sparse solvers for Navier-Stokes
Physics-Informed Neural Networks (PINNs): Differentiable sparse operations for physics constraints
Graph Neural Networks: Sparse message passing and Laplacian operations
Optimization: Gradient-based optimization involving sparse linear systems
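FEM and CFD discretizations typically produce banded sparse systems. The sketch below solves the 1-D Poisson problem -u'' = 1 with zero boundary values using a pure-Python Thomas algorithm rather than torch-sla, so it runs anywhere; in practice you would assemble the same tridiagonal pattern as a `SparseTensor` and call `A.solve(b)`.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, main diagonal b,
    super-diagonal c, and right-hand side d (Thomas algorithm)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# -u'' = 1 on (0, 1), u(0) = u(1) = 0, central differences, 99 interior points
n, h = 99, 1.0 / 100
a = [-1.0] * n   # sub-diagonal
b = [2.0] * n    # main diagonal
c = [-1.0] * n   # super-diagonal
d = [h * h] * n  # right-hand side
u = thomas_solve(a, b, c, d)
# exact solution is u(x) = x(1 - x)/2, so u at x = 0.5 is 0.125
```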
Frequently Asked Questions (FAQ)

What is torch-sla?
torch-sla (Torch Sparse Linear Algebra) is a Python library that provides differentiable sparse linear equation solvers for PyTorch. It solves systems of the form Ax = b where A is a sparse matrix, with full support for automatic differentiation (autograd) and GPU acceleration via CUDA.
How do I solve a sparse linear system in PyTorch?

Use torch-sla's SparseTensor class:

```python
from torch_sla import SparseTensor

# Create a sparse matrix from COO data (values, row indices, column indices)
A = SparseTensor(values, row, col, shape)

# Solve Ax = b
x = A.solve(b)
```

This works on both CPU and GPU, and supports gradient computation.
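To make the COO triplet convention concrete, here is a pure-Python sketch of what a dense-to-COO conversion (as in `SparseTensor.from_dense`) does conceptually; it is illustrative only, since the real method operates on torch tensors.

```python
def dense_to_coo(dense):
    """Extract COO triplets (values, rows, cols) from a dense 2-D list,
    keeping only the non-zero entries."""
    values, rows, cols = [], [], []
    for i, row_data in enumerate(dense):
        for j, v in enumerate(row_data):
            if v != 0.0:
                values.append(v)
                rows.append(i)
                cols.append(j)
    return values, rows, cols

dense = [[ 4.0, -1.0,  0.0],
         [-1.0,  4.0, -1.0],
         [ 0.0, -1.0,  4.0]]
values, rows, cols = dense_to_coo(dense)
# 7 non-zeros out of 9 entries are stored
```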
What sparse solvers does torch-sla support?

torch-sla supports multiple backends:

- CPU: SciPy (SuperLU, UMFPACK, CG, BiCGStab, GMRES), Eigen (CG, BiCGStab)
- GPU: cuSOLVER (QR, Cholesky, LU), cuDSS (LU, Cholesky, LDLT)

The library automatically selects the best solver based on your hardware and matrix properties.
Can I compute gradients through sparse solve?

Yes. torch-sla fully supports PyTorch autograd:

```python
val = torch.tensor([...], requires_grad=True)
x = spsolve(val, row, col, shape, b)
loss = x.sum()
loss.backward()  # Computes gradients w.r.t. val and b
```
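Under the hood this is the standard adjoint method for linear solves: the backward pass costs one extra transposed solve rather than ever forming an explicit inverse.

```latex
% Adjoint method for x = A^{-1} b with scalar loss L(x).
% One transposed solve gives the adjoint vector \lambda:
A^\top \lambda = \frac{\partial L}{\partial x}
% from which both gradients follow:
\frac{\partial L}{\partial b} = \lambda, \qquad
\frac{\partial L}{\partial A} = -\lambda\, x^\top
\quad \text{(kept only on } A\text{'s sparsity pattern)}
```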
How do I solve batched sparse systems?

torch-sla supports batched solving for matrices with the same sparsity pattern:

```python
# Batched values: [batch_size, nnz]
A = SparseTensor(val_batch, row, col, (batch_size, M, N))
x = A.solve(b_batch)  # Solves all systems in parallel
```

For matrices with different patterns, use SparseTensorList. See batched solve examples.
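Conceptually, same-pattern batching means the (row, col) arrays are shared and only the values differ per system. A toy pure-Python illustration with a shared 2×2 diagonal pattern follows; the diagonal case reduces each solve to element-wise division, whereas a real batched solver instead reuses one symbolic factorization across the batch.

```python
# Toy illustration of same-pattern batching: every system shares the COO
# pattern (here a 2x2 diagonal); only the values differ per batch entry.
row, col = [0, 1], [0, 1]                          # shared sparsity pattern
val_batch = [[2.0, 4.0], [1.0, 5.0], [10.0, 0.5]]  # shape [batch_size, nnz]
b = [8.0, 20.0]                                    # shared right-hand side

# For a diagonal matrix, solving Ax = b is element-wise division.
x_batch = [[b[i] / vals[i] for i in range(2)] for vals in val_batch]
```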
How do I use torch-sla on GPU?

Simply move your tensors to CUDA:

```python
A_cuda = A.cuda()
x = A_cuda.solve(b.cuda())  # Uses cuDSS or cuSOLVER
```
What is the difference between SparseTensor and DSparseTensor?

- SparseTensor: a single sparse matrix (optionally batched), for standard solving
- DSparseTensor: a distributed sparse tensor with domain decomposition, for large-scale parallel computing with halo exchange
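A minimal pure-Python sketch of the halo-exchange idea behind DSparseTensor: as in standard domain decomposition, each subdomain keeps ghost copies of its neighbor's boundary values and refreshes them before local computation. This is a two-subdomain toy; the actual implementation exchanges GPU tensors between ranks.

```python
def halo_exchange(left, right):
    """Exchange one ghost cell between two neighboring 1-D subdomains.
    Each subdomain is laid out as [ghost_lo, interior..., ghost_hi]."""
    left[-1] = right[1]   # left's high ghost <- right's first interior cell
    right[0] = left[-2]   # right's low ghost <- left's last interior cell
    return left, right

# Global array [0..7] split into two subdomains; -1.0 marks stale ghosts
left = [-1.0, 0.0, 1.0, 2.0, 3.0, -1.0]   # owns global indices 0..3
right = [-1.0, 4.0, 5.0, 6.0, 7.0, -1.0]  # owns global indices 4..7
left, right = halo_exchange(left, right)
# left's high ghost is now 4.0; right's low ghost is now 3.0
```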
Comparison with Alternatives

torch-sla vs scipy.sparse.linalg

| Feature | torch-sla | scipy.sparse.linalg |
|---|---|---|
| PyTorch Integration | ✅ Native tensors | ❌ Requires numpy copy |
| GPU Acceleration | ✅ CUDA (cuDSS, cuSOLVER) | ❌ CPU only |
| Autograd Gradients | ✅ Full support (adjoint) | ❌ No gradients |
| Batched Solve | ✅ Parallel batch solve | ❌ Loop required |
| Large Scale (>2M DOF) | ✅ 169M DOF tested | ⚠️ Memory limited |
| Distributed Computing | ✅ DSparseTensor | ❌ Not supported |
| Eigenvalue/SVD | ✅ Differentiable | ⚠️ No gradients |
| Nonlinear Solve | ✅ Newton/Anderson | ❌ Not included |
torch-sla vs torch.linalg.solve

| Feature | torch-sla | torch.linalg.solve |
|---|---|---|
| Matrix Type | ✅ Sparse (COO/CSR) | ❌ Dense only |
| Memory (1M×1M, ~10 nnz/row) | ✅ ~80 MB | ❌ ~8 TB (impossible) |
| Max Problem Size | ✅ 500M+ DOF (multi-GPU, scalable) | ❌ ~50K (GPU memory) |
| Specialized Solvers | ✅ LU, Cholesky, CG, BiCGStab | ⚠️ Dense LU only |
| Batched Operations | ✅ Same/different patterns | ⚠️ Same shape only |
| GPU Support | ✅ cuDSS, cuSOLVER, PyTorch | ✅ Yes |
| Autograd | ✅ O(1) graph nodes | ✅ Yes |
torch-sla vs NVIDIA AmgX

| Feature | torch-sla | NVIDIA AmgX |
|---|---|---|
| Installation | ✅ pip install torch-sla | ❌ Complex build process |
| PyTorch Integration | ✅ Native | ❌ Requires wrapper |
| Autograd Support | ✅ Full gradient flow | ❌ No gradients |
| Python API | ✅ Pythonic | ⚠️ C++ focused |
| Multigrid (AMG) | ❌ Not yet | ✅ Core feature |
| Preconditioners | ⚠️ Jacobi | ✅ ILU, AMG, etc. |
| Documentation | ✅ Comprehensive | ⚠️ Limited examples |
torch-sla vs PETSc

| Feature | torch-sla | PETSc |
|---|---|---|
| Installation | ✅ pip install | ❌ Complex (MPI, compilers) |
| Learning Curve | ✅ Simple Python API | ❌ Steep (C/Fortran heritage) |
| PyTorch Integration | ✅ Native tensors | ❌ Requires petsc4py + copies |
| Autograd | ✅ Full support | ❌ No gradients |
| Solver Variety | ⚠️ Core methods | ✅ Extensive (KSP, SNES) |
| Distributed | ✅ DSparseTensor multi-GPU | ✅ Full MPI support |
| Production Scale | ✅ 500M+ DOF (multi-GPU) | ✅ Exascale proven |
Summary: When to Use torch-sla

| Use torch-sla When | Consider Alternatives When |
|---|---|
| ✅ You need PyTorch integration | You're not using PyTorch |
| ✅ You need gradient flow through solve | Gradients not needed |
| ✅ Problem size up to 500M+ DOF (multi-GPU) | Exascale problems (use PETSc) |
| ✅ You want simple pip install | You need AMG preconditioners (AmgX) |
| ✅ Batched sparse systems | Complex preconditioning (PETSc) |
| ✅ GPU acceleration with minimal setup | Full MPI distributed (PETSc) |
License
torch-sla is released under the MIT License. See LICENSE for details.
Contact

x@y where x = walker.chi.000 and y = gmail.com

Citation
If you use torch-sla in your research, please cite our paper:

```bibtex
@article{chi2026torchsla,
  title={torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch},
  author={Chi, Mingyuan},
  journal={arXiv preprint arXiv:2601.13994},
  year={2026},
  url={https://arxiv.org/abs/2601.13994}
}
```
Paper: arXiv:2601.13994 - Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch