torch-sla: PyTorch Sparse Linear Algebra
torch-sla (Torch Sparse Linear Algebra) is a memory-efficient, differentiable sparse linear equation solver library for PyTorch with multiple backends. Perfect for scientific computing, FEM, CFD, and machine learning applications requiring sparse matrix operations with automatic differentiation.
Why torch-sla?

- High Performance: CUDA-accelerated solvers via cuSOLVER and cuDSS
- Memory Efficient: store only non-zero elements, enabling solution of systems with millions of unknowns
- Differentiable: full gradient support through torch.autograd
- Batch Processing: solve thousands of systems in parallel
- Distributed: domain decomposition with halo exchange for large-scale problems
- Flexible: multiple backends and solver methods
Key Features

- Memory efficient: only stores non-zero elements; a 1M×1M matrix with ~10 non-zeros per row needs ~80 MB for its values instead of ~8 TB dense
- Full gradient support via torch.autograd for end-to-end differentiable pipelines
- Multiple backends: SciPy, Eigen, cuSOLVER, cuDSS
- Batch solving: same-layout and different-layout sparse matrices
- Distributed solving: domain decomposition with halo exchange
- 169M+ DOF tested: scales to very large problems with near-linear complexity
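The arithmetic behind the memory claim can be sketched directly. This is a back-of-envelope estimate, not torch-sla's internal accounting: the ~80 MB figure counts the float64 values alone, and COO row/column index arrays roughly triple it.

```python
# Back-of-envelope storage estimate for a 1M x 1M float64 system.
def coo_memory_bytes(nnz: int) -> int:
    """Approximate COO footprint: float64 values plus two int64 index arrays."""
    return nnz * (8 + 8 + 8)

n = 1_000_000
nnz = 10 * n                       # ~10 non-zeros per row, typical of FEM stencils
dense_bytes = n * n * 8            # dense float64: 8e12 bytes, i.e. ~8 TB
values_bytes = nnz * 8             # values alone: 8e7 bytes, i.e. ~80 MB
coo_bytes = coo_memory_bytes(nnz)  # values + indices: ~240 MB
```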
Quick Start

Installation

```bash
pip install torch-sla
```
Basic Usage

```python
import torch
from torch_sla import SparseTensor

# Create a sparse matrix from dense (easier to read for small matrices)
dense = torch.tensor([[ 4.0, -1.0,  0.0],
                      [-1.0,  4.0, -1.0],
                      [ 0.0, -1.0,  4.0]], dtype=torch.float64)
A = SparseTensor.from_dense(dense)

# Solve Ax = b
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
x = A.solve(b)
```
CUDA Acceleration

```python
# Move to GPU for CUDA-accelerated solving
A_cuda = A.cuda()
b_cuda = b.cuda()
x = A_cuda.solve(b_cuda)  # Uses cuDSS or cuSOLVER automatically
```
Use Casesยถ
torch-sla is ideal for:
Finite Element Method (FEM): Solve large sparse systems from FEM discretization
Computational Fluid Dynamics (CFD): Efficient sparse solvers for Navier-Stokes
Physics-Informed Neural Networks (PINNs): Differentiable sparse operations for physics constraints
Graph Neural Networks: Sparse message passing and Laplacian operations
Optimization: Gradient-based optimization involving sparse linear systems
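FEM and CFD discretizations typically produce banded sparse systems. The sketch below solves the 1-D Poisson problem -u'' = 1 with zero boundary values using a pure-Python Thomas algorithm rather than torch-sla, so it runs anywhere; in practice you would assemble the same tridiagonal pattern as a `SparseTensor` and call `A.solve(b)`.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, main diagonal b,
    super-diagonal c, and right-hand side d (Thomas algorithm)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# -u'' = 1 on (0, 1), u(0) = u(1) = 0, central differences, 99 interior points
n, h = 99, 1.0 / 100
a = [-1.0] * n   # sub-diagonal
b = [2.0] * n    # main diagonal
c = [-1.0] * n   # super-diagonal
d = [h * h] * n  # right-hand side
u = thomas_solve(a, b, c, d)
# exact solution is u(x) = x(1 - x)/2, so u at x = 0.5 is 0.125
```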
Frequently Asked Questions (FAQ)

What is torch-sla?
torch-sla (Torch Sparse Linear Algebra) is a Python library that provides differentiable sparse linear equation solvers for PyTorch. It solves systems of the form Ax = b where A is a sparse matrix, with full support for automatic differentiation (autograd) and GPU acceleration via CUDA.
How do I solve a sparse linear system in PyTorch?

Use torch-sla's SparseTensor class:

```python
from torch_sla import SparseTensor

# Create a sparse matrix from COO data (values, row indices, column indices)
A = SparseTensor(values, row, col, shape)

# Solve Ax = b
x = A.solve(b)
```

This works on both CPU and GPU, and supports gradient computation.
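To make the COO triplet convention concrete, here is a pure-Python sketch of what a dense-to-COO conversion (as in `SparseTensor.from_dense`) does conceptually; it is illustrative only, since the real method operates on torch tensors.

```python
def dense_to_coo(dense):
    """Extract COO triplets (values, rows, cols) from a dense 2-D list,
    keeping only the non-zero entries."""
    values, rows, cols = [], [], []
    for i, row_data in enumerate(dense):
        for j, v in enumerate(row_data):
            if v != 0.0:
                values.append(v)
                rows.append(i)
                cols.append(j)
    return values, rows, cols

dense = [[ 4.0, -1.0,  0.0],
         [-1.0,  4.0, -1.0],
         [ 0.0, -1.0,  4.0]]
values, rows, cols = dense_to_coo(dense)
# 7 non-zeros out of 9 entries are stored
```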
What sparse solvers does torch-sla support?

torch-sla supports multiple backends:

- CPU: SciPy (SuperLU, UMFPACK, CG, BiCGStab, GMRES), Eigen (CG, BiCGStab)
- GPU: cuSOLVER (QR, Cholesky, LU), cuDSS (LU, Cholesky, LDLT)

The library automatically selects the best solver based on your hardware and matrix properties.
Can I compute gradients through sparse solve?

Yes. torch-sla fully supports PyTorch autograd:

```python
val = torch.tensor([...], requires_grad=True)
x = spsolve(val, row, col, shape, b)
loss = x.sum()
loss.backward()  # Computes gradients w.r.t. val and b
```
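Under the hood this is the standard adjoint method for linear solves: the backward pass costs one extra transposed solve rather than ever forming an explicit inverse.

```latex
% Adjoint method for x = A^{-1} b with scalar loss L(x).
% One transposed solve gives the adjoint vector \lambda:
A^\top \lambda = \frac{\partial L}{\partial x}
% from which both gradients follow:
\frac{\partial L}{\partial b} = \lambda, \qquad
\frac{\partial L}{\partial A} = -\lambda\, x^\top
\quad \text{(kept only on } A\text{'s sparsity pattern)}
```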
How do I solve batched sparse systems?

torch-sla supports batched solving for matrices with the same sparsity pattern:

```python
# Batched values: [batch_size, nnz]
A = SparseTensor(val_batch, row, col, (batch_size, M, N))
x = A.solve(b_batch)  # Solves all systems in parallel
```

For matrices with different patterns, use SparseTensorList. See batched solve examples.
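Conceptually, same-pattern batching means the (row, col) arrays are shared and only the values differ per system. A toy pure-Python illustration with a shared 2×2 diagonal pattern follows; the diagonal case reduces each solve to element-wise division, whereas a real batched solver instead reuses one symbolic factorization across the batch.

```python
# Toy illustration of same-pattern batching: every system shares the COO
# pattern (here a 2x2 diagonal); only the values differ per batch entry.
row, col = [0, 1], [0, 1]                          # shared sparsity pattern
val_batch = [[2.0, 4.0], [1.0, 5.0], [10.0, 0.5]]  # shape [batch_size, nnz]
b = [8.0, 20.0]                                    # shared right-hand side

# For a diagonal matrix, solving Ax = b is element-wise division.
x_batch = [[b[i] / vals[i] for i in range(2)] for vals in val_batch]
```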
How do I use torch-sla on GPU?

Simply move your tensors to CUDA:

```python
A_cuda = A.cuda()
x = A_cuda.solve(b.cuda())  # Uses cuDSS or cuSOLVER
```
What is the difference between SparseTensor and DSparseTensor?

- SparseTensor: a single sparse matrix (optionally batched), for standard solving
- DSparseTensor: a distributed sparse tensor with domain decomposition, for large-scale parallel computing with halo exchange
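A minimal pure-Python sketch of the halo-exchange idea behind DSparseTensor: as in standard domain decomposition, each subdomain keeps ghost copies of its neighbor's boundary values and refreshes them before local computation. This is a two-subdomain toy; the actual implementation exchanges GPU tensors between ranks.

```python
def halo_exchange(left, right):
    """Exchange one ghost cell between two neighboring 1-D subdomains.
    Each subdomain is laid out as [ghost_lo, interior..., ghost_hi]."""
    left[-1] = right[1]   # left's high ghost <- right's first interior cell
    right[0] = left[-2]   # right's low ghost <- left's last interior cell
    return left, right

# Global array [0..7] split into two subdomains; -1.0 marks stale ghosts
left = [-1.0, 0.0, 1.0, 2.0, 3.0, -1.0]   # owns global indices 0..3
right = [-1.0, 4.0, 5.0, 6.0, 7.0, -1.0]  # owns global indices 4..7
left, right = halo_exchange(left, right)
# left's high ghost is now 4.0; right's low ghost is now 3.0
```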
Comparison with Alternatives

torch-sla vs scipy.sparse.linalg

| Feature | torch-sla | scipy.sparse.linalg |
|---|---|---|
| PyTorch Integration | ✅ Native tensors | ❌ Requires numpy copy |
| GPU Acceleration | ✅ CUDA (cuDSS, cuSOLVER) | ❌ CPU only |
| Autograd Gradients | ✅ Full support (adjoint) | ❌ No gradients |
| Batched Solve | ✅ Parallel batch solve | ❌ Loop required |
| Large Scale (>2M DOF) | ✅ 169M DOF tested | ⚠️ Memory limited |
| Distributed Computing | ✅ DSparseTensor | ❌ Not supported |
| Eigenvalue/SVD | ✅ Differentiable | ⚠️ No gradients |
| Nonlinear Solve | ✅ Newton/Anderson | ❌ Not included |
torch-sla vs torch.linalg.solve

| Feature | torch-sla | torch.linalg.solve |
|---|---|---|
| Matrix Type | ✅ Sparse (COO/CSR) | ❌ Dense only |
| Memory (1M×1M, ~10 nnz/row) | ✅ ~80 MB | ❌ ~8 TB (impossible) |
| Max Problem Size | ✅ 500M+ DOF (multi-GPU, scalable) | ❌ ~50K (GPU memory) |
| Specialized Solvers | ✅ LU, Cholesky, CG, BiCGStab | ⚠️ Dense LU only |
| Batched Operations | ✅ Same/different patterns | ⚠️ Same shape only |
| GPU Support | ✅ cuDSS, cuSOLVER, PyTorch | ✅ Yes |
| Autograd | ✅ O(1) graph nodes | ✅ Yes |
torch-sla vs NVIDIA AmgX

| Feature | torch-sla | NVIDIA AmgX |
|---|---|---|
| Installation | ✅ pip install torch-sla | ❌ Complex build process |
| PyTorch Integration | ✅ Native | ❌ Requires wrapper |
| Autograd Support | ✅ Full gradient flow | ❌ No gradients |
| Python API | ✅ Pythonic | ⚠️ C++ focused |
| Multigrid (AMG) | ❌ Not yet | ✅ Core feature |
| Preconditioners | ⚠️ Jacobi | ✅ ILU, AMG, etc. |
| Documentation | ✅ Comprehensive | ⚠️ Limited examples |
torch-sla vs PETSc

| Feature | torch-sla | PETSc |
|---|---|---|
| Installation | ✅ pip install | ❌ Complex (MPI, compilers) |
| Learning Curve | ✅ Simple Python API | ❌ Steep (C/Fortran heritage) |
| PyTorch Integration | ✅ Native tensors | ❌ Requires petsc4py + copies |
| Autograd | ✅ Full support | ❌ No gradients |
| Solver Variety | ⚠️ Core methods | ✅ Extensive (KSP, SNES) |
| Distributed | ✅ DSparseTensor multi-GPU | ✅ Full MPI support |
| Production Scale | ✅ 500M+ DOF (multi-GPU) | ✅ Exascale proven |
Summary: When to Use torch-sla

| Use torch-sla When | Consider Alternatives When |
|---|---|
| ✅ You need PyTorch integration | You're not using PyTorch |
| ✅ You need gradient flow through solve | Gradients not needed |
| ✅ Problem size up to 500M+ DOF (multi-GPU) | Exascale problems (use PETSc) |
| ✅ You want simple pip install | You need AMG preconditioners (AmgX) |
| ✅ Batched sparse systems | Complex preconditioning (PETSc) |
| ✅ GPU acceleration with minimal setup | Full MPI distributed (PETSc) |
License
torch-sla is released under the MIT License. See LICENSE for details.
Contact

x@y where x = walker.chi.000 and y = gmail.com

Citation
If you use torch-sla in your research, please cite our paper:

```bibtex
@article{chi2026torchsla,
  title={torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch},
  author={Chi, Mingyuan},
  journal={arXiv preprint arXiv:2601.13994},
  year={2026},
  url={https://arxiv.org/abs/2601.13994}
}
```
Paper: arXiv:2601.13994 - Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch