API Reference¶
This section provides the complete API documentation for torch-sla.
Core Classes¶
SparseTensor¶
The main class for sparse matrix operations. Supports batched operations, automatic differentiation, and multiple backends.
- class torch_sla.SparseTensor(values: Tensor, row_indices: Tensor, col_indices: Tensor, shape: Tuple[int, ...], sparse_dim: Tuple[int, int] = (-2, -1))[source]¶
Bases: object
Wrapper class for PyTorch sparse tensors with batched and block support.
Supports tensors with shape [...batch, M, N, ...block] where:
- Leading dimensions [...batch] are batch dimensions
- (M, N) are the sparse matrix dimensions (at sparse_dim positions)
- Trailing dimensions [...block] are block dimensions
- Parameters:
values (torch.Tensor) -- Non-zero values with shape:
- Simple: [nnz]
- Batched: [...batch, nnz]
- Block: [nnz, *block_shape]
- Batched+Block: [...batch, nnz, *block_shape]
row_indices (torch.Tensor) -- Row indices with shape [nnz]. Must be on the same device as values.
col_indices (torch.Tensor) -- Column indices with shape [nnz]. Must be on the same device as values.
shape (Tuple[int, ...]) -- Full tensor shape [...batch, M, N, *block_shape].
sparse_dim (Tuple[int, int], optional) -- Which dimensions are sparse (M, N). Default: (-2, -1), meaning the last two dimensions before any block dimensions.
- values¶
The non-zero values.
- Type:
torch.Tensor
- row_indices¶
Row indices of non-zeros.
- Type:
torch.Tensor
- col_indices¶
Column indices of non-zeros.
- Type:
torch.Tensor
Examples
1. Simple 2D Sparse Matrix [M, N]
>>> import torch
>>> from torch_sla import SparseTensor
>>>
>>> # Create a 3x3 tridiagonal matrix in COO format
>>> val = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
>>> row = torch.tensor([0, 0, 1, 1, 1, 2, 2])
>>> col = torch.tensor([0, 1, 0, 1, 2, 1, 2])
>>> A = SparseTensor(val, row, col, (3, 3))
>>> print(A)
SparseTensor(shape=(3, 3), sparse=(3, 3), nnz=7, dtype=torch.float64, device=cpu)
>>>
>>> # Solve Ax = b
>>> b = torch.tensor([1.0, 2.0, 3.0])
>>> x = A.solve(b)
2. Batched Sparse Matrices [B, M, N]
Same sparsity pattern, different values for each batch.
>>> # 4 matrices, each 3x3, same structure
>>> batch_size = 4
>>> val_batch = val.unsqueeze(0).expand(batch_size, -1).clone()  # [4, 7]
>>> for i in range(batch_size):
...     val_batch[i] = val * (1.0 + 0.1 * i)  # Scale each matrix
>>>
>>> A_batch = SparseTensor(val_batch, row, col, (4, 3, 3))
>>> print(A_batch.batch_shape)   # (4,)
>>> print(A_batch.sparse_shape)  # (3, 3)
>>>
>>> # Batched solve
>>> b_batch = torch.randn(4, 3)
>>> x_batch = A_batch.solve(b_batch)  # [4, 3]
3. Multi-Dimensional Batch [B1, B2, M, N]
>>> B1, B2 = 2, 3  # e.g., 2 materials x 3 temperatures
>>> val_batch = val.unsqueeze(0).unsqueeze(0).expand(B1, B2, -1).clone()  # [2, 3, 7]
>>> A_multi = SparseTensor(val_batch, row, col, (B1, B2, 3, 3))
>>> print(A_multi.batch_shape)  # (2, 3)
>>>
>>> b_multi = torch.randn(B1, B2, 3)
>>> x_multi = A_multi.solve(b_multi)  # [2, 3, 3]
4. Block Sparse Matrix [M, N, K, K] (Block Size K)
Each non-zero entry is a KxK dense block instead of a scalar.
>>> # 2x2 block matrix with 2x2 blocks = 4x4 total
>>> block_size = 2
>>> nnz = 3  # 3 non-zero blocks
>>>
>>> # Values: [nnz, K, K] = [3, 2, 2]
>>> val_block = torch.randn(nnz, block_size, block_size)
>>> row_block = torch.tensor([0, 0, 1])  # Block row indices
>>> col_block = torch.tensor([0, 1, 1])  # Block col indices
>>>
>>> # Shape: (num_block_rows, num_block_cols, block_size, block_size)
>>> A_block = SparseTensor(val_block, row_block, col_block, (2, 2, 2, 2))
>>> print(A_block.block_shape)   # (2, 2)
>>> print(A_block.sparse_shape)  # (2, 2) - number of blocks
>>> print(A_block.shape)         # (2, 2, 2, 2) - full shape
5. Batched Block Sparse [B, M, N, K, K]
>>> batch_size = 4
>>> val_batch_block = torch.randn(batch_size, nnz, block_size, block_size)  # [4, 3, 2, 2]
>>> A_batch_block = SparseTensor(val_batch_block, row_block, col_block, (4, 2, 2, 2, 2))
>>> print(A_batch_block.batch_shape)  # (4,)
>>> print(A_batch_block.block_shape)  # (2, 2)
6. Create from Dense Matrix
>>> A_dense = torch.randn(100, 100)
>>> A_dense[A_dense.abs() < 0.5] = 0  # Sparsify
>>> A = SparseTensor.from_dense(A_dense)
7. Create from PyTorch Sparse Tensor
>>> A_torch = torch.randn(100, 100).to_sparse_coo()
>>> A = SparseTensor.from_torch_sparse(A_torch)
8. Property Detection
>>> A = SparseTensor(val, row, col, (3, 3))
>>> A.is_symmetric()  # tensor(True) - returns tensor for batch support
>>> A.is_positive_definite()  # tensor(True)
>>> A.is_positive_definite('cholesky')  # Use Cholesky factorization check
9. Matrix Operations
>>> # Matrix-vector multiply
>>> y = A @ x  # SparseTensor @ dense vector
>>>
>>> # Sparse-sparse multiply (returns SparseTensor with sparse gradients)
>>> C = A @ A
>>>
>>> # Norms
>>> A.norm('fro')  # Frobenius norm
>>>
>>> # Eigenvalues (symmetric matrices)
>>> eigenvalues, eigenvectors = A.eigsh(k=2, which='LM')
10. CUDA Support
>>> A_cuda = A.cuda()
>>> x = A_cuda.solve(b.cuda())  # Uses cuDSS or cuSOLVER
- classmethod from_dense(A: Tensor, sparse_dim: Tuple[int, int] = (-2, -1)) SparseTensor[source]¶
Create a SparseTensor from a dense tensor.
- Parameters:
A (torch.Tensor) -- Dense tensor with shape [...batch, M, N, ...block].
sparse_dim (Tuple[int, int], optional) -- Which dimensions are sparse. Default: (-2, -1).
- Returns:
Sparse representation of A.
- Return type:
SparseTensor
Examples
>>> A_dense = torch.randn(3, 3)
>>> A_dense[A_dense.abs() < 0.5] = 0
>>> A = SparseTensor.from_dense(A_dense)
- classmethod from_torch_sparse(A: Tensor) SparseTensor[source]¶
Create a SparseTensor from a PyTorch sparse tensor.
- Parameters:
A (torch.Tensor) -- PyTorch sparse COO or CSR tensor (2D only).
- Returns:
SparseTensor representation.
- Return type:
SparseTensor
Examples
>>> A_coo = torch.randn(3, 3).to_sparse_coo()
>>> A = SparseTensor.from_torch_sparse(A_coo)
- to(device: str | device | None = None, dtype: dtype | None = None) SparseTensor[source]¶
Move the tensor to a device and/or convert its dtype.
- Parameters:
device (str or torch.device, optional) -- Target device (e.g., 'cuda', 'cpu', 'cuda:0').
dtype (torch.dtype, optional) -- Target data type (e.g., torch.float32, torch.float64).
- Returns:
New SparseTensor on the target device/dtype.
- Return type:
SparseTensor
Examples
>>> A = SparseTensor(val, row, col, shape)
>>> A_cuda = A.to('cuda')
>>> A_float32 = A.to(dtype=torch.float32)
>>> A_cuda_float32 = A.to('cuda', torch.float32)
- cuda(device: int | None = None) SparseTensor[source]¶
Move the tensor to a CUDA device.
- Parameters:
device (int, optional) -- CUDA device index. Default: current device.
- Returns:
Tensor on CUDA.
- Return type:
SparseTensor
- cpu() SparseTensor[source]¶
Move the tensor to CPU.
- Returns:
Tensor on CPU.
- Return type:
SparseTensor
- float() SparseTensor[source]¶
Convert to float32.
- double() SparseTensor[source]¶
Convert to float64.
- half() SparseTensor[source]¶
Convert to float16.
- to_torch_sparse(batch_idx: Tuple[int, ...] | None = None) Tensor[source]¶
Convert to a PyTorch sparse COO tensor.
- Parameters:
batch_idx (Tuple[int, ...], optional) -- For batched tensors, which batch element to convert. Default: (0, 0, ...) for the first batch element.
- Returns:
PyTorch sparse COO tensor.
- Return type:
torch.Tensor
- to_dense(batch_idx: Tuple[int, ...] | None = None) Tensor[source]¶
Convert to a dense tensor.
- Parameters:
batch_idx (Tuple[int, ...], optional) -- For batched tensors, which batch element to convert.
- Returns:
Dense tensor.
- Return type:
torch.Tensor
- to_csr(batch_idx: Tuple[int, ...] | None = None) Tensor[source]¶
Convert to CSR format.
- Parameters:
batch_idx (Tuple[int, ...], optional) -- For batched tensors, which batch element to convert.
- Returns:
PyTorch sparse CSR tensor.
- Return type:
torch.Tensor
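As an illustration, a short format round-trip sketch (reusing the 3x3 tridiagonal val, row, col from the class examples above; the equality check is an expectation for this sketch, not documented output):
>>> A = SparseTensor(val, row, col, (3, 3))
>>> A_dense = A.to_dense()       # dense torch.Tensor, shape (3, 3)
>>> A_coo = A.to_torch_sparse()  # PyTorch sparse COO tensor
>>> A_csr = A.to_csr()           # PyTorch sparse CSR tensor
>>> torch.allclose(A_dense, A_coo.to_dense())
True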
- partition(num_partitions: int, coords: Tensor | None = None, partition_method: str = 'auto', verbose: bool = False) DSparseTensor[source]¶
Partition into a distributed sparse tensor.
Creates a DSparseTensor with automatic domain decomposition. This is useful for distributed computing and parallel solvers.
- Parameters:
num_partitions (int) -- Number of partitions to create.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning [num_nodes, dim]. Required for the 'rcb' and 'slicing' methods.
partition_method (str) -- Partitioning method:
- 'auto': Auto-select (uses 'rcb' if coords provided, else 'metis')
- 'metis': Graph-based partitioning (requires pymetis)
- 'rcb': Recursive Coordinate Bisection (requires coords)
- 'slicing': Simple coordinate slicing (requires coords)
- 'simple': Simple 1D partitioning by node index
verbose (bool) -- Whether to print partition info.
- Returns:
Distributed sparse tensor with the specified partitions.
- Return type:
DSparseTensor
Examples
>>> A = SparseTensor(val, row, col, shape)
>>> D = A.partition(num_partitions=4)
>>> for i in range(4):
...     partition = D[i]
...     y = partition.matvec(x_local)
Notes
- Use D.to_sparse_tensor() to gather back to a SparseTensor.
- For distributed training, use partition_for_rank() instead.
- partition_for_rank(rank: int, world_size: int, coords: Tensor | None = None, partition_method: str = 'simple', verbose: bool = False) DSparseMatrix[source]¶
Get the partition for a specific rank in a distributed environment.
This is the recommended API for multi-process distributed computing. Each rank calls this method with its own rank ID to get its local partition. The partitioning is deterministic and consistent across all ranks.
- Parameters:
rank (int) -- This process's rank (0 to world_size-1).
world_size (int) -- Total number of processes.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning.
partition_method (str) -- Partitioning method ('simple', 'metis', 'rcb', 'slicing').
verbose (bool) -- Print partition info.
- Returns:
Local partition for this rank.
- Return type:
DSparseMatrix
Examples
>>> # In multi-process code:
>>> A = SparseTensor(val, row, col, shape)
>>> partition = A.partition_for_rank(rank, world_size)
>>> y_local = partition.matvec(x_local)
Notes
- This uses DSparseTensor.from_global_distributed() internally, which broadcasts partition IDs from rank 0 for consistency.
- Requires torch.distributed to be initialized.
- T() SparseTensor[source]¶
Transpose the sparse dimensions.
- Returns:
Transposed tensor with row/col indices swapped.
- Return type:
SparseTensor
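A minimal sketch (reusing val, row, col from the class examples; the comparison against the dense transpose is an expectation, not documented output):
>>> A = SparseTensor(val, row, col, (3, 3))
>>> At = A.T()  # row/col indices swapped
>>> torch.allclose(At.to_dense(), A.to_dense().T)
True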
- flatten_blocks() SparseTensor[source]¶
Flatten block dimensions into the sparse (M, N) dimensions.
For a block-sparse tensor with shape [...batch, M, N, *block_shape], this creates a new tensor with shape [...batch, M*block_M, N*block_N] where each block entry becomes multiple scalar entries.
- Returns:
Flattened tensor without block dimensions.
- Return type:
SparseTensor
Examples
>>> # Block sparse: shape (10, 10, 2, 2), block_shape=(2, 2)
>>> A = SparseTensor(val, row, col, (10, 10, 2, 2))
>>> A_flat = A.flatten_blocks()
>>> print(A_flat.shape)  # (20, 20)
>>> print(A_flat.nnz)    # nnz * 4 (each block has 4 elements)
Notes
- Only works for 2D block shapes (block_M, block_N).
- Use unflatten_blocks(block_shape) to reverse this operation.
- The flattened tensor's sparsity pattern may contain duplicates that need to be coalesced.
- unflatten_blocks(block_shape: Tuple[int, int]) SparseTensor[source]¶
Restore block structure from a flattened tensor.
This is the inverse of flatten_blocks(). It groups scalar entries back into block entries.
- Parameters:
block_shape (Tuple[int, int]) -- The (block_M, block_N) dimensions to create. M and N must be divisible by block_M and block_N, respectively.
- Returns:
Block-sparse tensor with the specified block shape.
- Return type:
SparseTensor
Examples
>>> A_flat = SparseTensor(val, row, col, (20, 20))
>>> A_block = A_flat.unflatten_blocks((2, 2))
>>> print(A_block.shape)        # (10, 10, 2, 2)
>>> print(A_block.block_shape)  # (2, 2)
Notes
- Requires that the sparsity pattern is block-aligned.
- All block entries must be present (dense within each block).
- For sparse blocks, use to_block_sparse() instead.
- is_symmetric(atol: float = 1e-08, rtol: float = 1e-05, force_recompute: bool = False) Tensor[source]¶
Check if the matrix is symmetric (A == A^T).
For batched tensors, checks each matrix independently and returns a boolean tensor with shape matching the batch dimensions.
- Parameters:
atol (float, optional) -- Absolute tolerance for the comparison. Default: 1e-08.
rtol (float, optional) -- Relative tolerance for the comparison. Default: 1e-05.
force_recompute (bool, optional) -- If True, recompute even if cached. Default: False.
- Returns:
Boolean tensor with shape:
- [] (scalar) for non-batched tensors
- [*batch_shape] for batched tensors
- Return type:
torch.Tensor
Examples
>>> A = SparseTensor(val, row, col, (3, 3))
>>> A.is_symmetric()  # tensor(True) or tensor(False)
>>> A_batch = SparseTensor(val_batch, row, col, (4, 3, 3))
>>> A_batch.is_symmetric()  # tensor([True, True, True, True])
- is_positive_definite(method: Literal['gershgorin', 'cholesky', 'eigenvalue'] = 'gershgorin', force_recompute: bool = False) Tensor[source]¶
Check if the matrix is positive definite.
For batched tensors, checks each matrix independently and returns a boolean tensor with shape matching the batch dimensions.
- Parameters:
method ({"gershgorin", "cholesky", "eigenvalue"}, optional) -- Method for checking:
- "gershgorin": Fast check using Gershgorin circles (sufficient but not necessary)
- "cholesky": Try a Cholesky decomposition (necessary and sufficient, slower)
- "eigenvalue": Check the smallest eigenvalues (necessary and sufficient, slowest)
Default: "gershgorin".
force_recompute (bool, optional) -- If True, recompute even if cached. Default: False.
- Returns:
Boolean tensor with shape:
- [] (scalar) for non-batched tensors
- [*batch_shape] for batched tensors
- Return type:
torch.Tensor
Examples
>>> A = SparseTensor(val, row, col, (3, 3))
>>> A.is_positive_definite()  # tensor(True) or tensor(False)
>>> A.is_positive_definite(method="cholesky")  # More accurate check
>>> A_batch = SparseTensor(val_batch, row, col, (4, 3, 3))
>>> A_batch.is_positive_definite()  # tensor([True, True, True, True])
- connected_components() Tuple[Tensor, int][source]¶
Find the connected components of the graph represented by this sparse matrix.
Uses a union-find algorithm for efficiency. Treats the matrix as an undirected graph adjacency matrix.
- Returns:
labels (torch.Tensor) -- Component label for each node, shape [N]. Labels are in range [0, num_components).
num_components (int) -- Number of connected components.
Notes
- Only works for non-batched 2D matrices.
- The matrix is treated as undirected (edges in either direction count).
- Self-loops are ignored for connectivity.
Examples
>>> # Block diagonal matrix with 3 components
>>> A = SparseTensor(val, row, col, (100, 100))
>>> labels, num_comp = A.connected_components()
>>> print(f"Found {num_comp} components")
- has_isolated_components() bool[source]¶
Check if the matrix has multiple connected components.
- Returns:
True if the matrix has more than one connected component.
- Return type:
bool
Examples
>>> A = SparseTensor(val, row, col, (100, 100))
>>> if A.has_isolated_components():
...     components = A.to_connected_components()
- to_connected_components() SparseTensorList[source]¶
Split the matrix into a list of connected-component subgraphs.
Each component becomes a separate SparseTensor with reindexed nodes.
- Returns:
List of SparseTensors, one per connected component.
- Return type:
SparseTensorList
Notes
- Each component's nodes are reindexed from 0.
- Original node indices can be recovered from the mapping.
Examples
>>> A = SparseTensor(val, row, col, (100, 100))
>>> components = A.to_connected_components()
>>> print(f"Split into {len(components)} components")
>>> for i, comp in enumerate(components):
...     print(f"  Component {i}: {comp.shape}")
- solve(b: Tensor, backend: Literal['scipy', 'eigen', 'pytorch', 'cusolver', 'cudss', 'auto'] = 'auto', method: Literal['auto', 'superlu', 'umfpack', 'lu', 'qr', 'cholesky', 'ldlt', 'cg', 'bicgstab', 'gmres', 'lgmres', 'minres', 'qmr'] = 'auto', atol: float = 1e-10, maxiter: int = 10000, tol: float = 1e-12) Tensor[source]¶
Solve the sparse linear system Ax = b.
Automatically handles batched tensors: if A is [...batch, M, N] and b is [...batch, M], returns x with shape [...batch, N].
- Parameters:
b (torch.Tensor) -- Right-hand side vector(s). Shape:
- Non-batched: [M] or [M, K] for multiple RHS
- Batched: [...batch, M] or [...batch, M, K]
backend ({"auto", "scipy", "eigen", "cusolver", "cudss"}, optional) -- Solver backend. Default: "auto" (selects based on device).
- "scipy": Uses SciPy's sparse solvers (CPU only)
- "eigen": Uses the Eigen C++ library (CPU only)
- "cusolver": Uses NVIDIA cuSOLVER (CUDA only)
- "cudss": Uses NVIDIA cuDSS (CUDA only)
method (str, optional) -- Solver method. Default: "auto" (selects based on matrix properties).
- Direct methods: "superlu", "umfpack", "lu", "qr", "cholesky", "ldlt"
- Iterative methods: "cg", "bicgstab", "gmres", "minres"
atol (float, optional) -- Absolute tolerance for iterative solvers. Default: 1e-10.
maxiter (int, optional) -- Maximum iterations for iterative solvers. Default: 10000.
tol (float, optional) -- Relative tolerance for direct solvers. Default: 1e-12.
- Returns:
Solution x with the same batch shape as b.
- Return type:
torch.Tensor
- Raises:
ValueError -- If the matrix is not square.
NotImplementedError -- If block sparse tensors are used (not yet supported).
Examples
>>> # Simple solve
>>> A = SparseTensor(val, row, col, (3, 3))
>>> b = torch.randn(3)
>>> x = A.solve(b)
>>> # Batched solve
>>> A_batch = SparseTensor(val_batch, row, col, (4, 3, 3))
>>> b_batch = torch.randn(4, 3)
>>> x_batch = A_batch.solve(b_batch)
>>> # Specify backend
>>> x = A.solve(b, backend='scipy', method='cg')
- solve_batch(values: Tensor, b: Tensor, backend: Literal['scipy', 'eigen', 'pytorch', 'cusolver', 'cudss', 'auto'] = 'auto', method: Literal['auto', 'superlu', 'umfpack', 'lu', 'qr', 'cholesky', 'ldlt', 'cg', 'bicgstab', 'gmres', 'lgmres', 'minres', 'qmr'] = 'auto', atol: float = 1e-10, maxiter: int = 10000, tol: float = 1e-12) Tensor[source]¶
Solve with different values but the same sparsity structure.
This is efficient when you have the same structure but different values (e.g., time-stepping, optimization, parameter sweeps).
- Parameters:
values (torch.Tensor) -- Matrix values. Shape [...batch, nnz] where ... are batch dimensions. All matrices share the same row_indices and col_indices.
b (torch.Tensor) -- Right-hand side. Shape [...batch, M].
backend ({"auto", "scipy", "eigen", "cusolver", "cudss"}, optional) -- Solver backend. See solve() for details. Default: "auto".
method (str, optional) -- Solver method. See solve() for details. Default: "auto".
atol (float, optional) -- Absolute tolerance for iterative solvers. Default: 1e-10.
maxiter (int, optional) -- Maximum iterations for iterative solvers. Default: 10000.
tol (float, optional) -- Relative tolerance. Default: 1e-12.
- Returns:
Solution x with shape [...batch, N].
- Return type:
torch.Tensor
Examples
>>> # Template matrix
>>> A = SparseTensor(val, row, col, (10, 10))
>>> # Batch of different values
>>> val_batch = torch.stack([val * (1 + 0.1*i) for i in range(4)])  # [4, nnz]
>>> b_batch = torch.randn(4, 10)
>>> # Solve all at once
>>> x_batch = A.solve_batch(val_batch, b_batch)  # [4, 10]
- nonlinear_solve(residual_fn, u0: Tensor, *params, method: Literal['newton', 'picard', 'anderson'] = 'newton', tol: float = 1e-06, atol: float = 1e-10, max_iter: int = 50, line_search: bool = True, verbose: bool = False, linear_solver: Literal['scipy', 'eigen', 'pytorch', 'cusolver', 'cudss', 'auto'] = 'pytorch', linear_method: Literal['auto', 'superlu', 'umfpack', 'lu', 'qr', 'cholesky', 'ldlt', 'cg', 'bicgstab', 'gmres', 'lgmres', 'minres', 'qmr'] = 'cg') Tensor[source]¶
Solve the nonlinear equation F(u, A, θ) = 0 with adjoint-based gradients.
The SparseTensor A is automatically passed as the first parameter to the residual function, enabling gradients to flow through A's values.
- Parameters:
residual_fn (Callable) -- Function F(u, A, *params) -> residual tensor.
- u: Current solution estimate
- A: This SparseTensor (passed automatically)
- *params: Additional parameters with requires_grad=True
u0 (torch.Tensor) -- Initial guess for the solution.
*params (torch.Tensor) -- Additional parameters (e.g., boundary conditions, coefficients). Tensors with requires_grad=True will receive gradients.
method ({'newton', 'picard', 'anderson'}, optional) -- Nonlinear solver method:
- 'newton': Newton-Raphson with line search (default, fast)
- 'picard': Fixed-point iteration (simple, slow)
- 'anderson': Anderson acceleration (memory efficient)
tol (float, optional) -- Relative convergence tolerance. Default: 1e-6.
atol (float, optional) -- Absolute convergence tolerance. Default: 1e-10.
max_iter (int, optional) -- Maximum nonlinear iterations. Default: 50.
line_search (bool, optional) -- Use Armijo line search for Newton. Default: True.
verbose (bool, optional) -- Print convergence information. Default: False.
linear_solver (str, optional) -- Backend for linear solves. Default: 'pytorch'.
linear_method (str, optional) -- Method for linear solves. Default: 'cg'.
- Returns:
Solution u* satisfying F(u*, A, θ) ≈ 0.
- Return type:
torch.Tensor
Examples
>>> # Nonlinear PDE: A @ u + u² = f
>>> def residual(u, A, f):
...     return A @ u + u**2 - f
...
>>> A = SparseTensor(val, row, col, (n, n))
>>> f = torch.randn(n, requires_grad=True)
>>> u0 = torch.zeros(n)
>>>
>>> u = A.nonlinear_solve(residual, u0, f, method='newton')
>>>
>>> # Gradients flow via the adjoint method
>>> loss = u.sum()
>>> loss.backward()
>>> print(f.grad)         # ∂u/∂f
>>> print(A.values.grad)  # ∂u/∂A (if A.values.requires_grad)
>>> # Nonlinear elasticity: K(u) @ u = F
>>> def residual_elasticity(u, K, F, material):
...     # K depends on displacement through material nonlinearity
...     return K @ u - F + material * u**3
...
>>> u = K.nonlinear_solve(residual_elasticity, u0, F, material)
- norm(ord: Literal['fro', 1, 2] = 'fro') Tensor[source]¶
Compute the matrix norm.
For batched tensors, returns the norm of each batch element.
- Parameters:
ord ({'fro', 1, 2}, optional) -- Norm type:
- 'fro': Frobenius norm (default)
- 1: Maximum absolute column sum
- 2: Spectral norm (largest singular value)
- Returns:
Norm value(s). Shape [] for non-batched, [*batch_shape] for batched.
- Return type:
torch.Tensor
Examples
>>> A = SparseTensor(val, row, col, (3, 3))
>>> A.norm('fro')  # tensor(5.0)
>>> A_batch = SparseTensor(val_batch, row, col, (4, 3, 3))
>>> A_batch.norm('fro')  # tensor([5.0, 5.0, 5.0, 5.0])
- spy(batch_idx: Tuple[int, ...] | None = None, ax=None, title: str | None = None, cmap: str = 'viridis', show_grid: bool = True, grid_color: str = '#cccccc', grid_linewidth: float = 0.5, show_colorbar: bool = True, figsize: Tuple[float, float] = (8, 8), save_path: str | None = None, dpi: int = 150)[source]¶
Visualize the sparsity pattern with values shown as color intensity.
Creates a spy plot where each matrix element is rendered as a pixel. Non-zero elements are colored with intensity proportional to their absolute value, while zero elements are shown as white. This provides a pixel-perfect visualization without overlapping markers.
- Parameters:
batch_idx (Tuple[int, ...], optional) -- For batched tensors, which batch element to visualize. Required if the tensor is batched.
ax (matplotlib.axes.Axes, optional) -- Axes to plot on. If None, creates a new figure.
title (str, optional) -- Plot title. Defaults to showing matrix info.
cmap (str, optional) -- Colormap for values. Default: 'viridis'. Other options: 'plasma', 'hot', 'coolwarm', 'Greys', etc.
show_grid (bool, optional) -- Whether to show grid lines (only for matrices <= 30x30). Default: True.
grid_color (str, optional) -- Color of grid lines. Default: '#cccccc' (light gray).
grid_linewidth (float, optional) -- Width of grid lines. Default: 0.5.
show_colorbar (bool, optional) -- Whether to show a colorbar for values. Default: True.
figsize (Tuple[float, float], optional) -- Figure size in inches. Default: (8, 8).
save_path (str, optional) -- If provided, save the figure to this path.
dpi (int, optional) -- DPI for the saved figure. Default: 150.
- Returns:
ax -- The axes object with the plot.
- Return type:
matplotlib.axes.Axes
Examples
>>> A = SparseTensor(val, row, col, (100, 100))
>>> A.spy()  # Basic spy plot
>>> A.spy(cmap='hot', show_grid=False)  # Custom colormap, no grid
>>> A.spy(save_path='matrix.png')  # Save to file
>>> # For batched tensor
>>> A_batch = SparseTensor(val_batch, row, col, (4, 100, 100))
>>> A_batch.spy(batch_idx=(0,))  # Visualize first batch element
- eigs(k: int = 6, which: str = 'LM', sigma: float | None = None, return_eigenvectors: bool = True) Tuple[Tensor, Tensor | None][source]¶
Compute k eigenvalues and eigenvectors.
For batched tensors, computes them for each batch element. For CUDA tensors, uses the LOBPCG algorithm.
- Parameters:
k (int, optional) -- Number of eigenvalues to compute. Default: 6.
which ({"LM", "SM", "LR", "SR", "LA", "SA"}, optional) -- Which eigenvalues to find:
- "LM": Largest magnitude (default)
- "SM": Smallest magnitude
- "LR"/"SR": Largest/smallest real part
- "LA"/"SA": Largest/smallest algebraic (for symmetric)
sigma (float, optional) -- Find eigenvalues near sigma (shift-invert mode).
return_eigenvectors (bool, optional) -- Whether to return eigenvectors. Default: True.
- Returns:
eigenvalues (torch.Tensor) -- Shape [k].
eigenvectors (torch.Tensor or None) -- Shape [N, k] if return_eigenvectors is True.
Notes
Gradient Support:
- Both CPU and CUDA: fully differentiable via the adjoint method
- Uses O(1) graph nodes regardless of iteration count
- For symmetric matrices, prefer eigsh() for efficiency
- Warning: For non-symmetric matrices with complex eigenvalues, gradient computation is only supported for the real part.
Examples
>>> A = SparseTensor(val.requires_grad_(True), row, col, (n, n))
>>> eigenvalues, eigenvectors = A.eigs(k=3)
>>> loss = eigenvalues.real.sum()  # For complex eigenvalues
>>> loss.backward()
- eigsh(k: int = 6, which: str = 'LM', sigma: float | None = None, return_eigenvectors: bool = True) Tuple[Tensor, Tensor | None][source]¶
Compute k eigenvalues for symmetric matrices.
More efficient than eigs() for symmetric matrices.
- Parameters:
k (int, optional) -- Number of eigenvalues to compute. Default: 6.
which ({"LM", "SM", "LA", "SA"}, optional) -- Which eigenvalues to find:
- "LM": Largest magnitude (default)
- "SM": Smallest magnitude
- "LA"/"SA": Largest/smallest algebraic
sigma (float, optional) -- Find eigenvalues near sigma.
return_eigenvectors (bool, optional) -- Whether to return eigenvectors. Default: True.
- Returns:
eigenvalues (torch.Tensor) -- Shape [k].
eigenvectors (torch.Tensor or None) -- Shape [N, k] if return_eigenvectors is True.
Notes
Gradient Support:
- Both CPU and CUDA: fully differentiable via the adjoint method
- Uses O(1) graph nodes regardless of iteration count
- Gradient computed as: ∂L/∂A = Σ_i (∂L/∂λ_i) * v_i @ v_i.T
Examples
>>> A = SparseTensor(val.requires_grad_(True), row, col, (n, n))
>>> eigenvalues, eigenvectors = A.eigsh(k=3)
>>> loss = eigenvalues.sum()
>>> loss.backward()  # Computes ∂loss/∂val
- svd(k: int = 6) Tuple[Tensor, Tensor, Tensor][source]¶
Compute the truncated SVD.
- Parameters:
k (int, optional) -- Number of singular values to compute. Default: 6.
- Returns:
U (torch.Tensor) -- Left singular vectors. Shape [M, k].
S (torch.Tensor) -- Singular values. Shape [k].
Vt (torch.Tensor) -- Right singular vectors. Shape [k, N].
Notes
Gradient Support:
- CUDA: Fully differentiable (uses power iteration with PyTorch operations)
- CPU: NOT differentiable (uses SciPy, which breaks the gradient chain)
- For a differentiable SVD on CPU, use A.to_dense() and torch.linalg.svd().
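A usage sketch; the printed shapes follow the [M, k], [k], [k, N] convention documented for the distributed svd() below and are an assumption here:
>>> A = SparseTensor(val, row, col, (100, 100))
>>> U, S, Vt = A.svd(k=4)
>>> print(U.shape, S.shape, Vt.shape)  # expected: (100, 4), (4,), (4, 100)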
- condition_number(ord: int = 2) Tensor[source]¶
Estimate the condition number.
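A minimal sketch; per the distributed variant documented below, the ord=2 estimate is σ_max / σ_min (assumed to apply here as well):
>>> A = SparseTensor(val, row, col, (100, 100))
>>> kappa = A.condition_number()  # ord=2: sigma_max / sigma_min
>>> print(kappa)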
- det() Tensor[source]¶
Compute the determinant of the sparse matrix with gradient support.
Uses LU decomposition (CPU) or dense conversion (CUDA) to compute the determinant efficiently. Supports automatic differentiation via the adjoint method.
- Returns:
Determinant value. Shape [] for a single matrix or [*batch_shape] for batched.
- Return type:
torch.Tensor
- Raises:
ValueError -- If the matrix is not square.
Notes
- Only square matrices have determinants.
- For large matrices, determinant values can overflow/underflow; consider using the log-determinant for numerical stability in such cases.
- Supports both CPU (via SciPy) and CUDA (via torch.linalg.det).
- For batched tensors, computes the determinant independently for each batch element.
- Fully differentiable: gradients are computed via the adjoint method.
- Gradient formula: ∂det(A)/∂A = det(A) * (A^{-1})^T
Performance Warning¶
CUDA performance is significantly slower than CPU for sparse matrices!
- CPU: Uses sparse LU decomposition (O(nnz^1.5)), ~0.3-0.8ms for n=10-1000
- CUDA: Converts to dense (O(n²) memory + O(n³) compute), ~0.2-2.5ms
The CUDA version requires converting the sparse matrix to dense format because cuSOLVER/cuDSS don't expose determinant computation for sparse matrices. This makes it inefficient for large sparse matrices.
Recommendation: For sparse matrices, use .cpu().det().cuda() instead:
>>> # Slow: CUDA with dense conversion
>>> det_slow = A_cuda.det()  # ~2.5ms for n=1000
>>>
>>> # Fast: CPU with sparse LU
>>> det_fast = A_cuda.cpu().det()  # ~0.8ms for n=1000
>>> det_fast = det_fast.cuda()  # Move result back if needed
Examples
>>> # Simple 2x2 matrix
>>> val = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
>>> row = torch.tensor([0, 0, 1, 1])
>>> col = torch.tensor([0, 1, 0, 1])
>>> A = SparseTensor(val, row, col, (2, 2))
>>> det = A.det()
>>> print(det)  # Should be -2.0
>>> det.backward()
>>> print(val.grad)  # Gradient w.r.t. matrix values
>>>
>>> # CUDA support
>>> A_cuda = A.cuda()
>>> det_cuda = A_cuda.det()
>>>
>>> # Batched matrices
>>> val_batch = val.unsqueeze(0).expand(3, -1).clone()
>>> A_batch = SparseTensor(val_batch, row, col, (3, 2, 2))
>>> det_batch = A_batch.det()
>>> print(det_batch.shape)  # torch.Size([3])
- lu() LUFactorization[source]¶
Compute the LU decomposition for repeated solves.
- Returns:
Factorization object with a solve() method.
- Return type:
LUFactorization
Examples
>>> A = SparseTensor(val, row, col, (10, 10))
>>> lu = A.lu()
>>> x1 = lu.solve(b1)
>>> x2 = lu.solve(b2)  # Reuses the factorization
- sum(axis: int | Tuple[int, ...] | None = None, keepdim: bool = False) Tensor | SparseTensor[source]¶
Sum of sparse tensor elements over the specified axis.
- Parameters:
axis (int, tuple of ints, or None) -- Axis or axes along which to sum. Axes correspond to:
- Batch dimensions: [...batch] at the beginning
- Sparse dimensions: (M, N) at sparse_dim positions
- Block dimensions: [...block] at the end
If None, sum over all elements (returns a scalar tensor).
keepdim (bool) -- Whether to keep the reduced dimensions.
- Returns:
- If reducing over sparse dimensions: returns a dense tensor
- If reducing over batch/block dimensions only: returns a SparseTensor
- If axis=None: returns a scalar tensor
- Return type:
torch.Tensor or SparseTensor
Examples
>>> # Shape: [batch=2, M=10, N=10, block=3]
>>> A = SparseTensor(val, row, col, (2, 10, 10, 3))
>>>
>>> A.sum()            # Scalar: sum all elements
>>> A.sum(axis=0)      # Sum over batch -> [10, 10, 3]
>>> A.sum(axis=1)      # Sum over M (rows) -> [2, 10, 3] (dense)
>>> A.sum(axis=2)      # Sum over N (cols) -> [2, 10, 3] (dense)
>>> A.sum(axis=3)      # Sum over block -> SparseTensor [2, 10, 10]
>>> A.sum(axis=(1,2))  # Sum over M and N -> [2, 3] (dense)
- mean(axis: int | Tuple[int, ...] | None = None, keepdim: bool = False) Tensor | SparseTensor[source]¶
Mean of sparse tensor elements over the specified axis.
Note: For sparse dimensions, this computes the mean of non-zero values only, NOT the mean over all M*N elements. For the full mean, use to_dense().mean().
- Parameters:
axis (int, tuple of ints, or None) -- Same as sum().
keepdim (bool) -- Whether to keep the reduced dimensions.
- Returns:
Mean values.
- Return type:
torch.Tensor or SparseTensor
Examples
>>> A = SparseTensor(val, row, col, (10, 10))
>>> A.mean()        # Mean of all non-zero values
>>> A.mean(axis=0)  # Mean over batch dimension
- prod(axis: int | Tuple[int, ...] | None = None, keepdim: bool = False) Tensor | SparseTensor[source]¶
Product of sparse tensor elements over the specified axis.
Warning: For sparse matrices, zero elements are not included in the product. This means prod() computes the product of non-zero values only.
- Parameters:
axis (int, tuple of ints, or None) -- Same as sum().
keepdim (bool) -- Whether to keep the reduced dimensions.
- Returns:
Product values.
- Return type:
torch.Tensor or SparseTensor
Examples
>>> A = SparseTensor(val, row, col, (10, 10))
>>> A.prod()        # Product of all non-zero values
>>> A.prod(axis=0)  # Product over batch dimension
- max(axis: int | Tuple[int, ...] | None = None, keepdim: bool = False) Tensor | SparseTensor[source]¶
Max of non-zero values over the specified axis.
- min(axis: int | Tuple[int, ...] | None = None, keepdim: bool = False) Tensor | SparseTensor[source]¶
Min of non-zero values over the specified axis.
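A minimal sketch; like mean() and prod() above, these reduce over non-zero values only, and axis semantics follow sum():
>>> A = SparseTensor(val, row, col, (10, 10))
>>> A.max()  # largest non-zero value
>>> A.min()  # smallest non-zero value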
- abs() SparseTensor[source]¶
Element-wise absolute value.
- sqrt() SparseTensor[source]¶
Element-wise square root.
- square() SparseTensor[source]¶
Element-wise square.
- exp() SparseTensor[source]¶
Element-wise exponential.
- log() SparseTensor[source]¶
Element-wise natural logarithm.
- log10() SparseTensor[source]¶
Element-wise base-10 logarithm.
- log2() SparseTensor[source]¶
Element-wise base-2 logarithm.
- sin() SparseTensor[source]¶
Element-wise sine.
- cos() SparseTensor[source]¶
Element-wise cosine.
- tan() SparseTensor[source]¶
Element-wise tangent.
- sinh() SparseTensor[source]¶
Element-wise hyperbolic sine.
- cosh() SparseTensor[source]¶
Element-wise hyperbolic cosine.
- tanh() SparseTensor[source]¶
Element-wise hyperbolic tangent.
- sigmoid() SparseTensor[source]¶
Element-wise sigmoid.
- relu() SparseTensor[source]¶
Element-wise ReLU.
- clamp(min: float | None = None, max: float | None = None) SparseTensor[source]¶
Element-wise clamp.
- sign() SparseTensor[source]¶
Element-wise sign.
- floor() SparseTensor[source]¶
Element-wise floor.
- ceil() SparseTensor[source]¶
Element-wise ceil.
- round() SparseTensor[source]¶
Element-wise round.
- reciprocal() SparseTensor[source]¶
Element-wise reciprocal (1/x).
- pow(exponent: float | int | Tensor) SparseTensor[source]¶
Element-wise power.
- logical_not() SparseTensor[source]¶
Element-wise logical NOT.
- logical_and(other: SparseTensor) SparseTensor[source]¶
Element-wise logical AND.
- logical_or(other: SparseTensor) SparseTensor[source]¶
Element-wise logical OR.
- logical_xor(other: SparseTensor) SparseTensor[source]¶
Element-wise logical XOR.
- isnan() SparseTensor[source]¶
Element-wise isnan check.
- isinf() SparseTensor[source]¶
Element-wise isinf check.
- isfinite() SparseTensor[source]¶
Element-wise isfinite check.
- detach() SparseTensor[source]¶
Detach from the computation graph. Preserves subclass type.
- requires_grad_(requires_grad: bool = True) SparseTensor[source]¶
Enable/disable gradient tracking.
- clone() SparseTensor[source]¶
Create a copy of this SparseTensor. Preserves subclass type.
- contiguous() SparseTensor[source]¶
Make values contiguous in memory. Preserves subclass type.
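A small sketch of the autograd utilities (the 2x2 values and indices here are hypothetical):
>>> val = torch.tensor([4.0, -1.0, -1.0, 4.0], requires_grad=True)
>>> A = SparseTensor(val, torch.tensor([0, 0, 1, 1]), torch.tensor([0, 1, 0, 1]), (2, 2))
>>> B = A.clone()   # independent copy, same subclass
>>> C = A.detach()  # no gradient history
>>> A = A.requires_grad_(False)  # toggle gradient tracking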
- save(path: str | PathLike, metadata: Dict[str, str] | None = None) None[source]¶
Save the SparseTensor in safetensors format.
- Parameters:
path (str or PathLike) -- Output file path.
metadata (Dict[str, str], optional) -- Extra string metadata to store with the file.
Examples
>>> A = SparseTensor(val, row, col, (100, 100))
>>> A.save("matrix.safetensors")
- classmethod load(path: str | PathLike, device: str | device = 'cpu') SparseTensor[source]¶
Load a SparseTensor from safetensors format.
- Parameters:
path (str or PathLike) -- Input file path.
device (str or torch.device) -- Device to load tensors to.
- Returns:
The loaded sparse tensor.
- Return type:
SparseTensor
Examples
>>> A = SparseTensor.load("matrix.safetensors", device="cuda")
- save_distributed(directory: str | PathLike, num_partitions: int, partition_method: str = 'simple', coords: Tensor | None = None, verbose: bool = False) None[source]¶
Save as partitioned files for distributed loading.
Creates a directory with metadata and per-partition files. Each rank can then load only its own partition.
- Parameters:
directory (str or PathLike) -- Output directory path.
num_partitions (int) -- Number of partitions to create.
partition_method (str) -- 'simple', 'metis', or 'geometric'.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning.
verbose (bool) -- Print progress.
Examples
>>> A.save_distributed("matrix_dist", num_partitions=4)
>>> # Each rank loads its partition:
>>> partition = DSparseMatrix.load("matrix_dist", rank)
SparseTensorList¶
A container for multiple sparse matrices with different sparsity patterns. Useful for batch operations on heterogeneous graphs.
- class torch_sla.SparseTensorList(tensors: List[SparseTensor])[source]¶
Bases: object
A list of SparseTensors with different structures.
Provides a unified interface for batch operations on matrices with different sparsity patterns. Unlike a batched SparseTensor (which requires the same structure), SparseTensorList allows each matrix to have a different shape and sparsity pattern.
- tensors¶
List of SparseTensor objects.
- Type:
List[SparseTensor]
- device¶
Device (from the first tensor).
- Type:
torch.device
- dtype¶
Data type (from the first tensor).
- Type:
torch.dtype
Examples
>>> # Create matrices with different sizes
>>> A1 = SparseTensor(val1, row1, col1, (10, 10))
>>> A2 = SparseTensor(val2, row2, col2, (20, 20))
>>> A3 = SparseTensor(val3, row3, col3, (30, 30))
>>> # Create list
>>> matrices = SparseTensorList([A1, A2, A3])
>>> print(matrices.shapes)  # [(10, 10), (20, 20), (30, 30)]
>>> # Batch solve
>>> x_list = matrices.solve([b1, b2, b3])
>>> # Check properties for all
>>> is_sym = matrices.is_symmetric()  # [tensor(True), tensor(True), tensor(True)]
- classmethod from_coo_list(matrices: List[Tuple[Tensor, Tensor, Tensor, Tuple[int, ...]]]) SparseTensorList[source]¶
Create from a list of COO data tuples.
- Parameters:
matrices (List[Tuple]) -- List of (values, row_indices, col_indices, shape) tuples.
- Returns:
List of SparseTensors.
- Return type:
SparseTensorList
Examples
>>> data = [
...     (val1, row1, col1, (10, 10)),
...     (val2, row2, col2, (20, 20)),
... ]
>>> matrices = SparseTensorList.from_coo_list(data)
- classmethod from_torch_sparse_list(A_list: List[Tensor]) SparseTensorList[source]¶
Create from a list of PyTorch sparse tensors.
- Parameters:
A_list (List[torch.Tensor]) -- List of PyTorch sparse COO tensors.
- Returns:
List of SparseTensors.
- Return type:
SparseTensorList
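A usage sketch (the matrices are randomly generated for illustration):
>>> A1 = torch.randn(10, 10).to_sparse_coo()
>>> A2 = torch.randn(20, 20).to_sparse_coo()
>>> matrices = SparseTensorList.from_torch_sparse_list([A1, A2])
>>> print(matrices.shapes)  # [(10, 10), (20, 20)]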
- to(device: str | device) SparseTensorList[source]¶
Move all tensors to a device.
- Parameters:
device (str or torch.device) -- Target device.
- Returns:
New list with tensors on the target device.
- Return type:
SparseTensorList
- cuda() SparseTensorList[source]¶
Move all tensors to CUDA.
- cpu() SparseTensorList[source]¶
Move all tensors to CPU.
- sum(axis: int | None = None) List[Tensor] | Tensor[source]¶
Sum the values in each matrix.
- Parameters:
axis (int, optional) -- If None: sum all values in each matrix, returning List[scalar]. If 0: sum over rows for each matrix. If 1: sum over columns for each matrix.
- Returns:
If axis is None: list of scalar tensors (one per matrix). If axis is 0 or 1: list of 1D tensors.
- Return type:
List[torch.Tensor] or torch.Tensor
Examples
>>> matrices = SparseTensorList([A1, A2, A3])
>>> totals = matrices.sum()  # [sum(A1), sum(A2), sum(A3)]
>>> row_sums = matrices.sum(axis=1)  # [A1.sum(1), A2.sum(1), ...]
- mean(axis: int | None = None) List[Tensor][source]¶
Mean of the values in each matrix.
- Parameters:
axis (int, optional) -- Same as sum().
- Returns:
List of mean values/vectors.
- Return type:
List[torch.Tensor]
- abs() SparseTensorList[source]¶
Absolute value of all elements.
- clamp(min: float | None = None, max: float | None = None) SparseTensorList[source]¶
Clamp values in all matrices.
- pow(exponent: float) SparseTensorList[source]¶
Element-wise power.
- sqrt() SparseTensorList[source]¶
Element-wise square root.
- exp() SparseTensorList[source]¶
Element-wise exponential.
- log() SparseTensorList[source]¶
Element-wise natural logarithm.
- solve(b_list: List[Tensor], **kwargs) List[Tensor][source]¶
Solve the linear systems for all matrices.
- Parameters:
b_list (List[torch.Tensor]) -- List of right-hand side vectors, one per matrix.
**kwargs -- Additional arguments passed to SparseTensor.solve().
- Returns:
List of solutions.
- Return type:
List[torch.Tensor]
Examples
>>> matrices = SparseTensorList([A1, A2, A3])
>>> x_list = matrices.solve([b1, b2, b3])
- is_symmetric(**kwargs) List[Tensor][source]¶
Check symmetry for all matrices.
- Parameters:
**kwargs -- Arguments passed to SparseTensor.is_symmetric().
- Returns:
List of boolean tensors.
- Return type:
List[torch.Tensor]
- is_positive_definite(**kwargs) List[Tensor][source]¶
Check positive definiteness for all matrices.
- Parameters:
**kwargs -- Arguments passed to SparseTensor.is_positive_definite().
- Returns:
List of boolean tensors.
- Return type:
List[torch.Tensor]
- norm(ord: Literal['fro', 1, 2] = 'fro') List[Tensor][source]¶
Compute norms for all matrices.
- Parameters:
ord ({'fro', 1, 2}) -- Norm type.
- Returns:
List of norm values.
- Return type:
List[torch.Tensor]
- eigs(k: int = 6, **kwargs) List[Tuple[Tensor, Tensor | None]][source]¶
Compute eigenvalues for all matrices.
- Parameters:
k (int) -- Number of eigenvalues.
**kwargs -- Additional arguments.
- Returns:
List of (eigenvalues, eigenvectors) tuples.
- Return type:
List[Tuple[torch.Tensor, Optional[torch.Tensor]]]
- eigsh(k: int = 6, **kwargs) List[Tuple[Tensor, Tensor | None]][source]¶
Compute eigenvalues for symmetric matrices.
- Parameters:
k (int) -- Number of eigenvalues.
**kwargs -- Additional arguments.
- Returns:
List of (eigenvalues, eigenvectors) tuples.
- Return type:
List[Tuple[torch.Tensor, Optional[torch.Tensor]]]
- svd(k: int = 6) List[Tuple[Tensor, Tensor, Tensor]][source]¶
Compute the SVD for all matrices.
- Parameters:
k (int) -- Number of singular values.
- Returns:
List of (U, S, Vt) tuples.
- Return type:
List[Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
- condition_number(ord: int = 2) List[Tensor][source]¶
Compute condition numbers for all matrices.
- Parameters:
ord (int) -- Norm order.
- Returns:
List of condition numbers.
- Return type:
List[torch.Tensor]
- det() List[Tensor][source]¶
Compute determinants for all matrices.
- Returns:
List of determinant values.
- Return type:
List[torch.Tensor]
Examples
>>> matrices = SparseTensorList([A1, A2, A3])
>>> dets = matrices.det()
>>> print([d.item() for d in dets])
- spy(indices: List[int] | None = None, ncols: int = 3, figsize: Tuple[float, float] | None = None, **kwargs)[source]¶
Visualize the sparsity patterns of multiple matrices in a grid.
- Parameters:
indices (List[int], optional) -- Which matrices to visualize. Default: all.
ncols (int) -- Number of columns in the grid. Default: 3.
figsize (Tuple[float, float], optional) -- Figure size in inches.
**kwargs -- Additional arguments passed to SparseTensor.spy().
- Returns:
fig -- The figure object.
- Return type:
matplotlib.figure.Figure
Examples
>>> matrices = SparseTensorList([A1, A2, A3, A4])
>>> matrices.spy()  # Visualize all in a grid
>>> matrices.spy(indices=[0, 2])  # Visualize specific ones
- to_block_diagonal() SparseTensor[source]¶
Merge all matrices into a single block-diagonal SparseTensor.
Creates a sparse matrix where each input matrix appears as a block on the diagonal: diag(A1, A2, ..., An).
- Returns:
Block-diagonal matrix with shape (sum(M_i), sum(N_i)).
- Return type:
SparseTensor
Notes
The resulting matrix has the structure:
[A1  0  0 ...]
[ 0 A2  0 ...]
[ 0  0 A3 ...]
[... ... ... ]
Examples
>>> A1 = SparseTensor(val1, row1, col1, (10, 10))
>>> A2 = SparseTensor(val2, row2, col2, (20, 20))
>>> stl = SparseTensorList([A1, A2])
>>> A_block = stl.to_block_diagonal()  # Shape (30, 30)
- classmethod from_block_diagonal(sparse: SparseTensor, sizes: List[Tuple[int, int]]) SparseTensorList[source]¶
Split a block-diagonal SparseTensor into a list of matrices.
- Parameters:
sparse (SparseTensor) -- Block-diagonal matrix to split.
sizes (List[Tuple[int, int]]) -- List of (rows, cols) for each block. Must sum to sparse.shape.
- Returns:
List of extracted blocks.
- Return type:
SparseTensorList
Examples
>>> A_block = SparseTensor(val, row, col, (30, 30))
>>> stl = SparseTensorList.from_block_diagonal(A_block, [(10, 10), (20, 20)])
>>> print(len(stl))  # 2
- partition(num_partitions: int, threshold: int = 1000, partition_method: str = 'auto', device: str | device | None = None, verbose: bool = False) DSparseTensorList[source]¶
Create a distributed version for parallel computing.
- Parameters:
num_partitions (int) -- Number of partitions (typically = world_size).
threshold (int) -- Graphs with nodes >= threshold are partitioned across ranks. Smaller graphs are assigned whole to individual ranks.
partition_method (str) -- Method for partitioning large graphs: 'metis', 'simple', 'auto'.
device (torch.device, optional) -- Target device.
verbose (bool) -- Print partition info.
- Returns:
Distributed sparse tensor list.
- Return type:
DSparseTensorList
Notes
Hybrid Strategy:
- Small graphs (< threshold nodes): assigned whole to ranks round-robin. Zero edge cuts, no halo exchange needed.
- Large graphs (>= threshold nodes): partitioned across all ranks. Uses halo exchange for boundary nodes.
This is optimal for molecular datasets where most molecules are small but some (proteins, polymers) can be very large.
Examples
>>> stl = SparseTensorList([A1, A2, A3, ...])
>>> dstl = stl.partition(num_partitions=4, threshold=1000)
>>> y_list = dstl @ x_list  # Distributed matmul
LUFactorization¶
LU factorization for efficient repeated solves against the same matrix.
- class torch_sla.LUFactorization(lu_factor, shape: Tuple[int, int], dtype: dtype, device: device)[source]¶
Bases: object
LU factorization wrapper for efficient repeated solves.
Created by SparseTensor.lu().
- Parameters:
lu_factor (scipy.sparse.linalg.SuperLU) -- The SciPy LU factorization object.
shape (Tuple[int, int]) -- Matrix shape.
dtype (torch.dtype) -- Data type.
device (torch.device) -- Device.
Examples
>>> A = SparseTensor(val, row, col, (10, 10))
>>> lu = A.lu()
>>> x1 = lu.solve(b1)  # First solve
>>> x2 = lu.solve(b2)  # Much faster - reuses the factorization
- solve(b: Tensor) Tensor[source]¶
Solve Ax = b using the cached factorization.
- Parameters:
b (torch.Tensor) -- Right-hand side vector.
- Returns:
Solution x.
- Return type:
torch.Tensor
Distributed Classes¶
DSparseTensor¶
A distributed sparse tensor with domain decomposition. Uses halo exchange for inter-partition communication.
- class torch_sla.DSparseTensor(values: Tensor, row_indices: Tensor, col_indices: Tensor, shape: Tuple[int, int], num_partitions: int, coords: Tensor | None = None, partition_method: str = 'auto', device: str | device | None = None, verbose: bool = True)[source]¶
Bases: object
Distributed sparse tensor with automatic partitioning and halo exchange.
A Pythonic wrapper that provides a unified interface for distributed sparse matrix operations. Supports indexing to access individual partitions.
- Parameters:
values (torch.Tensor) -- Non-zero values [nnz].
row_indices (torch.Tensor) -- Row indices [nnz].
col_indices (torch.Tensor) -- Column indices [nnz].
shape (Tuple[int, int]) -- Global matrix shape.
num_partitions (int) -- Number of partitions to create.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning [num_nodes, dim].
partition_method (str) -- Partitioning method: 'metis', 'rcb', 'slicing', 'simple'.
device (str or torch.device) -- Device for the matrix data.
verbose (bool) -- Whether to print partition info.
Examples
>>> import torch
>>> from torch_sla import DSparseTensor
>>>
>>> # Create distributed tensor with 4 partitions
>>> A = DSparseTensor(val, row, col, shape, num_partitions=4)
>>>
>>> # Access individual partitions
>>> A0 = A[0]  # First partition
>>> A1 = A[1]  # Second partition
>>>
>>> # Iterate over partitions
>>> for partition in A:
...     x = partition.solve(b_local)
>>>
>>> # Properties
>>> print(A.num_partitions)  # 4
>>> print(A.shape)           # Global shape
>>> print(len(A))            # 4
>>>
>>> # Move to CUDA
>>> A_cuda = A.cuda()
>>>
>>> # Local halo exchange (for testing)
>>> x_list = [torch.zeros(A[i].num_local) for i in range(4)]
>>> A.halo_exchange_local(x_list)
- classmethod from_sparse_tensor(sparse_tensor: SparseTensor, num_partitions: int, coords: Tensor | None = None, partition_method: str = 'auto', device: str | device | None = None, verbose: bool = True) DSparseTensor[source]¶
Create a DSparseTensor from a SparseTensor.
- Parameters:
sparse_tensor (SparseTensor) -- Input sparse tensor (must be 2D, not batched).
num_partitions (int) -- Number of partitions.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning.
partition_method (str) -- Partitioning method.
device (str or torch.device, optional) -- Target device (defaults to sparse_tensor's device).
verbose (bool) -- Whether to print partition info.
- Returns:
Distributed sparse tensor.
- Return type:
DSparseTensor
- classmethod from_torch_sparse(A: Tensor, num_partitions: int, **kwargs) DSparseTensor[source]¶
Create a DSparseTensor from a PyTorch sparse tensor.
- classmethod from_global_distributed(values: Tensor, row_indices: Tensor, col_indices: Tensor, shape: Tuple[int, int], rank: int, world_size: int, coords: Tensor | None = None, partition_method: str = 'auto', device: str | device | None = None, verbose: bool = True) DSparseMatrix[source]¶
Create a local partition in a distributed-safe manner.
This method ensures that all ranks compute the same partition assignment by having rank 0 compute the partition IDs and broadcast them to all ranks.
- Parameters:
values (torch.Tensor) -- Global non-zero values [nnz].
row_indices (torch.Tensor) -- Global row indices [nnz].
col_indices (torch.Tensor) -- Global column indices [nnz].
shape (Tuple[int, int]) -- Global matrix shape.
rank (int) -- Current process rank.
world_size (int) -- Total number of processes.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning [num_nodes, dim].
partition_method (str) -- Partitioning method: 'metis', 'rcb', 'slicing', 'simple'.
device (str or torch.device, optional) -- Target device.
verbose (bool) -- Whether to print partition info.
- Returns:
Local partition matrix for this rank.
- Return type:
DSparseMatrix
Examples
>>> import torch.distributed as dist
>>>
>>> # In each process:
>>> rank = dist.get_rank()
>>> world_size = dist.get_world_size()
>>>
>>> local_matrix = DSparseTensor.from_global_distributed(
...     val, row, col, shape,
...     rank=rank, world_size=world_size
... )
- classmethod from_device_mesh(values: Tensor, row_indices: Tensor, col_indices: Tensor, shape: Tuple[int, int], device_mesh: DeviceMesh, coords: Tensor | None = None, partition_method: str = 'simple', placement: str = 'shard_rows', verbose: bool = False) DSparseMatrix[source]¶
Create a local partition using a PyTorch DeviceMesh.
This is the recommended method for distributed training with PyTorch's DTensor ecosystem. Each rank receives only its local partition.
- Parameters:
values (torch.Tensor) -- Global non-zero values [nnz] (same on all ranks).
row_indices (torch.Tensor) -- Global row indices [nnz].
col_indices (torch.Tensor) -- Global column indices [nnz].
shape (Tuple[int, int]) -- Global matrix shape.
device_mesh (DeviceMesh) -- PyTorch DeviceMesh specifying the device topology.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning.
partition_method (str) -- Partitioning method: 'metis', 'rcb', 'simple'. Default is 'simple' for determinism in a distributed setting.
placement (str) -- How to distribute: 'shard_rows', 'shard_cols', 'replicate'.
verbose (bool) -- Whether to print partition info.
- Returns:
Local partition for this rank.
- Return type:
DSparseMatrix
Examples
>>> from torch.distributed.device_mesh import init_device_mesh
>>> from torch_sla import DSparseTensor
>>>
>>> # Initialize 4-GPU device mesh
>>> mesh = init_device_mesh("cuda", (4,), mesh_dim_names=("dp",))
>>>
>>> # Create distributed sparse tensor (each rank gets its partition)
>>> local_matrix = DSparseTensor.from_device_mesh(
...     val, row, col, shape,
...     device_mesh=mesh,
...     partition_method='simple'
... )
>>>
>>> # Local operations
>>> y_local = local_matrix.matvec(x_local)
>>> x_local = local_matrix.solve(b_local)
- to(device: str | device) DSparseTensor[source]¶
Move all partitions to a different device.
- Parameters:
device (str or torch.device) -- Target device.
- Returns:
New distributed tensor on the target device.
- Return type:
DSparseTensor
- cuda(device: int | None = None) DSparseTensor[source]¶
Move to a CUDA device.
- cpu() DSparseTensor[source]¶
Move to CPU.
- halo_exchange_local(x_list: List[Tensor]) None[source]¶
Local halo exchange for single-process simulation.
Exchanges halo values between all partitions locally. Useful for testing without an actual distributed setup.
- Parameters:
x_list (List[torch.Tensor]) -- List of local vectors, one per partition. Each vector is modified in place to update halo values.
- matvec_all(x_list: List[Tensor], exchange_halo: bool = True) List[Tensor][source]¶
Matrix-vector multiply on all partitions.
Performs y = A @ x for each partition, with optional halo exchange.
- Parameters:
x_list (List[torch.Tensor]) -- List of local vectors, one per partition. Each vector should have size = num_owned + num_halo for that partition.
exchange_halo (bool) -- Whether to perform halo exchange before multiplication. Default: True.
- Returns:
List of result vectors, one per partition. Each result has size = num_owned (only owned nodes have valid results).
- Return type:
List[torch.Tensor]
Examples
>>> D = SparseTensor(val, row, col, shape).partition(4)
>>> x_local = D.scatter_local(x_global)
>>> y_local = D.matvec_all(x_local)
>>> y_global = D.gather_global(y_local)
- solve_all(b_list: List[Tensor], **kwargs) List[Tensor][source]¶
Solve on all partitions (subdomain solves).
NOTE: This performs LOCAL subdomain solves, NOT a global distributed solve. Each partition solves its own local system independently. For a true distributed solve, use solve_distributed().
- Parameters:
b_list (List[torch.Tensor]) -- List of local RHS vectors, one per partition.
**kwargs -- Additional arguments passed to each partition's solve method.
- Returns:
List of solution vectors, one per partition.
- Return type:
List[torch.Tensor]
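A hedged sketch of local subdomain solves; the vector handling here is assumed analogous to matvec_all() above:
>>> D = SparseTensor(val, row, col, shape).partition(4)
>>> b_local = D.scatter_local(b_global)
>>> x_local = D.solve_all(b_local)  # independent local solves
>>> # For a true global solve, use D.solve_distributed(b_global)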
- solve_distributed(b_global: Tensor | DTensor, method: str = 'cg', atol: float = 1e-10, maxiter: int = 1000, verbose: bool = False) Tensor | DTensor[source]¶
Distributed solve: find x such that A @ x = b using all partitions.
This performs a TRUE distributed solve where all partitions collaborate to solve the global system. Uses distributed CG with global reductions.
- Parameters:
b_global (torch.Tensor or DTensor) -- Global RHS vector [N].
- If torch.Tensor: treated as a global vector
- If DTensor: automatically handles distributed input/output
method (str) -- Solver method: 'cg' (Conjugate Gradient).
atol (float) -- Absolute tolerance for convergence.
maxiter (int) -- Maximum iterations.
verbose (bool) -- Print convergence info.
- Returns:
Global solution vector [N]. Returns a DTensor if the input is a DTensor, otherwise a torch.Tensor.
- Return type:
torch.Tensor or DTensor
Examples
>>> D = A.partition(num_partitions=4)
>>> x = D.solve_distributed(b)  # Distributed CG solve
>>> residual = torch.norm(A @ x - b)
>>> # With DTensor input
>>> from torch.distributed.tensor import DTensor, Replicate
>>> b_dt = DTensor.from_local(b_local, mesh, [Replicate()])
>>> x_dt = D.solve_distributed(b_dt)  # Returns DTensor
- gather_global(x_list: List[Tensor]) Tensor[source]¶
Gather local vectors into a global vector.
- Parameters:
x_list (List[torch.Tensor]) -- List of local vectors, one per partition.
- Returns:
Global vector.
- Return type:
torch.Tensor
- scatter_local(x_global: Tensor) List[Tensor][source]¶
Scatter a global vector into local vectors.
- Parameters:
x_global (torch.Tensor) -- Global vector.
- Returns:
List of local vectors (with halo values filled).
- Return type:
List[torch.Tensor]
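A round-trip sketch; that gather_global() exactly inverts scatter_local() on owned entries is an assumption for illustration:
>>> D = A.partition(num_partitions=4)
>>> x_local = D.scatter_local(x_global)  # local vectors with halos filled
>>> x_back = D.gather_global(x_local)    # reassemble the global vector
>>> torch.allclose(x_back, x_global)
True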
- to_sparse_tensor() SparseTensor[source]¶
Gather all partitions into a single SparseTensor.
This creates a global SparseTensor from the distributed data. Useful for verification, debugging, or when you need to perform operations that require the full matrix.
- Returns:
Global sparse tensor containing all data.
- Return type:
SparseTensor
Examples
>>> D = DSparseTensor(val, row, col, shape, num_partitions=4)
>>> A = D.to_sparse_tensor()  # Gather to global SparseTensor
>>> x = A.solve(b)  # Solve on the full matrix
- gather() SparseTensor¶
Gather all partitions into a single SparseTensor; behaves the same as to_sparse_tensor().
- Returns:
Global sparse tensor containing all data.
- Return type:
SparseTensor
Examples
>>> D = DSparseTensor(val, row, col, shape, num_partitions=4)
>>> A = D.gather()  # Gather to global SparseTensor
>>> x = A.solve(b)  # Solve on the full matrix
- to_list() DSparseTensorList[source]¶
Split into a DSparseTensorList based on connected components.
If the matrix has isolated subgraphs (block-diagonal structure), splits it into separate distributed matrices, one per component.
- Returns:
List of distributed matrices, one per connected component.
- Return type:
DSparseTensorList
Notes
This is useful when you have a block-diagonal matrix representing multiple independent graphs and want to process them separately.
Examples
>>> D = DSparseTensor(val, row, col, shape, num_partitions=4)
>>> if D.has_isolated_components():
...     dstl = D.to_list()  # Split into components
- has_isolated_components() bool[source]¶
Check if the matrix has multiple connected components.
- Returns:
True if the matrix has more than one connected component.
- Return type:
bool
- classmethod from_list(dstl: DSparseTensorList, verbose: bool = False) DSparseTensor[source]¶
Merge a DSparseTensorList into a single block-diagonal DSparseTensor.
- Parameters:
dstl (DSparseTensorList) -- List of distributed matrices to merge.
verbose (bool) -- Print info.
- Returns:
Block-diagonal distributed matrix.
- Return type:
DSparseTensor
Examples
>>> dstl = DSparseTensorList.from_sparse_tensor_list(stl, 4)
>>> D = DSparseTensor.from_list(dstl)  # Merge to block-diagonal
- scatter_to_dtensor(x_global: Tensor, device_mesh: DeviceMesh, shard_dim: int = 0) DTensor[source]¶
Convert a global tensor to a sharded DTensor aligned with the matrix partitioning.
This creates a DTensor where each rank holds the portion of the vector corresponding to its owned nodes in the matrix partitioning.
- Parameters:
x_global (torch.Tensor) -- Global vector of shape [N].
device_mesh (DeviceMesh) -- PyTorch DeviceMesh for distribution.
shard_dim (int) -- Dimension to shard (default 0 for vectors).
- Returns:
Sharded DTensor with local data for this rank.
- Return type:
DTensor
Examples
>>> mesh = init_device_mesh("cuda", (4,))
>>> x_global = torch.randn(N)
>>> x_dt = D.scatter_to_dtensor(x_global, mesh)
- gather_from_dtensor(x_dtensor: DTensor) Tensor[source]¶
Convert a DTensor to a global tensor.
- Parameters:
x_dtensor (DTensor) -- Distributed tensor.
- Returns:
Full global tensor.
- Return type:
torch.Tensor
Examples
>>> x_global = D.gather_from_dtensor(x_dt)
- to_dtensor(x: Tensor, device_mesh: DeviceMesh, replicate: bool = True) DTensor[source]¶
Convert a tensor to a DTensor with the specified placement.
- Parameters:
x (torch.Tensor) -- Input tensor.
device_mesh (DeviceMesh) -- PyTorch DeviceMesh.
replicate (bool) -- If True, create a replicated DTensor (same data on all ranks). If False, create a sharded DTensor (data is split).
- Returns:
Resulting DTensor.
- Return type:
DTensor
Examples
>>> mesh = init_device_mesh("cuda", (4,))
>>> x_dt = D.to_dtensor(x, mesh, replicate=True)
- eigsh(k: int = 6, which: str = 'LM', sigma: float | None = None, return_eigenvectors: bool = True, maxiter: int = 1000, tol: float = 1e-08) Tuple[Tensor, Tensor | None][源代码]¶
Compute k eigenvalues for symmetric matrices using distributed LOBPCG.
This is a TRUE distributed algorithm - no data gather required. Uses distributed matvec with global QR decomposition.
- 参数:
k (int, optional) -- Number of eigenvalues to compute. Default: 6.
which ({"LM", "SM", "LA", "SA"}, optional) -- Which eigenvalues to find: - "LM"/"LA": Largest (default) - "SM"/"SA": Smallest
sigma (float, optional) -- Find eigenvalues near sigma (not yet supported).
return_eigenvectors (bool, optional) -- Whether to return eigenvectors. Default: True.
maxiter (int, optional) -- Maximum LOBPCG iterations. Default: 1000.
tol (float, optional) -- Convergence tolerance. Default: 1e-8.
- 返回:
eigenvalues (torch.Tensor) -- Shape [k].
eigenvectors (torch.Tensor or None) -- Shape [N, k] if return_eigenvectors is True.
备注
Distributed Algorithm:
Uses distributed LOBPCG (Locally Optimal Block PCG)
Only requires distributed matvec + global reductions
Memory: O(N * k) per node for eigenvectors
Communication: O(k^2) per iteration for Rayleigh-Ritz
Gradient Support:
Gradients flow through the distributed matvec operations
O(iterations) graph nodes (not O(1) like adjoint)
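A hedged usage sketch (assuming a DSparseTensor built from COO data val, row, col, shape on a matrix large enough for k=4):
>>> val = val.requires_grad_(True)
>>> D = DSparseTensor(val, row, col, shape, num_partitions=4)
>>> evals, evecs = D.eigsh(k=4, which="SM")  # 4 smallest eigenvalues
>>> evals.sum().backward()                   # gradients flow through the distributed matvec
>>> print(val.grad.shape)                    # same shape as val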
- eigs(k: int = 6, which: str = 'LM', sigma: float | None = None, return_eigenvectors: bool = True, maxiter: int = 1000, tol: float = 1e-08) Tuple[Tensor, Tensor | None][源代码]¶
Compute k eigenvalues using distributed LOBPCG.
For symmetric matrices, this is equivalent to eigsh(). For non-symmetric matrices, it currently falls back to eigsh() (symmetric assumption).
- 参数:
k (int, optional) -- Number of eigenvalues to compute. Default: 6.
which (str, optional) -- Which eigenvalues to find.
sigma (float, optional) -- Find eigenvalues near sigma.
return_eigenvectors (bool, optional) -- Whether to return eigenvectors. Default: True.
maxiter (int, optional) -- Maximum iterations. Default: 1000.
tol (float, optional) -- Convergence tolerance. Default: 1e-8.
- 返回:
eigenvalues (torch.Tensor) -- Shape [k].
eigenvectors (torch.Tensor or None) -- Shape [N, k] if return_eigenvectors is True.
- svd(k: int = 6, maxiter: int = 1000, tol: float = 1e-08) Tuple[Tensor, Tensor, Tensor][源代码]¶
Compute truncated SVD using distributed power iteration.
Uses A^T @ A for eigenvalues, then recovers U from A @ V.
- 参数:
k (int, optional) -- Number of singular values to compute. Default: 6.
maxiter (int, optional) -- Maximum iterations. Default: 1000.
tol (float, optional) -- Convergence tolerance. Default: 1e-8.
- 返回:
U (torch.Tensor) -- Left singular vectors. Shape [M, k].
S (torch.Tensor) -- Singular values. Shape [k].
Vt (torch.Tensor) -- Right singular vectors. Shape [k, N].
备注
Distributed Algorithm:
Computes eigenvalues of A^T @ A using distributed LOBPCG
No data gather required
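A minimal sketch (the dense check is for small matrices only, since to_dense() gathers everything):
>>> U, S, Vt = D.svd(k=4)   # U: [M, 4], S: [4], Vt: [4, N]
>>> A_dense = D.to_dense()  # gathers all data; debugging only
>>> err = torch.linalg.norm(A_dense - U @ torch.diag(S) @ Vt)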
- norm(ord: Literal['fro', 1, 2] = 'fro') Tensor[源代码]¶
Compute matrix norm (distributed).
For Frobenius norm, computed locally and aggregated. For spectral norm, uses distributed SVD.
- 参数:
ord ({'fro', 1, 2}) -- Type of norm:
- 'fro': Frobenius norm (distributed sum)
- 1: Maximum column sum
- 2: Spectral norm (largest singular value via distributed SVD)
- 返回:
Scalar tensor containing the norm value.
- 返回类型:
torch.Tensor
- condition_number(ord: int = 2) Tensor[源代码]¶
Estimate condition number using distributed SVD.
- 参数:
ord (int, optional) -- Norm order. Default: 2 (spectral).
- 返回:
Condition number estimate (σ_max / σ_min).
- 返回类型:
torch.Tensor
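A short sketch combining both diagnostics (D as elsewhere on this page):
>>> fro = D.norm('fro')            # Frobenius norm via distributed sum
>>> spec = D.norm(2)               # spectral norm via distributed SVD
>>> kappa = D.condition_number()   # estimates sigma_max / sigma_min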
- det() Tensor[源代码]¶
Compute determinant of the distributed sparse matrix.
WARNING: This operation requires gathering the full matrix to compute the determinant, as determinant is a global property that cannot be computed in a truly distributed manner without full matrix information.
The determinant is computed by:
1. Gathering all partitions into a global SparseTensor
2. Computing the determinant using LU decomposition (CPU) or torch.linalg.det (CUDA)
- 返回:
Determinant value (scalar tensor).
- 返回类型:
torch.Tensor
- 抛出:
ValueError -- If matrix is not square
备注
Only square matrices have determinants
This method gathers all data, so use with caution for large matrices
Supports gradient computation via autograd
For very large matrices, consider using log-determinant or other approximations instead
示例
>>> import torch
>>> from torch_sla import DSparseTensor
>>>
>>> # Create distributed sparse matrix
>>> val = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
>>> row = torch.tensor([0, 0, 1, 1, 1, 2, 2])
>>> col = torch.tensor([0, 1, 0, 1, 2, 1, 2])
>>> D = DSparseTensor(val, row, col, (3, 3), num_partitions=2)
>>>
>>> # Compute determinant (gathers to single node)
>>> det = D.det()
>>> print(det)
>>>
>>> # With gradient support
>>> val = val.requires_grad_(True)
>>> D = DSparseTensor(val, row, col, (3, 3), num_partitions=2)
>>> det = D.det()
>>> det.backward()
>>> print(val.grad)  # Gradient w.r.t. matrix values
- T() DSparseTensor[源代码]¶
Transpose the distributed sparse tensor.
Returns a new DSparseTensor with swapped row/column indices.
- 返回:
Transposed matrix.
- 返回类型:
DSparseTensor
- to_dense() Tensor[源代码]¶
Convert to dense tensor.
WARNING: This gathers all data to a single node. Only use for small matrices or debugging.
- 返回:
Dense matrix of shape (M, N).
- 返回类型:
torch.Tensor
- is_symmetric(atol: float = 1e-08, rtol: float = 1e-05) Tensor[源代码]¶
Check if matrix is symmetric.
The check can run in a distributed fashion by comparing values with the transpose.
- 参数:
atol (float, optional) -- Absolute tolerance. Default: 1e-8.
rtol (float, optional) -- Relative tolerance. Default: 1e-5.
- 返回:
Boolean scalar tensor.
- 返回类型:
torch.Tensor
- is_positive_definite() Tensor[源代码]¶
Check if matrix is positive definite.
Uses distributed eigenvalue computation.
- 返回:
Boolean scalar tensor.
- 返回类型:
torch.Tensor
- lu()[源代码]¶
Compute LU decomposition.
WARNING: LU is inherently not distributed-friendly. This gathers data to a single node.
For distributed solves, use solve_distributed() with iterative methods.
- 返回:
Factorization object with solve() method.
- 返回类型:
- spy(**kwargs)[源代码]¶
Visualize sparsity pattern.
Gathers data for visualization.
- 参数:
**kwargs -- Arguments passed to SparseTensor.spy().
- nonlinear_solve(residual_fn, u0: Tensor, *params, method: str = 'newton', tol: float = 1e-06, atol: float = 1e-10, max_iter: int = 50, line_search: bool = True, verbose: bool = False) Tensor[源代码]¶
Solve nonlinear equation F(u, D, *params) = 0 using distributed Newton-Krylov.
Uses Jacobian-free Newton-Krylov with distributed CG for linear solves.
- 参数:
residual_fn (callable) -- Function F(u, D, *params) -> residual tensor. D is this DSparseTensor.
u0 (torch.Tensor) -- Initial guess (global vector).
*params (torch.Tensor) -- Additional parameters.
method (str) -- Nonlinear method:
- 'newton': Newton-Krylov with distributed CG
- 'picard': Fixed-point iteration
tol (float) -- Relative tolerance.
atol (float) -- Absolute tolerance.
max_iter (int) -- Maximum outer iterations.
line_search (bool) -- Use Armijo line search.
verbose (bool) -- Print convergence info.
- 返回:
Solution u such that F(u, D, *params) ≈ 0.
- 返回类型:
torch.Tensor
备注
Distributed Algorithm:
Uses Jacobian-free Newton-Krylov (JFNK)
Linear solves use distributed CG
Jacobian-vector products computed via finite differences
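A sketch of the call pattern; the cubic nonlinearity and the use of `@` for the distributed matvec below are illustrative assumptions, not documented API:
>>> def residual(u, D, b):
...     # F(u, D, b) = D @ u + u**3 - b  ('@' as distributed matvec is assumed here)
...     return D @ u + u ** 3 - b
>>> u = D.nonlinear_solve(residual, torch.zeros(N), b, method='newton', verbose=True)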
- save(directory: str | PathLike, verbose: bool = False) None[源代码]¶
Save DSparseTensor to disk.
Creates a directory with metadata and per-partition files.
示例
>>> D = A.partition(num_partitions=4)
>>> D.save("matrix_dist")
- classmethod load(directory: str | PathLike, device: str | device = 'cpu') DSparseTensor[源代码]¶
Load a complete DSparseTensor from disk.
- 参数:
directory (str or PathLike) -- Directory containing saved data.
device (str or torch.device) -- Device to load to.
- 返回:
The loaded distributed sparse tensor.
- 返回类型:
DSparseTensor
示例
>>> D = DSparseTensor.load("matrix_dist", device="cuda")
DSparseMatrix¶
为大规模 CFD/FEM 计算设计的分布式稀疏矩阵。提供带 halo 交换的域分解。
- class torch_sla.DSparseMatrix(partition: Partition, local_values: Tensor, local_row: Tensor, local_col: Tensor, local_shape: Tuple[int, int], global_shape: Tuple[int, int], num_partitions: int, device: str | device = 'cpu', verbose: bool = True)[源代码]¶
基类:
object
Distributed Sparse Matrix with halo exchange support.
Designed for large-scale CFD/FEM computations following industrial practices from Ansys, OpenFOAM, etc.
The matrix is partitioned across multiple processes/GPUs, with automatic halo (ghost) node management for parallel iterative solvers.
Supports both CPU and CUDA devices.
- partition¶
Local partition information
- Type:
Partition
- local_values¶
Non-zero values for local portion of matrix
- Type:
- local_row¶
Local row indices
- Type:
- local_col¶
Local column indices
- Type:
- device¶
Device where the matrix data resides (cpu or cuda)
- Type:
示例
>>> # Create distributed matrix on CPU
>>> A = DSparseMatrix.from_global(val, row, col, shape, num_partitions=4, my_partition=0, device='cpu')
>>>
>>> # Create distributed matrix on CUDA
>>> A_cuda = DSparseMatrix.from_global(val, row, col, shape, num_partitions=4, my_partition=0, device='cuda')
>>>
>>> # Distributed matrix-vector product with halo exchange
>>> y = A.matvec(x)  # Automatically handles halo exchange
>>>
>>> # Explicit halo exchange
>>> A.halo_exchange(x)  # Update halo values in x
- to(device: str | device) DSparseMatrix[源代码]¶
Move the distributed matrix to a different device.
- 参数:
device (str or torch.device) -- Target device ('cpu', 'cuda', 'cuda:0', etc.)
- 返回:
New distributed matrix on the target device
- 返回类型:
DSparseMatrix
- cuda(device: int | None = None) DSparseMatrix[源代码]¶
Move to CUDA device
- cpu() DSparseMatrix[源代码]¶
Move to CPU
- classmethod from_global(values: Tensor, row: Tensor, col: Tensor, shape: Tuple[int, int], num_partitions: int, my_partition: int, partition_ids: Tensor | None = None, coords: Tensor | None = None, device: str | device = 'cpu', verbose: bool = True) DSparseMatrix[源代码]¶
Create distributed matrix from global COO data.
- 参数:
values (torch.Tensor) -- Global COO non-zero values
row (torch.Tensor) -- Global COO row indices
col (torch.Tensor) -- Global COO column indices
shape (Tuple[int, int]) -- Global matrix shape (M, N)
num_partitions (int) -- Number of partitions
my_partition (int) -- This process's partition ID (0 to num_partitions-1)
partition_ids (torch.Tensor, optional) -- Pre-computed partition assignments. If None, computed automatically.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning [num_nodes, dim]
device (str or torch.device) -- Device for local data ('cpu', 'cuda', 'cuda:0', etc.)
verbose (bool) -- Whether to print partition info
- 返回:
Local portion of the distributed matrix
- 返回类型:
DSparseMatrix
- halo_exchange(x: Tensor, async_op: bool = False) Tensor | None[源代码]¶
Exchange halo/ghost values with neighbors.
This is the core operation for parallel iterative methods. Updates the halo portion of x with values from neighboring partitions.
- 参数:
x (torch.Tensor) -- Local vector [num_local] with owned values filled in. Halo values will be updated.
async_op (bool) -- If True, return immediately and return a future.
- 返回:
x -- Vector with updated halo values (same tensor, modified in-place)
- 返回类型:
torch.Tensor
示例
>>> # During iterative solve
>>> for iteration in range(max_iter):
...     # Compute local update
...     x_new = local_gauss_seidel_step(A_local, x, b)
...
...     # Exchange boundary values
...     A.halo_exchange(x_new)
...
...     # Check convergence using owned nodes only
...     residual = compute_residual(A_local, x_new, b)
- halo_exchange_local(x_list: List[Tensor]) None[源代码]¶
Local halo exchange for single-process multi-partition simulation.
Useful for testing/debugging without actual distributed setup.
- 参数:
x_list (List[torch.Tensor]) -- List of local vectors, one per partition
- matvec(x: Tensor, exchange_halo: bool = True) Tensor[源代码]¶
Local matrix-vector product y = A_local @ x.
- 参数:
x (torch.Tensor) -- Local vector [num_local]
exchange_halo (bool) -- If True, perform halo exchange before multiplication
- 返回:
y -- Result vector [num_local]
- 返回类型:
torch.Tensor
- matvec_overlap(x: Tensor) Tensor[源代码]¶
Matrix-vector product with communication-computation overlap.
This optimized version overlaps halo communication with computation:
1. Start async halo exchange
2. Compute interior part (rows that don't depend on halo)
3. Wait for halo exchange to complete
4. Compute boundary part (rows that depend on halo)
5. Combine results
Note: This is only beneficial in true distributed settings where there is actual network latency to hide. In single-process mode, this falls back to regular matvec.
- 参数:
x (torch.Tensor) -- Local vector [num_local]
- 返回:
y -- Result vector [num_local]
- 返回类型:
torch.Tensor
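Both variants compute the same product; a quick equivalence-check sketch:
>>> y_sync = A.matvec(x)           # synchronous halo exchange, then SpMV
>>> y_olap = A.matvec_overlap(x)   # hides communication behind interior SpMV
>>> assert torch.allclose(y_sync, y_olap)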
- halo_exchange_async(x: Tensor)[源代码]¶
Start asynchronous halo exchange.
Returns a handle that can be passed to _wait_halo_exchange().
- solve(b: Tensor, method: str = 'cg', preconditioner: str = 'jacobi', atol: float = 1e-10, rtol: float = 1e-06, maxiter: int = 1000, verbose: bool = False, distributed: bool = True, overlap: bool = False, use_cache: bool = True) Tensor[源代码]¶
Solve linear system Ax = b.
Optimizations enabled by default:
- CSR cache: Avoids repeated COO->CSR conversion (use_cache=True)
- Jacobi preconditioner: ~5% speedup for Poisson-like problems
- 参数:
b (torch.Tensor) -- Right-hand side. Shape [num_owned] for owned nodes only.
method (str) -- Solver method: 'cg' (default), 'jacobi', 'gauss_seidel'
preconditioner (str) -- Preconditioner for CG: 'none', 'jacobi' (default), 'ssor', 'ic0', 'polynomial'
atol (float) -- Absolute tolerance for convergence
rtol (float) -- Relative tolerance for convergence (|r| < rtol * |b|)
maxiter (int) -- Maximum iterations
verbose (bool) -- Print convergence info (rank 0 only for distributed)
distributed (bool, default=True) -- If True (default): Solve the GLOBAL system using distributed algorithms with all_reduce for global dot products. If False: Solve only the LOCAL subdomain problem (useful as preconditioner in domain decomposition methods).
overlap (bool, default=False) -- If True: Overlap communication with computation. Note: Only beneficial for slow interconnects (InfiniBand, Ethernet). For NVLink, synchronous communication is faster.
use_cache (bool, default=True) -- If True (default): Cache CSR format and diagonal for reuse. Provides ~2% speedup and ~27% memory reduction.
- 返回:
x -- Solution for owned nodes, shape [num_owned]
- 返回类型:
torch.Tensor
示例
>>> # Distributed solve (default) - all ranks cooperate
>>> x = local_matrix.solve(b_owned)
>>>
>>> # Local subdomain solve - no global communication
>>> x = local_matrix.solve(b_owned, distributed=False)
>>>
>>> # With a different preconditioner
>>> x = local_matrix.solve(b_owned, preconditioner='ssor')
>>>
>>> # Disable caching (for memory-constrained cases)
>>> x = local_matrix.solve(b_owned, use_cache=False)
- eigsh(k: int = 6, which: str = 'LM', maxiter: int = 200, tol: float = 1e-08, verbose: bool = False, distributed: bool = True) Tuple[Tensor, Tensor][源代码]¶
Compute k eigenvalues of symmetric matrix.
- 参数:
k (int) -- Number of eigenvalues to compute
which (str) -- Which eigenvalues: "LM" (largest magnitude), "SM" (smallest magnitude)
maxiter (int) -- Maximum iterations
tol (float) -- Convergence tolerance
verbose (bool) -- Print convergence info (rank 0 only)
distributed (bool, default=True) -- If True (default): Use distributed LOBPCG with global reductions. If False: Gather to single SparseTensor and compute locally (not recommended for large matrices).
- 返回:
eigenvalues (torch.Tensor) -- k eigenvalues, shape [k]
eigenvectors_owned (torch.Tensor) -- Eigenvectors for owned nodes only, shape [num_owned, k]
- gather_global(x_local: Tensor) Tensor | None[源代码]¶
Gather local vectors to global vector (on rank 0).
- 参数:
x_local (torch.Tensor) -- Local vector [num_owned]
- 返回:
x_global -- Global vector on rank 0, None on other ranks
- 返回类型:
torch.Tensor or None
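A hedged sketch combining eigsh() with gather_global() to assemble a full eigenvector on rank 0:
>>> evals, evecs_owned = A.eigsh(k=4, which="LM")   # eigenvectors for owned rows only
>>> v0_global = A.gather_global(evecs_owned[:, 0])  # full vector on rank 0, None elsewhere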
- det() Tensor[源代码]¶
Compute determinant of the distributed sparse matrix.
NOTE: DSparseMatrix represents a single partition. To compute the determinant of the full global matrix, you need to use DSparseTensor which manages all partitions, or manually gather all partitions.
This method raises an error to guide users to the correct approach.
- 抛出:
NotImplementedError -- DSparseMatrix is a single partition. Use DSparseTensor.det() instead.
示例
>>> # Correct way: Use DSparseTensor
>>> from torch_sla import DSparseTensor
>>> D = DSparseTensor(val, row, col, shape, num_partitions=4)
>>> det = D.det()  # This works
>>>
>>> # If you have individual DSparseMatrix partitions, you need to
>>> # reconstruct the global matrix first
- classmethod load(directory: str | PathLike, rank: int, world_size: int | None = None, device: str | device = 'cpu') DSparseMatrix[源代码]¶
Load a partition from disk for the given rank.
Each rank should call this with its own rank to load only its partition.
- 参数:
directory (str or PathLike) -- Directory containing partitioned data.
rank (int) -- Rank of this process.
world_size (int, optional) -- Total number of processes (must match num_partitions).
device (str or torch.device) -- Device to load tensors to.
- 返回:
The partition for this rank.
- 返回类型:
DSparseMatrix
示例
>>> rank = dist.get_rank()
>>> world_size = dist.get_world_size()
>>> partition = DSparseMatrix.load("matrix_dist", rank, world_size, "cuda")
线性求解函数¶
spsolve¶
- torch_sla.spsolve(val: Tensor, row: Tensor, col: Tensor, shape: Tuple[int, int], b: Tensor, backend: Literal['scipy', 'eigen', 'pytorch', 'cusolver', 'cudss', 'auto'] = 'auto', method: Literal['auto', 'superlu', 'umfpack', 'lu', 'qr', 'cholesky', 'ldlt', 'cg', 'bicgstab', 'gmres', 'lgmres', 'minres', 'qmr'] = 'auto', atol: float = 1e-10, maxiter: int = 10000, tol: float = 1e-12, matrix_type: str = 'general', is_symmetric: bool = False, is_spd: bool = False, preconditioner: str = 'jacobi', mixed_precision: bool = False) Tensor[源代码]¶
Solve the Sparse Linear Equation Ax = b with gradient support.
Supports multiple backends for CPU and CUDA tensors.
- 参数:
val (torch.Tensor) -- [nnz] Non-zero values of sparse matrix A in COO format
row (torch.Tensor) -- [nnz] Row indices
col (torch.Tensor) -- [nnz] Column indices
shape (Tuple[int, int]) -- (m, n) Shape of the sparse matrix
b (torch.Tensor) -- [m] Right-hand side vector
backend (str, optional) -- Backend to use:
- 'auto': Auto-select based on device and problem size (default)
- 'scipy': SciPy (CPU only, uses SuperLU/UMFPACK)
- 'eigen': Eigen C++ (CPU only, iterative)
- 'pytorch': PyTorch-native (CPU & CUDA, iterative) - best for large problems
- 'cusolver': NVIDIA cuSOLVER (CUDA only, direct)
- 'cudss': NVIDIA cuDSS (CUDA only, direct)
method (str, optional) -- Solver method. Available methods depend on backend:
- 'auto': Auto-select based on matrix properties
- 'superlu', 'umfpack': Direct solvers (scipy)
- 'cg', 'bicgstab', 'gmres': Iterative solvers
- 'lu', 'qr', 'cholesky', 'ldlt': Direct solvers (CUDA)
atol (float, optional) -- Absolute tolerance for iterative solvers, by default 1e-10
maxiter (int, optional) -- Maximum iterations for iterative solvers, by default 10000
tol (float, optional) -- Tolerance for direct solvers, by default 1e-12
matrix_type (str, optional) -- Matrix type for cuDSS: 'general', 'symmetric', 'spd', by default "general"
is_symmetric (bool, optional) -- Hint that matrix is symmetric (for auto method selection)
is_spd (bool, optional) -- Hint that matrix is symmetric positive definite
preconditioner (str, optional) -- Preconditioner for iterative solvers, by default 'jacobi'
mixed_precision (bool, optional) -- Enable mixed-precision solving, by default False
- 返回:
[n] Solution vector x
- 返回类型:
torch.Tensor
示例
>>> import torch
>>> from torch_sla import spsolve
>>>
>>> # Create a simple SPD matrix
>>> val = torch.tensor([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0], dtype=torch.float64)
>>> row = torch.tensor([0, 0, 1, 1, 1, 2, 2], dtype=torch.int64)
>>> col = torch.tensor([0, 1, 0, 1, 2, 1, 2], dtype=torch.int64)
>>> shape = (3, 3)
>>> b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
>>>
>>> # Auto-select backend and method
>>> x = spsolve(val, row, col, shape, b)
>>>
>>> # Specify backend and method
>>> x = spsolve(val, row, col, shape, b, backend='scipy', method='superlu')
>>>
>>> # On CUDA
>>> val_cuda = val.cuda()
>>> row_cuda = row.cuda()
>>> col_cuda = col.cuda()
>>> b_cuda = b.cuda()
>>> x_cuda = spsolve(val_cuda, row_cuda, col_cuda, shape, b_cuda, backend='cudss', method='lu')
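Since spsolve supports gradients, a minimal autograd sketch continuing the example above:
>>> val = val.clone().requires_grad_(True)
>>> b = b.clone().requires_grad_(True)
>>> x = spsolve(val, row, col, shape, b)
>>> x.sum().backward()      # adjoint-based gradients
>>> print(val.grad.shape)   # torch.Size([7])
>>> print(b.grad.shape)     # torch.Size([3])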
spsolve_coo¶
- torch_sla.spsolve_coo(A: Tensor, b: Tensor, **kwargs) Tensor[源代码]¶
Solve Ax = b where A is a sparse COO tensor
- 参数:
A (torch.Tensor) -- Sparse COO tensor representing the matrix
b (torch.Tensor) -- Right-hand side vector
**kwargs -- Additional arguments passed to spsolve()
- 返回:
Solution vector x
- 返回类型:
torch.Tensor
spsolve_csr¶
- torch_sla.spsolve_csr(A: Tensor, b: Tensor, **kwargs) Tensor[源代码]¶
Solve Ax = b where A is a sparse CSR tensor
- 参数:
A (torch.Tensor) -- Sparse CSR tensor representing the matrix
b (torch.Tensor) -- Right-hand side vector
**kwargs -- Additional arguments passed to spsolve()
- 返回:
Solution vector x
- 返回类型:
torch.Tensor
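A brief sketch building PyTorch sparse tensors from the COO data used in the spsolve example above:
>>> A_coo = torch.sparse_coo_tensor(torch.stack([row, col]), val, (3, 3))
>>> x = spsolve_coo(A_coo, b)
>>>
>>> A_csr = A_coo.to_sparse_csr()
>>> x = spsolve_csr(A_csr, b)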
批量求解函数¶
spsolve_batch_same_layout¶
- torch_sla.spsolve_batch_same_layout(val_batch: Tensor, row: Tensor, col: Tensor, shape: Tuple[int, int], b_batch: Tensor, method: Literal['cg', 'bicgstab', 'cusolver_qr', 'cusolver_cholesky', 'cusolver_lu', 'cudss', 'cudss_lu', 'cudss_cholesky', 'cudss_ldlt'] = 'bicgstab', atol: float = 1e-10, maxiter: int = 10000) Tensor[源代码]¶
Batch solve sparse linear systems with the SAME sparsity pattern.
已弃用: Use SparseTensor.decompose().solve() instead for a more Pythonic interface:
>>> A = SparseTensor(val, row, col, shape)
>>> decomp = A.decompose(method='superlu')
>>> x_batch = decomp.solve(val_batch, b_batch)
All matrices A_i share the same (row, col) structure but have different values. This is efficient when the sparsity pattern is fixed (e.g., FEM with fixed mesh).
Solves: A_i @ x_i = b_i for i = 0, 1, ..., batch_size-1
- 参数:
val_batch (torch.Tensor) -- [batch_size, nnz] Non-zero values for each matrix
row (torch.Tensor) -- [nnz] Row indices (shared across batch)
col (torch.Tensor) -- [nnz] Column indices (shared across batch)
shape (Tuple[int, int]) -- (m, n) Shape of each sparse matrix
b_batch (torch.Tensor) -- [batch_size, m] Right-hand side vectors
method (str) -- Solver method (same options as spsolve)
atol (float) -- Absolute tolerance for iterative solvers
maxiter (int) -- Maximum iterations for iterative solvers
- 返回:
[batch_size, n] Solution vectors
- 返回类型:
torch.Tensor
示例
>>> import torch
>>> from torch_sla import spsolve_batch_same_layout
>>>
>>> batch_size = 10
>>> n = 100
>>> nnz = 500
>>>
>>> # Same sparsity pattern, different values
>>> row = torch.randint(0, n, (nnz,))
>>> col = torch.randint(0, n, (nnz,))
>>> val_batch = torch.randn(batch_size, nnz, dtype=torch.float64)
>>> b_batch = torch.randn(batch_size, n, dtype=torch.float64)
>>>
>>> x_batch = spsolve_batch_same_layout(val_batch, row, col, (n, n), b_batch)
spsolve_batch_different_layout¶
- torch_sla.spsolve_batch_different_layout(matrices: List[Tuple[Tensor, Tensor, Tensor, Tuple[int, int]]], b_list: List[Tensor], method: Literal['cg', 'bicgstab', 'cusolver_qr', 'cusolver_cholesky', 'cusolver_lu', 'cudss', 'cudss_lu', 'cudss_cholesky', 'cudss_ldlt'] = 'bicgstab', atol: float = 1e-10, maxiter: int = 10000) List[Tensor][源代码]¶
Batch solve sparse linear systems with DIFFERENT sparsity patterns.
已弃用: Use SparseTensorList.solve() instead for a more Pythonic interface:
>>> matrices = SparseTensorList([A1, A2, A3])
>>> x_list = matrices.solve([b1, b2, b3])
Each matrix can have a different structure. This is useful when dealing with heterogeneous problems or adaptive mesh refinement.
- 参数:
matrices (List[Tuple[val, row, col, shape]]) -- List of sparse matrices, each as (values, row_indices, col_indices, shape)
b_list (List[torch.Tensor]) -- List of right-hand side vectors
method (str) -- Solver method (same options as spsolve)
atol (float) -- Absolute tolerance for iterative solvers
maxiter (int) -- Maximum iterations for iterative solvers
- 返回:
List of solution vectors
- 返回类型:
List[torch.Tensor]
示例
>>> import torch
>>> from torch_sla import spsolve_batch_different_layout
>>>
>>> # Different matrices with different sizes/patterns
>>> matrices = []
>>> b_list = []
>>> for n in [50, 100, 150]:
...     nnz = n * 5
...     val = torch.randn(nnz, dtype=torch.float64)
...     row = torch.randint(0, n, (nnz,))
...     col = torch.randint(0, n, (nnz,))
...     matrices.append((val, row, col, (n, n)))
...     b_list.append(torch.randn(n, dtype=torch.float64))
>>>
>>> x_list = spsolve_batch_different_layout(matrices, b_list)
非线性求解¶
nonlinear_solve¶
- torch_sla.nonlinear_solve(residual_fn: Callable, u0: Tensor, *params, jacobian_fn: Callable | None = None, method: str = 'newton', tol: float = 1e-06, atol: float = 1e-10, max_iter: int = 50, line_search: bool = True, verbose: bool = False, linear_solver: str = 'pytorch', linear_method: str = 'cg') Tensor[源代码]¶
Solve nonlinear equation F(u, θ) = 0 with adjoint-based gradients.
- 参数:
residual_fn -- Function F(u, *params) -> residual tensor
u0 -- Initial guess for solution
*params -- Parameters θ (tensors with requires_grad=True for gradient computation)
jacobian_fn -- Optional function J(u, *params) -> (val, row, col, shape) Returns sparse Jacobian in COO format. If None, uses autograd.
method -- Nonlinear solver method:
- 'newton': Newton-Raphson with optional line search (default)
- 'picard': Fixed-point iteration
- 'anderson': Anderson acceleration
tol -- Relative convergence tolerance
atol -- Absolute convergence tolerance
max_iter -- Maximum number of nonlinear iterations
line_search -- Use Armijo line search for Newton (default: True)
verbose -- Print convergence information
linear_solver -- Backend for linear solves ('pytorch', 'scipy', 'cudss')
linear_method -- Method for linear solves ('cg', 'bicgstab', 'lu')
- 返回:
Solution tensor satisfying F(u, θ) ≈ 0
- 返回类型:
u
示例
>>> def residual(u, A_val, b):
...     # Nonlinear: A(u) @ u - b, where A's values depend on u
...     # (the u-dependence below is purely illustrative)
...     A = torch.sparse_coo_tensor(torch.stack([row, col]), A_val * (1.0 + u[row] ** 2), (n, n))
...     return torch.sparse.mm(A, u.unsqueeze(1)).squeeze() - b
...
>>> u0 = torch.zeros(n, requires_grad=False)
>>> A_val = torch.randn(nnz, requires_grad=True)
>>> b = torch.randn(n, requires_grad=True)
>>>
>>> u = nonlinear_solve(residual, u0, A_val, b, method='newton')
>>> loss = some_loss(u)
>>> loss.backward()  # Computes ∂L/∂A_val and ∂L/∂b via adjoint
adjoint_solve¶
- torch_sla.adjoint_solve(residual_fn: Callable, u0: Tensor, *params, jacobian_fn: Callable | None = None, method: str = 'newton', tol: float = 1e-06, atol: float = 1e-10, max_iter: int = 50, line_search: bool = True, verbose: bool = False, linear_solver: str = 'pytorch', linear_method: str = 'cg') Tensor¶
Solve nonlinear equation F(u, θ) = 0 with adjoint-based gradients.
- 参数:
residual_fn -- Function F(u, *params) -> residual tensor
u0 -- Initial guess for solution
*params -- Parameters θ (tensors with requires_grad=True for gradient computation)
jacobian_fn -- Optional function J(u, *params) -> (val, row, col, shape) Returns sparse Jacobian in COO format. If None, uses autograd.
method -- Nonlinear solver method:
- 'newton': Newton-Raphson with optional line search (default)
- 'picard': Fixed-point iteration
- 'anderson': Anderson acceleration
tol -- Relative convergence tolerance
atol -- Absolute convergence tolerance
max_iter -- Maximum number of nonlinear iterations
line_search -- Use Armijo line search for Newton (default: True)
verbose -- Print convergence information
linear_solver -- Backend for linear solves ('pytorch', 'scipy', 'cudss')
linear_method -- Method for linear solves ('cg', 'bicgstab', 'lu')
- 返回:
Solution tensor satisfying F(u, θ) ≈ 0
- 返回类型:
u
示例
>>> def residual(u, A_val, b):
...     # Nonlinear: A(u) @ u - b, where A's values depend on u
...     # (the u-dependence below is purely illustrative)
...     A = torch.sparse_coo_tensor(torch.stack([row, col]), A_val * (1.0 + u[row] ** 2), (n, n))
...     return torch.sparse.mm(A, u.unsqueeze(1)).squeeze() - b
...
>>> u0 = torch.zeros(n, requires_grad=False)
>>> A_val = torch.randn(nnz, requires_grad=True)
>>> b = torch.randn(n, requires_grad=True)
>>>
>>> u = adjoint_solve(residual, u0, A_val, b, method='newton')
>>> loss = some_loss(u)
>>> loss.backward()  # Computes ∂L/∂A_val and ∂L/∂b via adjoint
持久化 (I/O)¶
safetensors 格式¶
- torch_sla.save_sparse(tensor: SparseTensor, path: str | Path, metadata: Dict[str, str] | None = None) None[源代码]¶
Save a SparseTensor to safetensors format.
- 参数:
tensor (SparseTensor) -- The sparse tensor to save.
path (str or Path) -- Output file path (should end with .safetensors).
metadata (dict, optional) -- Additional metadata to store in the file.
示例
>>> A = SparseTensor(val, row, col, (100, 100))
>>> save_sparse(A, "matrix.safetensors")
- torch_sla.load_sparse(path: str | Path, device: str | device = 'cpu') SparseTensor[源代码]¶
Load a SparseTensor from safetensors format.
- 参数:
path (str or Path) -- Input file path.
device (str or torch.device) -- Device to load tensors to.
- 返回:
The loaded sparse tensor.
- 返回类型:
SparseTensor
示例
>>> A = load_sparse("matrix.safetensors", device="cuda")
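A round-trip sketch with optional metadata (the metadata key below is illustrative):
>>> save_sparse(A, "matrix.safetensors", metadata={"source": "demo"})
>>> A2 = load_sparse("matrix.safetensors")
>>> assert torch.allclose(A2.values, A.values)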
- torch_sla.save_distributed(tensor: SparseTensor, directory: str | Path, num_partitions: int, partition_method: str = 'simple', coords: Tensor | None = None, verbose: bool = False) None[源代码]¶
Save a SparseTensor as partitioned files for distributed loading.
Creates a directory with:
- metadata.json: Global metadata and partition info
- partition_0.safetensors, partition_1.safetensors, ...: Per-partition data
- 参数:
tensor (SparseTensor) -- The global sparse tensor to partition and save.
directory (str or Path) -- Output directory path.
num_partitions (int) -- Number of partitions to create.
partition_method (str) -- Partitioning method: 'simple', 'metis', or 'geometric'.
coords (torch.Tensor, optional) -- Node coordinates for geometric partitioning.
verbose (bool) -- Print progress information.
示例
>>> A = SparseTensor(val, row, col, (1000, 1000))
>>> save_distributed(A, "matrix_dist", num_partitions=4)
# Creates:
#   matrix_dist/metadata.json
#   matrix_dist/partition_0.safetensors
#   matrix_dist/partition_1.safetensors
#   matrix_dist/partition_2.safetensors
#   matrix_dist/partition_3.safetensors
- torch_sla.load_partition(directory: str | Path, rank: int, world_size: int | None = None, device: str | device = 'cpu') DSparseMatrix[源代码]¶
Load a single partition for the given rank.
Each rank loads only its own partition, enabling efficient distributed loading.
- 参数:
directory (str or Path) -- Directory containing partitioned data.
rank (int) -- Rank of this process.
world_size (int, optional) -- Total number of processes (must match num_partitions). If None, reads from metadata.
device (str or torch.device) -- Device to load tensors to.
- 返回:
The partition for this rank.
- 返回类型:
DSparseMatrix
示例
>>> # In distributed context
>>> rank = dist.get_rank()
>>> world_size = dist.get_world_size()
>>> partition = load_partition("matrix_dist", rank, world_size, device="cuda")
Matrix Market 格式¶
- torch_sla.save_mtx(tensor: SparseTensor, path: str | Path, comment: str = '', field: str = 'real', symmetry: str = 'general') None[源代码]¶
Save a SparseTensor to Matrix Market (.mtx) format.
- 参数:
tensor (SparseTensor) -- The sparse tensor to save.
path (str or Path) -- Output file path (should end with .mtx).
comment (str, optional) -- Comment to include in the header.
field (str, optional) -- Field type: 'real', 'complex', 'integer', or 'pattern'. Default: 'real'.
symmetry (str, optional) -- Symmetry type: 'general', 'symmetric', 'skew-symmetric', or 'hermitian'. Default: 'general'.
示例
>>> A = SparseTensor(val, row, col, (100, 100))
>>> save_mtx(A, "matrix.mtx")
>>> save_mtx(A, "matrix.mtx", symmetry="symmetric")
- torch_sla.load_mtx(path: str | Path, dtype: dtype | None = None, device: str | device = 'cpu') SparseTensor[源代码]¶
Load a SparseTensor from Matrix Market (.mtx) format.
- 参数:
path (str or Path) -- Input file path.
dtype (torch.dtype, optional) -- Data type for values. If None, inferred from file.
device (str or torch.device) -- Device to load tensors to.
- 返回:
The loaded sparse tensor.
- 返回类型:
SparseTensor
示例
>>> A = load_mtx("matrix.mtx")
>>> A = load_mtx("matrix.mtx", dtype=torch.float32, device="cuda")
后端工具¶
- torch_sla.get_backend_methods(backend: str) List[str][源代码]¶
Get list of methods supported by a backend
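For example, mirroring the BACKEND_METHODS table below:
>>> from torch_sla import get_backend_methods
>>> get_backend_methods('scipy')
['superlu', 'umfpack', 'cg', 'bicgstab', 'gmres', 'minres']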
- torch_sla.select_backend(device: device, n: int | None = None, dtype: dtype | None = None, prefer_direct: bool = True) str[源代码]¶
Auto-select the best backend based on device, problem size, and dtype.
Recommendations based on benchmark results:
- CPU: scipy+superlu (all sizes, fast + machine precision)
- CUDA (DOF < 2M): cudss+cholesky (fast + high precision)
- CUDA (DOF >= 2M): pytorch+cg (memory efficient, ~1e-6 precision)
- 参数:
device (torch.device) -- Target device (cpu or cuda)
n (int, optional) -- Problem size (DOF). If > CUDA_ITERATIVE_THRESHOLD, prefer iterative.
dtype (torch.dtype, optional) -- Data type. Note: cuSOLVER does not support float32!
prefer_direct (bool) -- If True, prefer direct solvers over iterative (when applicable)
- 返回:
Backend name ('scipy', 'eigen', 'pytorch', 'cusolver', or 'cudss')
- 返回类型:
str
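A short sketch; the expected results in the comments follow the recommendations above:
>>> import torch
>>> from torch_sla import select_backend
>>> select_backend(torch.device('cpu'))                # 'scipy' per the CPU recommendation
>>> select_backend(torch.device('cuda'), n=5_000_000)  # large DOF favors 'pytorch' (iterative)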
- torch_sla.select_method(backend: str, is_symmetric: bool = False, is_spd: bool = False, prefer_direct: bool = True) str[源代码]¶
Auto-select the best method for a given backend and matrix properties.
Recommendations based on benchmark results:
- scipy: superlu (direct, best precision) or cg (iterative, for SPD)
- cudss: cholesky (SPD, fastest) > ldlt (symmetric) > lu (general)
- pytorch: cg (SPD) or bicgstab (general), both with Jacobi preconditioning
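And correspondingly for methods (expected results per the recommendations above):
>>> from torch_sla import select_method
>>> select_method('cudss', is_spd=True)  # 'cholesky' (fastest for SPD)
>>> select_method('pytorch')             # 'bicgstab' for general matrices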
常量¶
BACKEND_METHODS¶
后端名称到可用求解方法的映射字典。
BACKEND_METHODS = {
'scipy': ['superlu', 'umfpack', 'cg', 'bicgstab', 'gmres', 'minres'],
'eigen': ['cg', 'bicgstab'],
'pytorch': ['cg', 'bicgstab'],
'cusolver': ['qr', 'cholesky', 'lu'],
'cudss': ['lu', 'cholesky', 'ldlt'],
}
DEFAULT_METHODS¶
后端名称到默认求解方法的映射字典。
DEFAULT_METHODS = {
'scipy': 'superlu',
'eigen': 'cg',
'pytorch': 'cg',
'cusolver': 'cholesky',
'cudss': 'cholesky',
}