Neural Operators and Operator Networks vs Parametric Approach: A General Comparison

We explore the use of different neural operator architectures for solving partial differential equations (PDEs). Specifically, we investigate the performance of the Feed-Forward Network (FFN), Deep Operator Network (DeepONet), Fourier Neural Operator (FNO), Convolutional Neural Operator (CNO), and Koopman Neural Operator (KNO) in solving the heat, wave, and Poisson equations. The models are trained and evaluated on their ability to accurately predict the solutions of these PDEs under varying parameters.

Introduction

Partial differential equations (PDEs) play a crucial role in describing various physical phenomena and are widely used in many scientific and engineering domains. Solving PDEs accurately is essential for understanding and predicting the behavior of complex systems. Traditional numerical methods for solving PDEs often rely on discretization techniques and iterative algorithms, which can be computationally expensive and time-consuming. In recent years, there has been a growing interest in leveraging neural networks and deep learning techniques to solve PDEs more efficiently.

Neural operators have emerged as a promising approach for solving PDEs using neural networks. These architectures aim to learn the underlying equations directly from data, enabling faster and more accurate predictions of PDE solutions. In this paper, we compare and evaluate several neural operator architectures, namely the Feed-Forward Network (FFN), Deep Operator Network (DeepONet), Fourier Neural Operator (FNO), Convolutional Neural Operator (CNO), and Koopman Neural Operator (KNO), in the context of solving the heat, wave, and Poisson equations.

The heat equation describes the diffusion of heat in a medium, while the wave equation models the propagation of waves. The Poisson equation, on the other hand, is a static equation that does not depend on time. These equations have different characteristics and require different approaches for accurate solution approximation. By examining the performance of various neural operator architectures on these equations under varying parameters, we can gain insights into their strengths and weaknesses. Our code is publicly available at https://github.com/walkerchi/DeepLearning-in-ScientificComputing-PartB

Equations

Heat Equation

The heat equation is a fundamental partial differential equation (PDE) that describes the behavior of heat distribution over time. It can be expressed as follows:

$$\frac{\partial u(t, \vec x)}{\partial t} = \Delta u$$

Here, $\vec x$ represents the position vector, which lies within the domain $\mathcal D^{2}_{\text{heat},\vec x} = [-1,1]^2$, meaning that both components $x_1$ and $x_2$ reside in the interval $[-1,1]$. The variable $t$ spans the domain $\mathcal D_{\text{heat}, t} = [0,T]$, where $T$ is the final time.

Initial Condition

$$u(0, x_1, x_2, \vec\mu) = -\frac{1}{d}\sum_{m=1}^{d} \frac{\mu_m}{\sqrt{m}} \sin(\pi m x_1)\sin(\pi m x_2)$$

Boundary Condition

$$u(t, \pm 1, \pm 1) = 0$$

Solution

$$u(t, x_1, x_2, \vec\mu) = -\frac{1}{d}\sum_{m=1}^{d} \frac{\mu_m}{\sqrt{m}} e^{-2m^2\pi^2 t} \sin(\pi m x_1)\sin(\pi m x_2)$$

Figure: heat equation solution with $d = 4$
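As a concrete illustration, the closed-form solution above can be used to generate input/target pairs for training. The sketch below does this in NumPy; the grid resolution and the uniform distribution of $\mu_m$ are illustrative assumptions, not the exact pipeline of the linked repository.

```python
import numpy as np

def heat_solution(t, d=4, n=64, seed=0):
    """Evaluate the closed-form heat solution on an n-by-n grid over [-1,1]^2."""
    rng = np.random.default_rng(seed)      # same seed -> same mu at t=0 and t=T
    mu = rng.uniform(-1.0, 1.0, size=d)    # assumed parameter distribution
    x = np.linspace(-1.0, 1.0, n)
    x1, x2 = np.meshgrid(x, x, indexing="ij")
    u = np.zeros_like(x1)
    for m in range(1, d + 1):
        u += (mu[m - 1] / np.sqrt(m)) * np.exp(-2 * m**2 * np.pi**2 * t) \
             * np.sin(np.pi * m * x1) * np.sin(np.pi * m * x2)
    return -u / d

u0, uT = heat_solution(t=0.0), heat_solution(t=0.005)   # one training pair
```

The wave and Poisson data can be generated the same way from their closed-form solutions below.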

Wave Equation

The wave equation is another important PDE that describes the propagation of waves, such as sound waves or electromagnetic waves. It can be represented as follows:

$$\frac{\partial^2 u(t, \vec x)}{\partial t^2} - c^2 \Delta_{\vec x} u = u_{tt} - c^2(u_{xx} + u_{yy}) = 0$$

In this equation, $\vec x$ is the position vector, confined within the domain $\mathcal D^{2}_{\text{wave},\vec x} = [0,1]^2$, meaning that the components $x_1$ and $x_2$ lie within the interval $[0,1]$. The variable $t$ is restricted to the domain $\mathcal D_{\text{wave}, t} = [0,T]$. The symbol $c$ represents the propagation speed.

Initial Condition

$$u(0, x, y, \textbf a) = \frac{\pi}{K^2} \sum_{i,j=1}^{K} a_{ij} \, (i^2 + j^2)^{-r} \sin(\pi i x) \sin(\pi j y) \quad \forall x, y \in [0, 1]$$

Boundary Condition

$$u\vert_{\partial [0,1]^2} = 0$$

Solution

$$u(t, x, y, \textbf a) = \frac{\pi}{K^2} \sum_{i,j=1}^{K} a_{ij} \, (i^2 + j^2)^{-r} \sin(\pi i x) \sin(\pi j y) \cos\!\left(c \pi t \sqrt{i^2 + j^2}\right) \quad \forall x, y \in [0, 1]$$

Figure: wave equation solution with $K = 4$

Poisson Equation

The Poisson equation is a static PDE that describes the distribution of scalar fields in various physical phenomena, such as electrostatics or fluid flow. It can be defined as follows:

$$-\Delta_{\vec x} u(\vec x) = f$$

Unlike the heat equation and wave equation, the Poisson equation does not depend on time. $f$ represents the source function.

Source Function

$$f = \frac{\pi}{K^2} \sum_{i,j=1}^{K} a_{ij} \, (i^2 + j^2)^{r} \sin(\pi i x) \sin(\pi j y) \quad \forall (x, y) \in D$$

Boundary Condition

$$u\vert_{\partial D} = 0$$

Solution

$$u(x, y) = \frac{1}{\pi K^2} \sum_{i,j=1}^{K} a_{ij} \, (i^2 + j^2)^{r-1} \sin(\pi i x) \sin(\pi j y) \quad \forall (x, y) \in D$$

Figure: Poisson equation solution with $K = 4$

Methodology

In the following, we draw $\mathcal S$ samples of the equation parameters $\vec\mu$ / $\textbf a$. Thus, the $i$th sample satisfies $\vec\mu^i \in \mathcal D^{d}_{\text{heat},\vec\mu}$ for the heat equation, and the $j$th sample satisfies $\textbf a^j \in \mathcal D^{K\times K}_{\text{wave},\textbf a}$ for the wave equation.

Parametric Approach: Feed-Forward Network (FFN)

The objective of the parametric approach is to learn a function $g$ that maps the position vector $\vec x$ and the equation parameters $\vec\mu$ / $\textbf a$ directly to the solution value $u$:

$$g(\vec x, \{\vec\mu, \textbf a\}) = u(T, \vec x, \{\vec\mu, \textbf a\})$$

By the universal approximation theorem for multilayer perceptrons (MLPs), an MLP can approximate $g$ arbitrarily well; we therefore implement $g$ as an MLP.
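A minimal PyTorch sketch of such a model for the heat equation with $d$ parameters (the layer count, hidden size, and ReLU activation follow the experimental setup below; all names are illustrative):

```python
import torch
import torch.nn as nn

class ParametricFFN(nn.Module):
    """MLP g(x, mu) -> u(T, x, mu): 4 linear layers, hidden size 64, ReLU."""
    def __init__(self, d, hidden=64, layers=4):
        super().__init__()
        dims = [2 + d] + [hidden] * (layers - 1)   # input: (x1, x2, mu_1..mu_d)
        blocks = []
        for i in range(len(dims) - 1):
            blocks += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        blocks.append(nn.Linear(hidden, 1))        # output: scalar u
        self.net = nn.Sequential(*blocks)

    def forward(self, x, mu):
        # x: (batch, 2) positions, mu: (batch, d) equation parameters
        return self.net(torch.cat([x, mu], dim=-1)).squeeze(-1)

model = ParametricFFN(d=4)
u_pred = model(torch.rand(16, 2) * 2 - 1, torch.rand(16, 4))   # shape (16,)
```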

DeepONet

In contrast to the parametric approach, DeepONet learns an operator $G$ that maps the initial condition to a continuous function, which in turn maps the position vector $\vec x$ to the solution value $u$:

$$G(u_0)(\vec x) = u(T, \vec x)$$

Here, $u_0$ is shorthand for $u(0, \vec x)$.
To realize this operator, DeepONet mainly consists of two networks, both MLPs: the branch network $\mathcal B$ and the trunk network $\mathcal T$. The branch network encodes the initial condition evaluated at the sampled sensor positions $\vec y^j$, which belong to the $j$th parameter sample $\vec\mu^j$ or $\textbf a^j$ and are kept fixed for later inference; the trunk network encodes the query coordinates. The prediction is the inner product of the two outputs:

$$\tilde u(T, \vec x, \{\vec\mu^j, \textbf a^j\}) = \mathcal B\left(u(0, \vec y^j, \{\vec\mu^j, \textbf a^j\})\right) \cdot \mathcal T(T, \vec x)^\top$$
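A minimal sketch of this branch/trunk factorization, assuming $u_0$ is sampled at a fixed set of sensor points and queried at 2-D coordinates (network sizes are illustrative):

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=64, layers=4):
    dims = [d_in] + [hidden] * (layers - 1) + [d_out]
    mods = []
    for i in range(len(dims) - 1):
        mods.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            mods.append(nn.ReLU())
    return nn.Sequential(*mods)

class DeepONet(nn.Module):
    def __init__(self, n_sensors, p=64):
        super().__init__()
        self.branch = mlp(n_sensors, p)   # encodes u0 sampled at fixed sensors y^j
        self.trunk = mlp(2, p)            # T is fixed, so only coordinates are fed

    def forward(self, u0_sensors, x):
        # u0_sensors: (batch, n_sensors), x: (n_points, 2)
        b = self.branch(u0_sensors)       # (batch, p)
        t = self.trunk(x)                 # (n_points, p)
        return b @ t.T                    # (batch, n_points) = B(u0) . T(x)^T

model = DeepONet(n_sensors=4096)
u_T = model(torch.rand(8, 4096), torch.rand(1024, 2))   # shape (8, 1024)
```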

Fourier Neural Operator

In contrast to both the parametric approach and the DeepONet approach, the Fourier Neural Operator (FNO) requires the input points to conform to a mesh structure. We refer to this type of neural operator as a mesh neural operator. It takes the form

$$G(u_0, \vec x) = u(T, \vec x)$$

Each FNO layer adopts the form

$$H^{l+1} = \sigma\left(\mathrm{Conv}^l_{1\times 1}(H^l) + \mathcal F^{-1}\left(W^l(\mathcal F H^l) + b^l\right)\right)$$

In this equation, $H^l$ represents the hidden representation at the $l$th layer, and $\mathrm{Conv}^l_{1\times 1}$ refers to the convolution with kernel size $1\times 1$ at the $l$th layer. The symbols $\mathcal F$ and $\mathcal F^{-1}$ denote the Fourier transform and its inverse, respectively. $W^l$ and $b^l$ stand for the weights and bias of the $l$th layer, and $\sigma$ signifies the activation function.
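A simplified sketch of one such layer in PyTorch (for brevity it keeps only a single block of low-frequency modes and omits the bias $b^l$; the mode and channel counts are illustrative):

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """One FNO layer: 1x1-conv skip path plus spectral multiplication."""
    def __init__(self, channels=64, modes=12):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(scale * torch.randn(
            channels, channels, modes, modes, dtype=torch.cfloat))
        self.skip = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, h):
        # h: (batch, channels, H, W) on a regular mesh
        h_hat = torch.fft.rfft2(h)                   # F H^l
        out_hat = torch.zeros_like(h_hat)
        m = self.modes
        out_hat[:, :, :m, :m] = torch.einsum(        # W^l (F H^l), low modes only
            "bixy,ioxy->boxy", h_hat[:, :, :m, :m], self.weight)
        spectral = torch.fft.irfft2(out_hat, s=h.shape[-2:])   # F^{-1}(...)
        return torch.relu(self.skip(h) + spectral)   # sigma = ReLU here

layer = SpectralConv2d(channels=64, modes=12)
out = layer(torch.randn(4, 64, 64, 64))              # shape preserved
```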

Convolutional Neural Operator

Inspired by the architectural principles of UNet and the activation mechanisms of StyleGAN3, the Convolutional Neural Operator (CNO) integrates the structural design of UNet with the Leaky Rectified Linear Unit (LReLU) filter from StyleGAN3, utilized as an activation layer.

The activation layer, depicted in the figure below, operates through a sequence of transformations on the feature map. Initially, the feature map is upsampled and a Finite Impulse Response (FIR) filter is applied. The LeakyReLU activation then introduces non-linearity, after which a second FIR filter is applied, followed by a downsampling operation back to the original resolution.

Figure: CNO activation layer
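A rough sketch of this upsample-activate-downsample pattern; bilinear interpolation and average pooling stand in here for the proper FIR low-pass filters of the StyleGAN3 implementation:

```python
import torch
import torch.nn.functional as F

def cno_activation(x, up=2):
    """Anti-aliased activation sketch: upsample, filter, LReLU, filter, downsample."""
    x = F.interpolate(x, scale_factor=up, mode="bilinear", align_corners=False)
    x = F.leaky_relu(x, negative_slope=0.2)   # non-linearity at higher resolution
    return F.avg_pool2d(x, kernel_size=up)    # smooth + downsample back

y = cno_activation(torch.randn(1, 8, 64, 64))   # shape preserved: (1, 8, 64, 64)
```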

Koopman Neural Operator

In order to encapsulate complex long-term dynamics, the Koopman Neural Operator (KNO) has been proposed. This operator is designed to learn the Koopman operator, an infinite-dimensional linear operator that governs all observations of a dynamic system. The KNO is applied to the evolution mapping of the dynamic system’s solution, thereby capturing the system’s behavior over time.

Contrasting with the Fourier Neural Operator, the KNO shares parameters across layers for the spectral convolution, as shown below:

$$H^{l+1} = \mathcal F^{-1}\left(W(\mathcal F H^l) + b\right)$$

In this equation, $H^{l+1}$ represents the output of the spectral convolution layer, $\mathcal F^{-1}$ is the inverse Fourier transform, $W$ and $b$ are the weight and bias parameters shared by all layers, and $\mathcal F H^l$ is the Fourier transform of the layer input.
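The sketch below highlights the contrast with FNO: a single set of spectral weights is applied repeatedly, approximating repeated application of a linear, Koopman-like operator. The mode and channel counts and the number of steps are illustrative assumptions.

```python
import torch
import torch.nn as nn

class KNOCore(nn.Module):
    """One spectral convolution (shared W, b) applied for several steps."""
    def __init__(self, channels=64, modes=12, steps=4):
        super().__init__()
        self.modes, self.steps = modes, steps
        scale = 1.0 / channels
        self.W = nn.Parameter(scale * torch.randn(
            channels, channels, modes, modes, dtype=torch.cfloat))
        self.b = nn.Parameter(torch.zeros(channels, modes, modes, dtype=torch.cfloat))

    def forward(self, h):                    # h: (batch, channels, H, W)
        for _ in range(self.steps):          # same W and b at every step
            h_hat = torch.fft.rfft2(h)
            out_hat = torch.zeros_like(h_hat)
            m = self.modes
            out_hat[:, :, :m, :m] = torch.einsum(
                "bixy,ioxy->boxy", h_hat[:, :, :m, :m], self.W) + self.b
            h = torch.fft.irfft2(out_hat, s=h.shape[-2:])
        return h
```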

Experiment

Setup

  • CPU/GPU: Experiments were run on an Intel i5-11300H CPU and an Nvidia MX450 GPU.

  • Equations:
    The parameter $d/K$ for all equations was systematically varied across the set $\{1, 2, 4, 8, 16\}$. We therefore train $3 \times 5 \times 6 = 90$ models (3 equations, 5 parameter values, 6 architectures). In the heat equation, where high-frequency signals attenuate quickly over time, the final time $T$ was set to $0.005$. In the wave equation, the propagation speed $c$ was set to $0.1$, the radial decay parameter $r$ to $0.85$, and $T$ to $5$. In the time-independent Poisson equation, the radial decay parameter $r$ was likewise set to $0.85$.

  • Sampling: A mesh sampler is employed for all models. It generates $4096$ spatial mesh points ($64 \times 64$) for both the training and validation datasets, with $64$ parameter samples each.

  • Model: For the FFN, a Multi-Layer Perceptron (MLP) with 4 layers was used, each layer having a hidden size of 64 and a ReLU activation. For the DeepONet, both the branch and trunk networks were configured with the same settings as the FFN, as were the Fourier Neural Operator (FNO) and Koopman Neural Operator (KNO). To keep model complexity comparable, the Convolutional Neural Operator (CNO) and UNet used a depth of 3 with hidden channels $[8, 32, 128]$ and a residual block length of 2.

  • Training: The Adam optimizer was used, with a learning rate of 0.001 and 1000 epochs. Model validation was performed every 100 epochs on the validation dataset, and the best validation model was stored (see the sketch after this list).

  • Acceleration: To speed up the activation on the GPU, we used the CUDA implementation from StyleGAN3.
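A minimal sketch of the training and validation loop described above; the model and data tensors are placeholders, and the full pipeline lives in the linked repository:

```python
import torch

def train(model, u0_train, uT_train, u0_val, uT_val, epochs=1000):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_state, best_err = None, float("inf")
    for epoch in range(1, epochs + 1):
        opt.zero_grad()
        loss = torch.mean((model(u0_train) - uT_train) ** 2)   # MSE on u(T)
        loss.backward()
        opt.step()
        if epoch % 100 == 0:                                   # validate every 100 epochs
            with torch.no_grad():
                err = torch.mean((model(u0_val) - uT_val) ** 2).item()
            if err < best_err:                                 # keep the best model
                best_err = err
                best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```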

Prediction

In this experiment, we fixed the parameter $d$ at $4$ for the heat equation. A single sample of $u_0$ and $u_T$ was generated, and we compare the prediction of each model against the ground truth $u_T$, as depicted in Figure "Heat Prediction". The error is quantified using the $L_2$ metric. Notably, the Fourier Neural Operator and Koopman Neural Operator models are sensitive to the presence of high-frequency signals, and the FFN and DeepONet models struggle to capture the high-frequency components of the signal. Conversely, the Convolutional Neural Operator and UNet models demonstrate good fitting capability, accurately representing the signal.

Figures: Poisson Prediction, Heat Prediction, Wave Prediction

Varying Parameter

In this experiment, we generated 64 samples for each equation parameter $d/K$ in $\{1, 2, 4, 8, 16\}$. It is important to note that a separate model was trained for each parameter value.
The results are illustrated in Figure "Heat Varying". The shaded bands in the figure represent the $2\sigma$ uncertainty, and the curves are fitted using logistic regression. Analyzing the figure, it becomes evident that the Koopman Neural Operator consistently achieved the lowest error across all values of $d/K$.
However, some models show no discernible ascending or descending pattern as $d/K$ varies. This behavior indicates that the models respond differently to different equations, and the relationship between the parameter and a model's performance is not monotonic.

Figures: Poisson Varying, Heat Varying, Wave Varying

Conclusion

In conclusion, neural operator architectures show great promise in solving PDEs efficiently and accurately. This study contributes to the understanding of different neural operator architectures and their performance in solving the heat, wave, and Poisson equations. Further research can explore the application of neural operators to more complex and diverse PDEs, as well as the development of hybrid approaches that combine the strengths of different architectures.

