How can a trained model help scientists discover the mathematical formulas that govern their experimental data? Kolmogorov-Arnold Networks (KAN) offer a way to communicate with accurate models that can express what they have learned through mathematical functions.
Figure credit: Liu, Z., et al. (2024). KAN: Kolmogorov-Arnold Networks.
The current foundation of deep learning models is a classic, simple, yet incredibly handy neural network architecture for approximating non-linear functions. We know it! We love it! It's the multi-layer perceptron (MLP). Its core idea is rooted in the universal approximation theorem, and it captures the intricate compositional structure of multi-dimensional data fairly well. However, the real strength of an MLP lies in the interconnected network as a whole; it struggles with accuracy and interpretability when estimating univariate mathematical functions (e.g. trigonometric functions) at the level of individual neurons. This challenge stems mainly from how it approximates non-linear functions: linear weight matrices (learnable parameters on the edges/synapses) combined with fixed non-linear activation functions (non-learnable functions at the nodes/neurons).
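To make that split between learnable and fixed parts concrete, here is a minimal PyTorch sketch of a standard MLP block (not code from the paper): the learnable parameters sit in the linear layers, while the activation function is fixed and shared by every neuron.

```python
import torch
import torch.nn as nn

# A standard MLP block: learnable linear weights on the edges,
# a fixed (non-learnable) activation function at the nodes.
mlp = nn.Sequential(
    nn.Linear(2, 16),   # learnable weight matrix + bias
    nn.SiLU(),          # fixed non-linearity, no parameters to learn
    nn.Linear(16, 1),
)

x = torch.randn(8, 2)
y = mlp(x)  # shape: (8, 1)
```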
KAN tackles this limitation by placing learnable activation functions (parameterized as 1D B-spline curves) on the network's edges. Simply put, the generalized KAN architecture proposed in the paper replaces the MLP's linear weight matrix with a matrix of learnable non-linear activation functions. While building a network based on the Kolmogorov-Arnold representation theorem is not a new idea, the paper's considerable contribution is generalizing the architecture to arbitrary depth and width by stacking KAN layers, which boosts KAN's ability to approximate any function with smooth splines.
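For intuition, here is a minimal, self-contained sketch of a KAN-style layer (not the paper's implementation): each input-output edge gets its own learnable univariate function. To keep the code short, each edge function is a learnable combination of fixed Gaussian basis functions rather than the B-splines used in the paper; the structural point, learnable functions on the edges instead of scalar weights, is the same.

```python
import torch
import torch.nn as nn

class KANStyleLayer(nn.Module):
    """Simplified KAN-style layer: one learnable univariate function per edge.

    Each edge function is a learnable mix of fixed Gaussian bases on [-2, 2]
    (a stand-in for the B-spline parameterization used in the paper).
    """

    def __init__(self, in_dim: int, out_dim: int, num_bases: int = 8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, num_bases))
        self.width = 2 * (4.0 / (num_bases - 1)) ** 2
        # One coefficient vector per (input, output) edge.
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, num_bases) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> Gaussian basis expansion: (batch, in_dim, num_bases)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / self.width)
        # Apply each edge's learnable function and sum over the inputs: (batch, out_dim)
        return torch.einsum("bik,iok->bo", basis, self.coef)

# Stacking layers gives a deeper KAN-style network, e.g. 2 -> 5 -> 1.
model = nn.Sequential(KANStyleLayer(2, 5), KANStyleLayer(5, 1))
y = model(torch.randn(8, 2))  # shape: (8, 1)
```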
KAN revolves around two key points, accuracy and interpretability, which are examined through numerous trivial and non-trivial examples. The paper shows that KAN outperforms MLP in accuracy across different tasks, including regression and solving partial differential equations, while requiring fewer parameters. It also presents many examples from mathematics and physics to highlight KAN's interpretability and interactive capabilities. Through these examples, which are available in the authors' GitHub repository, you can grasp the mathematical relationship between inputs and outputs simply by looking at the learned activation functions. The paper is very well written! I enjoyed reading it!
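As a pointer, the repository's tutorials roughly follow the workflow sketched below: fit a KAN to data, visualize the learned edge functions, and optionally snap them to symbolic forms. This is only a rough sketch based on the pykan tutorials; exact function names and signatures may differ between versions of the library.

```python
# Rough sketch of the pykan workflow (names may vary across library versions).
import torch
from kan import KAN, create_dataset

# Toy target from the paper's examples: f(x1, x2) = exp(sin(pi*x1) + x2^2)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

model = KAN(width=[2, 5, 1], grid=5, k=3)   # 2 inputs, 5 hidden nodes, 1 output
model.fit(dataset, opt="LBFGS", steps=20)   # called `train` in earlier versions

model.plot()                  # inspect the learned activation function on each edge
model.auto_symbolic()         # snap each edge to the closest symbolic function
print(model.symbolic_formula())
```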
Now the question is: what are KAN's potential applications in biology-related tasks? How might its interpretability transform the collaboration between computational biologists and AI, and help them introduce their inductive biases and biological knowledge into models?
One of the examples in their tutorial is protein sequence classification. It's definitely worth diving deeper into this architecture! But I'm not quite sure how we can frame protein-related problems to make the most of KAN's interpretability. For example, what interpretable mathematical formula links amino acids as inputs (their embeddings) to protein stability as output? Could KAN estimate the binding affinity in protein interactions in a way that lets scientists understand which amino acid or atom engaged in the interaction contributes the most to the formation of the protein complex structure, and through what mathematical formula?
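To make that question concrete, here is one purely hypothetical way such a setup could look, reusing the simplified KANStyleLayer sketched earlier. The features and data below are placeholders for illustration only, not anything from the paper or a real protein dataset.

```python
import torch
import torch.nn as nn
# Assumes the KANStyleLayer class from the earlier sketch is in scope.

# Hypothetical setup: a few per-protein features (placeholders standing in for
# pooled embedding dimensions or descriptors) regressed onto a stability score,
# so the learned per-feature edge functions can be inspected afterwards.
num_proteins, num_features = 256, 4
X = torch.randn(num_proteins, num_features)  # placeholder inputs
y = torch.randn(num_proteins, 1)             # placeholder stability scores

model = nn.Sequential(KANStyleLayer(num_features, 4), KANStyleLayer(4, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    optimizer.step()

# After training, plotting each first-layer edge function would show how each
# input feature is transformed on its own -- the kind of readout KAN promises.
```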
Examples of KAN:
KAN paper: