November 22, 2024
New architecture makes neural networks easier to understand

Tegmark was aware of Poggio’s paper and thought the attempt would lead to another dead end. But Liu was not deterred, and Tegmark soon came around. They saw that even if the simple functions generated by the theorem were not smooth, the network could still approximate them with smooth functions. They further understood that most functions encountered in science are smooth, which made perfect (rather than approximate) representations potentially feasible. Liu was not about to give up on the idea without trying it first, knowing that software and hardware had improved tremendously since Poggio’s paper came out 35 years ago. Many things are possible in 2024, computationally speaking, that were unthinkable in 1989.

Liu worked on the idea for about a week, during which time he developed a number of prototype KAN systems, all with two layers — the simplest possible networks and the type that researchers had been focusing on for decades. Two-layer KANs seemed the obvious choice because the Kolmogorov-Arnold theorem essentially provides a blueprint for such a structure. The theorem splits the multivariate function into separate sets of inner functions and outer functions, which become the learnable activation functions that sit on the network’s edges in place of the weights used in MLPs. That arrangement lends itself naturally to a KAN with an inner and an outer layer of neurons — a common arrangement for simple neural networks.
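In its standard form, the theorem says that any continuous function of several variables (on a bounded domain) can be built from one-variable functions and addition alone:

$$
f(x_1, \dots, x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
$$

The inner functions correspond to a first layer of edges and the outer functions to a second, which is why the two-layer reading seemed so natural.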

But to Liu’s dismay, none of his prototypes performed well on the scientific tasks he had in mind. Tegmark then made an important suggestion: why not try a KAN with more than two layers, which might be able to handle more advanced tasks?

That outside-the-box idea was the breakthrough they needed. Liu’s deeper networks quickly showed promise, so the duo reached out to colleagues at MIT, the California Institute of Technology, and Northeastern University. They wanted mathematicians on their team, plus experts in the areas they wanted their KAN to analyze.

In their April paper, the group showed that three-layer KANs were indeed possible, providing an example of a three-layer KAN that could represent a function exactly (whereas a two-layer KAN could not). And they didn’t stop there. The group has since experimented with up to six layers, each of which allows the network to be aligned with a more complicated output function. “We found that we could essentially stack as many layers as we wanted,” said Yixuan Wang, one of the coauthors.
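A minimal sketch of what that stacking looks like, assuming a toy setup rather than the authors’ implementation (training omitted): each layer carries one learnable one-dimensional function per edge — a simple polynomial here, standing in for the splines real KANs use — every node sums its incoming edge functions, and deeper networks are just compositions of such layers.

```python
# Illustrative sketch only: a tiny "KAN-style" stack in NumPy, with a simple
# polynomial on every edge standing in for the learnable splines real KANs use.
import numpy as np

rng = np.random.default_rng(0)

class EdgeFunctionLayer:
    """Maps n_in inputs to n_out outputs. Every (input, output) edge carries
    its own 1-D function phi(x) = sum_k c_k * x**k; each output node sums the
    edge functions feeding into it, as in the Kolmogorov-Arnold setup."""

    def __init__(self, n_in, n_out, degree=3):
        # coeffs[i, j, k]: k-th polynomial coefficient on the edge i -> j.
        self.coeffs = 0.1 * rng.standard_normal((n_in, n_out, degree + 1))

    def __call__(self, x):
        # x: (batch, n_in) -> powers: (batch, n_in, degree + 1)
        powers = x[..., None] ** np.arange(self.coeffs.shape[-1])
        # Evaluate every edge function, then sum over incoming edges (axis 1).
        edge_values = np.einsum('bik,ijk->bij', powers, self.coeffs)
        return edge_values.sum(axis=1)          # (batch, n_out)

# "Stack as many layers as we want": compose layers like any deep network.
widths = [2, 5, 5, 1]                           # a three-layer toy KAN shape
layers = [EdgeFunctionLayer(a, b) for a, b in zip(widths[:-1], widths[1:])]

x = rng.uniform(-1.0, 1.0, size=(8, 2))         # a small batch of 2-D inputs
out = x
for layer in layers:
    out = layer(out)
print(out.shape)                                # (8, 1)
```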

Proven improvements

The authors also tested their networks on two real-world problems. The first involves a branch of mathematics called knot theory. In 2021, a team at DeepMind announced that they had built an MLP that could predict a particular topological property of a given knot after learning enough of the knot’s other properties. Three years later, the new KAN duplicated that feat. It then went further, showing how the predicted property related to all the others — something, Liu said, that “MLPs can’t do at all.”

The second problem concerns a phenomenon in condensed matter physics called Anderson localization. The goal was to predict the boundary at which a particular phase transition would occur, and then determine the mathematical formula that describes that process. No MLP had ever managed to do that. Their KAN did.

But the biggest advantage that KANs have over other types of neural networks, and the main motivation behind their recent development, according to Tegmark, lies in their interpretability. In both examples, the KAN didn’t just spit out an answer; it gave an explanation. “What does it mean for something to be interpretable?” he asked. “If you give me some data, I’ll give you a formula that you can write on a T-shirt.”
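One hedged sketch of how that can work in practice (the candidate library and affine-fit procedure below are illustrative assumptions, not the authors’ tooling): because every edge in a trained KAN carries a one-dimensional function, each of those functions can be compared against a small library of elementary candidates and replaced by whichever one matches best, leaving a readable formula.

```python
# Simplified illustration: guess a symbolic form for one learned 1-D function
# by least-squares fitting y ~ a*g(x) + b for each candidate g and keeping the
# best match. Real KAN tooling applies a similar idea to every edge function.
import numpy as np

CANDIDATES = {
    "sin(x)": np.sin,
    "exp(x)": np.exp,
    "x^2": np.square,
    "x": lambda x: x,
}

def best_symbolic_match(x, y):
    """Return the (name, mean squared error) of the best-fitting candidate."""
    best = None
    for name, g in CANDIDATES.items():
        design = np.column_stack([g(x), np.ones_like(x)])  # columns: g(x), 1
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        err = float(np.mean((design @ coef - y) ** 2))
        if best is None or err < best[1]:
            best = (name, err)
    return best

# Pretend these samples came from one edge of a trained network.
x = np.linspace(-2.0, 2.0, 200)
y = 3.0 * np.sin(x) + 0.5
print(best_symbolic_match(x, y))   # -> ('sin(x)', ~0.0)
```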

The ability of KANs to do this, however limited it has been so far, suggests that these networks could theoretically teach us something new about the world, said Brice Ménard, a physicist at Johns Hopkins who studies machine learning. “If the problem is actually described by a simple equation, the KAN network is pretty good at finding it,” he said. But he cautioned that the domain in which KANs work best is likely to be limited to problems — such as those in physics — where the equations tend to have very few variables.

Liu and Tegmark agree, but do not see it as a disadvantage. “Almost all famous scientific formulas” — such as E = mc² — “can be written in terms of functions of one or two variables,” Tegmark said. “The vast majority of computations we do depend on one or two variables. KANs exploit that fact and look for solutions of that form.”

The ultimate equations

Liu and Tegmark’s KAN paper quickly made waves, garnering 75 citations in about three months. Soon, other groups were working on their own KANs. A paper by Yizheng Wang of Tsinghua University and others that appeared online in June showed that their Kolmogorov-Arnold-informed neural network (KINN) “significantly outperforms” MLPs for solving partial differential equations (PDEs). That’s no small feat, Wang said: “PDEs are everywhere in science.”

A July paper from researchers at the National University of Singapore was more mixed. They found that KANs outperformed MLPs on tasks related to interpretability, while MLPs fared better at computer vision and audio processing. The two networks performed roughly equally on natural language processing and other machine learning tasks. To Liu, those results weren’t surprising, since the original KAN group’s focus has always been on “scientific tasks,” where interpretability is the top priority.

Meanwhile, Liu is working to make KANs more practical and user-friendly. In August, he and his collaborators published a new paper called “KAN 2.0,” which he described as “more of a user manual than a conventional paper.” This version is easier to use, Liu said, and offers a multiplication tool, among other features that were missing from the original model.

This type of network, he and his co-authors argue, represents more than just a means to an end. KANs promote what the group calls “curiosity-driven science,” which complements the “application-driven science” that has long dominated machine learning. When observing the motion of celestial bodies, for example, application-driven researchers focus on predicting their future states, while curiosity-driven researchers hope to uncover the physics behind the motion. Through KANs, Liu hopes, researchers can get more out of neural networks than just help with an otherwise daunting computational problem. They could instead focus on simply gaining understanding for its own sake.
