Universal approximation theorem

In the mathematical theory of artificial neural networks, the universal approximation theorem states^[1] that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of Rⁿ to arbitrary accuracy, under mild assumptions on the activation function. Simple neural networks can therefore represent a wide variety of functions when given appropriate parameters. The theorem is purely an existence result: it says nothing about how many neurons are needed or whether suitable parameters can be found by a learning algorithm.

One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions.^[2]

Kurt Hornik showed in 1991^[3] that it is not the specific choice of activation function, but rather the multilayer feed-forward architecture itself, that gives neural networks the potential to be universal approximators. The output units are always assumed to be linear. For notational convenience, only the single-output case is shown here. The general case follows by approximating each output component separately and combining the resulting hidden units into one network.

Formal statement

The theorem^[2]^[3]^[4]^[5] in mathematical terms:

Let $\varphi :\mathbb {R} \to \mathbb {R}$ be a nonconstant, bounded, and continuous function. Let $I_{m}$ denote the $m$-dimensional unit hypercube $[0,1]^{m}$. The space of real-valued continuous functions on $I_{m}$ is denoted by $C(I_{m})$. Then, given any $\varepsilon >0$ and any function $f\in C(I_{m})$, there exist an integer $N$, real constants $v_{i},b_{i}\in \mathbb {R}$ and real vectors $w_{i}\in \mathbb {R} ^{m}$ for $i=1,\ldots ,N$, such that we may define:

$F(x)=\sum _{i=1}^{N}v_{i}\varphi \left(w_{i}^{T}x+b_{i}\right)$

as an approximate realization of the function $f$; that is,

$|F(x)-f(x)|<\varepsilon$

for all $x\in I_{m}$. In other words, functions of the form $F(x)$ are dense in $C(I_{m})$ with respect to the supremum norm.

This still holds when replacing $I_{m}$ with any compact subset of $\mathbb {R} ^{m}$.
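The form of $F(x)$ can be made concrete for $m=1$ with a small numerical sketch. The construction below is only an illustration, not the argument used in the cited proofs: it takes $\varphi$ to be the logistic sigmoid (which is nonconstant, bounded, and continuous) and hand-picks the parameters $v_{i}$, $w_{i}$, $b_{i}$ so that $F$ is a smoothed staircase following $f$ on $[0,1]$. The function names, the target $\sin(2\pi x)$, and the steepness factor are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_staircase_network(f, n_steps, steepness_per_step=10.0):
    """Hand-pick (v, w, b) so that F(x) = sum_i v[i] * sigmoid(w[i]*x + b[i])
    follows f on [0, 1]. Illustrative construction, not a training procedure."""
    k = steepness_per_step * n_steps  # slope of each sigmoid "step"
    # Constant term: with w = 0 and b = 0 the unit outputs sigmoid(0) = 0.5,
    # so the weight 2*f(0) contributes exactly f(0).
    v, w, b = [2.0 * f(0.0)], [0.0], [0.0]
    for i in range(1, n_steps + 1):
        left, right = (i - 1) / n_steps, i / n_steps
        centre = 0.5 * (left + right)
        v.append(f(right) - f(left))  # height of the i-th step
        w.append(k)                   # steep sigmoid acts like a step
        b.append(-k * centre)         # step is centred between the two knots
    return v, w, b


def evaluate(params, x):
    """Evaluate F(x) for scalar x."""
    v, w, b = params
    return sum(vi * sigmoid(wi * x + bi) for vi, wi, bi in zip(v, w, b))


def max_error(f, params, n_grid=1000):
    """Largest |F(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    return max(
        abs(evaluate(params, j / n_grid) - f(j / n_grid))
        for j in range(n_grid + 1)
    )


def target(x):
    """Example target function (an arbitrary choice for this sketch)."""
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    for n in (50, 200):
        params = build_staircase_network(target, n)
        print(n + 1, "hidden units, max error on grid:", max_error(target, params))
```

Increasing `n_steps` adds hidden units and shrinks the error measured on the grid, which mirrors the role of $N$ and $\varepsilon$ in the statement. The parameters here are written down directly from $f$; the theorem itself gives no procedure for finding them.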

References

[1] Csáji, Balázs Csanád (2001). Approximation with Artificial Neural Networks. Faculty of Sciences, Eötvös Loránd University, Hungary.
[2] Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function". Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi:10.1007/BF02551274
[3] Hornik, Kurt (1991). "Approximation Capabilities of Multilayer Feedforward Networks". Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-T
[4] Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation, Volume 2. Prentice Hall. ISBN 0-13-273350-1.
[5] Hassoun, M. (1995). Fundamentals of Artificial Neural Networks. MIT Press, p. 48.

External links

http://neuralnetworksanddeeplearning.com/chap4.html


