Backpropagation in MLFN Explained

The document describes the backpropagation learning rule for multilayer feedforward neural networks (MLFNs). It discusses initializing weights, presenting sample patterns, calculating outputs and errors, and updating weights. Weights are updated based on the gradient of the error function with respect to the weights, using the delta rule. The sigmoid activation function and its derivative are also described, which are important for implementing backpropagation.


Backpropagation Learning Rule for MLFN

Initialize all weights to small random numbers.

Present all sample patterns (one at a time) and perform the following steps for each sample pattern.

(i) For each output unit k = 1, 2, ..., m, compute its output o_k.

(ii) For each output unit k, compute its error term

    \delta_k = o_k (1 - o_k)(t_k - o_k)

(iii) For each hidden unit h, compute its error term

    \delta_h = o_h (1 - o_h) \sum_k w_{hk} \delta_k

(iv) Update each weight w_{ij}:

    w_{ij} \leftarrow w_{ij} + \Delta w_{ij}, where \Delta w_{ij} = \eta \, \delta_j \, y_{ij}

Here y_{ij} is the input from node i into unit j and \eta is the learning rate.
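As a concrete illustration, the per-pattern steps (i)-(iv) can be sketched in Python for a network with one hidden layer. This is a minimal sketch, not part of the original notes: NumPy, the weight-matrix names W1 and W2, and the learning rate of 0.5 are assumptions, and the sigmoid spread constant c is taken as 1.

import numpy as np

def sigmoid(net):
    # f(net) = 1 / (1 + exp(-net)), i.e. c = 1
    return 1.0 / (1.0 + np.exp(-net))

def train_one_pattern(x, t, W1, W2, eta=0.5):
    # W1: weights input -> hidden, shape (n_hidden, n_inputs)
    # W2: weights hidden -> output, shape (m, n_hidden)
    # (i) forward pass: hidden outputs o_h, then output-unit outputs o_k
    o_h = sigmoid(W1 @ x)
    o_k = sigmoid(W2 @ o_h)
    # (ii) error terms for output units: delta_k = o_k(1 - o_k)(t_k - o_k)
    delta_k = o_k * (1 - o_k) * (t - o_k)
    # (iii) error terms for hidden units: delta_h = o_h(1 - o_h) sum_k w_hk delta_k
    delta_h = o_h * (1 - o_h) * (W2.T @ delta_k)
    # (iv) weight updates: delta_w_ij = eta * delta_j * y_ij
    W2 += eta * np.outer(delta_k, o_h)
    W1 += eta * np.outer(delta_h, x)
    return o_k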
The Mathematics of Backpropagation Rule

Notations:

- E_p is the error function for pattern p,
- t_pj is the target output for pattern p on node j,
- o_pj is the actual output of node j,
- w_ij is the weight from node i to node j.
The Activation:

We require a continuous activation function, such as the sigmoid function. The net input to node j for pattern p is

    net_{pj} = \sum_i w_{ij} \, o_{pi}

and its output is

    o_{pj} = f_j(net_{pj}) = f_j\!\left( \sum_i w_{ij} \, o_{pi} \right)
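In code, the net input and the activation of a whole layer can be computed together; a minimal sketch (NumPy assumed, names hypothetical):

import numpy as np

def layer_forward(W, o_prev, f):
    # net_pj = sum_i w_ij * o_pi, for all units j of the layer at once
    net = W @ o_prev
    # o_pj = f_j(net_pj)
    return f(net)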
The Error Function:

Let us define the error function E_p as

    E_p = \frac{1}{2} \sum_k (t_{pk} - o_{pk})^2

This means that the error function E_p is proportional to the square of the difference between the actual and desired output.
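For example, with two output units, target T_p = (1, 0), and actual output (o_{p1}, o_{p2}) = (0.8, 0.2), the error is E_p = ½[(1 - 0.8)^2 + (0 - 0.2)^2] = ½[0.04 + 0.04] = 0.04.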
Weight Changing:

Since weight changes should be proportional to the negative gradient of the error (we want to descend the error surface), we have

    \Delta_p w_{ij} \propto -\frac{\partial E_p}{\partial w_{ij}}    (4.1)
Deriving Formula of Error Change

By using the chain rule, we can break down the gradient into the components

    \frac{\partial E_p}{\partial w_{ij}} = \frac{\partial E_p}{\partial o_{pj}} \cdot \frac{\partial o_{pj}}{\partial net_{pj}} \cdot \frac{\partial net_{pj}}{\partial w_{ij}}    (4.2)
Consider the third factor of (4.2):

    \frac{\partial net_{pj}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_k w_{kj} \, o_{pk} = \sum_k \frac{\partial w_{kj}}{\partial w_{ij}} \, o_{pk}    (true for a feedforward net)
                                             = o_{pi},

since

    \frac{\partial w_{kj}}{\partial w_{ij}} = 1 when k = i, and 0 when k ≠ i    (4.3)
We now consider the middle factor of (4.2):

    \frac{\partial o_{pj}}{\partial net_{pj}} = \frac{\partial f_j(net_{pj})}{\partial net_{pj}} = f_j'(net_{pj})    (4.4)
Now we turn to the first factor of (4.2), which is a little more complicated:

    \frac{\partial E_p}{\partial o_{pj}} = \frac{\partial}{\partial o_{pj}} \left[ \frac{1}{2} \sum_k (t_{pk} - o_{pk})^2 \right] = -(t_{pj} - o_{pj})    (4.5)
Putting the values from (4.3), (4.4), and (4.5) into (4.2), we have

    \frac{\partial E_p}{\partial w_{ij}} = -(t_{pj} - o_{pj}) \, f_j'(net_{pj}) \, o_{pi}    (4.6)

This is useful for the output units, because the target and output are both available, but not for the hidden units, because their targets are not known.
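Equation (4.6) can be confirmed numerically with a finite-difference check. The sketch below assumes a single sigmoid output unit with c = 1 and two incoming weights; the variable names and values are illustrative only:

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

o_in = np.array([0.3, 0.9])   # incoming activations o_pi
w = np.array([0.5, -0.4])     # weights w_ij into unit j
t = 1.0                       # target t_pj

def E(w):
    # E_p = 1/2 (t_pj - o_pj)^2 for a single output unit
    o = sigmoid(w @ o_in)
    return 0.5 * (t - o) ** 2

# analytic gradient from (4.6), with f'(net) = o(1 - o) for c = 1
o = sigmoid(w @ o_in)
analytic = -(t - o) * o * (1 - o) * o_in

# central finite differences
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(2)[i]) - E(w - eps * np.eye(2)[i])) / (2 * eps)
                    for i in range(2)])

print(np.allclose(analytic, numeric))   # True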
So, if unit j is not an output unit, we can write, by the chain rule again, that

    \frac{\partial E_p}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial net_{pk}} \cdot \frac{\partial net_{pk}}{\partial o_{pj}}
                                         = \sum_k \frac{\partial E_p}{\partial net_{pk}} \cdot \frac{\partial}{\partial o_{pj}} \sum_i w_{ik} \, o_{pi}
                                         = \sum_k \frac{\partial E_p}{\partial net_{pk}} \, w_{jk}    (4.7)
since

    \frac{\partial o_{pi}}{\partial o_{pj}} = 1 when i = j, and 0 when i ≠ j
Putting the values from (4.3), (4.4), and (4.7) into (4.2), we have

    \frac{\partial E_p}{\partial w_{ij}} = o_{pi} \, f_j'(net_{pj}) \sum_k \frac{\partial E_p}{\partial net_{pk}} \, w_{jk}    (4.8)
It is useful to define

    \delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}} = -\frac{\partial E_p}{\partial o_{pj}} \cdot \frac{\partial o_{pj}}{\partial net_{pj}}    (4.9)
In view of (4.9), equation (4.6) can be written as

    -\frac{\partial E_p}{\partial w_{ij}} = \delta_{pj} \, o_{pi}    (4.10)

where \delta_{pj} = (t_{pj} - o_{pj}) \, f_j'(net_{pj}).
Similarly, equation (4.8) can be written as

    -\frac{\partial E_p}{\partial w_{ij}} = o_{pi} \, f_j'(net_{pj}) \sum_k \delta_{pk} \, w_{jk} = \delta_{pj} \, o_{pi}    (4.11)

where \delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk} \, w_{jk}.
Putting -\partial E_p / \partial w_{ij} = \delta_{pj} \, o_{pi} into (4.1), we have

    \Delta_p w_{ij} = \eta \, \delta_{pj} \, o_{pi}    (4.12)

where \eta is the gain (learning-rate) constant and

    \delta_{pj} = (t_{pj} - o_{pj}) \, f_j'(net_{pj})          if j is an output unit,
    \delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk} \, w_{jk}  if j is not an output unit.    (4.13)
The Sigmoid Function and its Derivative

The sigmoid function is used advantageously as the activation (threshold) function. It is defined as

    f(net) = \frac{1}{1 + e^{-c \cdot net}}

and has the range 0 < f(net) < 1. Here c is a positive constant that controls the spread of the function: large values of c squash the function until, as c → ∞, it approaches the step function.
Advantages:

(i) It is quite like the step function, and so demonstrates behavior of a similar nature.

(ii) It acts as an automatic gain control: for small input signals the slope is quite steep, so the function changes rapidly and produces a large gain. For large inputs the slope, and thus the gain, is much smaller. This means that the network can accept large inputs and still remain sensitive to small changes.

(iii) A major reason for its use is that it has a simple derivative, and this makes the implementation of the back-propagation system much easier.
The Derivative

Given that the output of the jth unit, o_pj, is given by

    o_{pj} = f(net) = \frac{1}{1 + e^{-c \cdot net}},

the derivative with respect to that unit, f'(net), is given by

    f'(net) = \frac{c \, e^{-c \cdot net}}{(1 + e^{-c \cdot net})^2} = c \, f(net) \, (1 - f(net)) = c \, o_{pj} (1 - o_{pj})    (4.14)

The derivative is therefore a simple function of the outputs. Putting the value from (4.14) into (4.13), we have

    \delta_{pj} = c \, o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})               if j is an output unit,
    \delta_{pj} = c \, o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} \, w_{jk}    if j is not an output unit.
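A quick numerical check of (4.14); a minimal sketch with NumPy assumed and an arbitrary spread constant c = 2:

import numpy as np

def f(net, c=2.0):
    # sigmoid with spread constant c
    return 1.0 / (1.0 + np.exp(-c * net))

def f_prime(net, c=2.0):
    # derivative via (4.14): f'(net) = c f(net) (1 - f(net))
    o = f(net, c)
    return c * o * (1 - o)

net, eps = 0.7, 1e-6
numeric = (f(net + eps) - f(net - eps)) / (2 * eps)   # central difference
print(np.isclose(f_prime(net), numeric))              # True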


The Algorithm for MLFN

(i) Initialize weights and thresholds. Set all weights and thresholds to small random values.

(ii) Present input and desired output. Present input X_p = (x_0, x_1, x_2, ..., x_{n-1}) and target output T_p = (t_0, t_1, t_2, ..., t_{m-1}), where n is the number of input neurons and m is the number of output neurons. For pattern association, X_p and T_p represent the patterns to be associated. For classification, T_p is set to zero except for the one element, set to 1, that corresponds to the class that X_p is in.

(iii) Calculate actual output. Each layer calculates

    y_{pj} = f\!\left( \sum_{i=0}^{n-1} w_{ij} \, x_i \right)

and passes that as input to the next layer. The final layer outputs the values o_pj.

(iv) Adapt weights. Start from the output layer, and work backwards:

    w_{ij}(t+1) = w_{ij}(t) + \eta \, \delta_{pj} \, o_{pi}
Here w_{ij}(t) represents the weight from node i to node j at time t, \eta is a gain term, and \delta_{pj} is an error term for pattern p on neuron j.

For output neurons:

    \delta_{pj} = c \, o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})

For hidden neurons:

    \delta_{pj} = c \, o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} \, w_{jk}
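The four steps assemble into the runnable sketch below for a small classification task. The XOR data set, the layer sizes, the gain \eta = 0.5, c = 1, and the epoch count are illustrative assumptions, not part of the original notes:

import numpy as np

rng = np.random.default_rng(0)

# (i) initialize weights to small random values; thresholds are folded in
# as bias weights on an extra input that is always 1
n_in, n_hid, n_out = 2, 4, 1
W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in + 1))   # input -> hidden
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hid + 1))  # hidden -> output

def f(net):
    return 1.0 / (1.0 + np.exp(-net))            # sigmoid, c = 1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

eta = 0.5
for epoch in range(10000):
    for x, t in zip(X, T):
        # (ii)-(iii) present input and calculate actual output layer by layer
        x1 = np.append(x, 1.0)                   # input plus bias
        o_h = f(W1 @ x1)
        h1 = np.append(o_h, 1.0)                 # hidden plus bias
        o = f(W2 @ h1)
        # (iv) adapt weights, starting from the output layer, working backwards
        delta_o = o * (1 - o) * (t - o)                        # output neurons
        delta_h = o_h * (1 - o_h) * (W2[:, :-1].T @ delta_o)   # hidden neurons
        W2 += eta * np.outer(delta_o, h1)
        W1 += eta * np.outer(delta_h, x1)

for x in X:
    h1 = np.append(f(W1 @ np.append(x, 1.0)), 1.0)
    print(x, np.round(f(W2 @ h1), 2))            # outputs approach 0, 1, 1, 0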
