Induction of Fuzzy Rules and Membership Functions From Training Examples
Abstract
Most fuzzy controllers and fuzzy expert systems must predefine membership functions and fuzzy inference rules to map
numeric data into linguistic variable terms and to make fuzzy reasoning work. In this paper, we propose a general
learning method as a framework for automatically deriving membership functions and fuzzy if-then rules from a set of
given training examples to rapidly build a prototype fuzzy expert system. Based on the membership functions and the
fuzzy rules derived, a corresponding fuzzy inference procedure to process inputs is also developed.
Keywords: Expert systems; Fuzzy clustering; Fuzzy decision rules; Fuzzy machine learning; Knowledge acquisition; Membership functions
system will become better and better as it develops. Also, the expert system approach can integrate expertise from many fields, reduce the cost of query, lower the probability of danger occurring, and provide fast response [19].

To apply expert systems in decision-making, having the capacity to manage uncertainty and noise is quite important. Several theories, such as fuzzy set theory [27,29], probability theory, D-S theory [21], and approaches based on certainty factors [2], have been developed to manage uncertainty and noise. Fuzzy set theory is more and more frequently used in expert systems and controllers because of its simplicity and similarity to human reasoning. The theory has been applied to many fields such as manufacturing, engineering, diagnosis, economics, and others [5,9,28]. However, current fuzzy systems have the following general limitations.

(1) They have no common framework from which to deal with different kinds of problems; in other words, they are problem-dependent.

(2) Human experts play a very important role in developing fuzzy expert systems and fuzzy controllers.

Most fuzzy controllers and fuzzy expert systems can be seen as special rule-based systems that use fuzzy logic. A fuzzy rule-based expert system contains fuzzy rules in its knowledge base and derives conclusions from the user inputs and the fuzzy reasoning process [9,28]. A fuzzy controller is a knowledge-based control scheme in which scaling functions of physical variables are used to cope with uncertainty in process dynamics or the control environment [7]. They must usually predefine membership functions and fuzzy inference rules to map numeric data into linguistic variable terms (e.g. very high, young, ...) and to make fuzzy reasoning work. The linguistic variables are usually defined as fuzzy sets with appropriate membership functions. Recently, many fuzzy systems that automatically derive fuzzy if-then rules from numeric data have been developed [3,13,18,22,23]. In these systems, prototypes of fuzzy rule bases can then be built quickly without the help of human experts, thus avoiding a development bottleneck. Membership functions still need to be predefined, however, and thus are usually built by human experts or experienced users. The same problem as before then arises: if the experts are not available, then the membership functions cannot be accurately defined, or the fuzzy systems developed may not perform well.

In this paper, we propose a general learning method as a framework to derive membership functions and fuzzy if-then rules automatically from a set of given training examples and quickly build a prototype of an expert system. Based on the membership functions and the fuzzy rules derived, a corresponding fuzzy inference procedure to process the input in question is developed.

The remaining parts of this paper are organized as follows. In Section 2, the development of an expert system is introduced. In Section 3, the concept and terminology of fuzzy sets are briefly reviewed. In Section 4, the basic architecture of fuzzy expert systems is provided. In Section 5, a new learning method is given for automatically deriving membership functions and fuzzy if-then rules from a set of training instances. In Section 6, an inference procedure for processing inputs based on the derived rules is suggested. In Section 7, application to Fisher's iris data is presented. Conclusions are given in Section 8.

2. Development of an expert system

Development of a classical expert system is illustrated in Fig. 1 [19]. A knowledge engineer first establishes a dialog with a human expert in order to elicit the expert's knowledge. The knowledge engineer then encodes the knowledge for entry into the knowledge base. The expert then evaluates the expert system and gives a critique to the knowledge engineer. This process continues until the system's performance is judged to be satisfactory by the expert. The user then supplies facts or other information to the expert system and receives expert advice in response [19].

Although a wide variety of expert systems have been built, a development bottleneck occurs in knowledge acquisition. Building a large-scale expert system involves creating and extending a large
T.-P. Hong, C.-Y. Lee / Fuzzy Sets and Systems 84 (1996) 33-47
knowledge base over the course of many months or years. For instance, the knowledge base of the XCON (R1) expert system has grown over the past 10 years from 300 component descriptions and 750 rules to 31,000 component descriptions and 10,000 rules [20]. Shortening the development time is then the most important factor for the success of an expert system. Recently, machine-learning techniques have been developed to ease the knowledge-acquisition bottleneck. Among machine-learning approaches, deriving inference rules from training examples is the most common [12,16,17]. Given a set of examples and counterexamples of a concept, the learning program tries to induce general rules that describe all of the positive training instances and none of the counterexamples. If the training instances belong to more than two classes, the learning program tries to induce general rules that describe each class. In addition to classical machine-learning methods, fuzzy learning methods (such as fuzzy ID3) [8,24,25] for inducing fuzzy knowledge have also emerged recently. Machine learning then provides a feasible way to build a prototype (fuzzy) expert system.

3. Fuzzy set concepts

A classical (crisp) set A can be defined by a characteristic function

μ_A: X → {0, 1}.   (2)

This kind of function can be generalized such that the values assigned to the elements of the universal set fall within a specified range and are referred to as the membership grades of these elements in the set. Larger values denote higher degrees of set membership. Such a function is called a membership function μ_A, by which a fuzzy set A is usually defined. This function can be indicated by

μ_A: X → [0, 1],   (3)

where X refers to the universal set defined in a specific problem, and [0, 1] denotes the interval of real numbers from 0 to 1, inclusive.

Assume that A and B are two fuzzy sets with membership functions μ_A and μ_B. The most commonly used primitives for fuzzy union and fuzzy intersection are as follows [23]:

μ_{A∪B}(x) = max(μ_A(x), μ_B(x)),
μ_{A∩B}(x) = min(μ_A(x), μ_B(x)).   (4)

Although these two operations may cause the problems of partially single operand dependency and negative compensation [10], they are the most commonly used because of their simplicity. These two operators are also used in this paper in deriving the fuzzy if-then rules and membership functions.
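As a concrete illustration of the max/min primitives of Eq. (4), here is a minimal Python sketch that represents a discrete fuzzy set as a dict from element to membership grade (the helper names are ours, not from the paper):

```python
def fuzzy_union(mu_a, mu_b):
    """Pointwise max of two membership functions over a discrete universe."""
    universe = set(mu_a) | set(mu_b)
    return {x: max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in universe}

def fuzzy_intersection(mu_a, mu_b):
    """Pointwise min of two membership functions over a discrete universe."""
    universe = set(mu_a) | set(mu_b)
    return {x: min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in universe}

# Two small fuzzy sets over the universe {1, 2, 3, 4}.
A = {1: 0.2, 2: 0.8, 3: 1.0}
B = {2: 0.5, 3: 0.4, 4: 0.9}
```

An element absent from a dict is treated as having membership 0, so the union keeps the larger grade at each point and the intersection the smaller.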
4. Architecture of a fuzzy expert system

Fig. 2 shows the basic architecture of a fuzzy expert system. Individual components are illustrated as follows.

User interface: For communication between users and the fuzzy expert system. The interface should be as friendly as possible.

Membership function base: A mechanism that presents the membership functions of different linguistic terms.

Fuzzy rule base: A mechanism for storing fuzzy rules as expert knowledge.

Fuzzy inference engine: A program that executes the inference cycle of fuzzy matching, fuzzy conflict resolution, and fuzzy rule firing according to given facts.

Explanation mechanism: A mechanism that explains the inference process to users.

Working memory: A storage facility that saves user inputs and temporary results.

Knowledge-acquisition facility: An effective knowledge-acquisition tool for conventional interviewing or automatically acquiring the expert's knowledge, or an effective machine-learning approach to deriving rules and membership functions automatically from training instances, or both.

Here the membership functions are stored in a knowledge base (instead of being put in the interface) since, by our method, decision rules and membership functions are acquired by a learning method. When users input facts through the user interface, the fuzzy inference engine automatically reasons using the fuzzy rules and the membership functions, and sends fuzzy or crisp results through the user interface to the users as outputs.

In the next section, we propose a general learning method as a knowledge-acquisition facility for automatically deriving membership functions and fuzzy if-then rules from a set of training instances.
where x_ir (1 ≤ r ≤ m) is the rth attribute value of the ith training example and y_i is the output value of the ith training example.

For example, assume an insurance company decides insurance fees according to two attributes: age and property. If the insurance company evaluates and decides that the insurance fee for a person of age 20 possessing property worth $30,000 should be $1000, then the example is represented as (age = 20, property = $30,000, insurance fee = $1000).

5.2. The algorithm

The learning activity is shown in Fig. 3 [23]. A set of training instances is collected from the environment. Our task here is to generate automatically reasonable membership functions and appropriate decision rules from these training data, so that they can represent important features of the data set. The proposed learning algorithm can be divided into the following main steps:

Step 1: cluster and fuzzify the output data;
Step 2: construct initial membership functions for input attributes;
Step 3: construct the initial decision table;
Step 4: simplify the initial decision table;
Step 5: rebuild the membership functions;
Step 6: derive decision rules from the decision table.

5.3. Example

As before, assume an insurance company decides insurance fees according to age and property. Each training instance then consists of two attributes, age and property (in ten thousands), and one output, insurance fee. The goal of the learning process is to construct a membership function for each attribute (i.e. age and property), and to derive fuzzy decision rules to decide on reasonable insurance fees. Assume the following eight training examples are available:

Age  Property  Insurance fee
(20, 30; 2000)
(25, 30; 2100)
(30, 10; 2200)
(45, 50; 2500)
(50, 30; 2600)
(60, 10; 2700)
(80, 30; 3200)
(80, 40; 3300)

The learning algorithm proceeds as follows.

Step 1: Cluster and fuzzify the output data. In this step, the output values of all training instances
are appropriately grouped by applying the clustering procedure below, and appropriate membership functions for output values are derived. Our clustering procedure considers training instances with close output values as belonging to the same class with high membership values. Six substeps are included in Step 1. The flow chart is shown in Fig. 4. Details are as follows.

Substep (1a): Sort the output values of the training instances in ascending order. It sorts the output values of the training instances to find the relationship between adjacent data. The modified order after sorting is then

y'_1, y'_2, ..., y'_n,   (6)

where y'_i ≤ y'_{i+1} (for i = 1, ..., n − 1).

Example 1. For the training instances given in the example, Substep (1a) proceeds as follows.

Original order:
2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300
↓ sorting
Modified order:
2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300

Substep (1b): Find the difference between adjacent data. The difference between adjacent data provides the information about the similarity between them. For each pair y'_i and y'_{i+1} (i = 1, 2, ..., n − 1), the difference is diff_i = y'_{i+1} − y'_i.
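Substeps (1a) and (1b) reduce to a sort followed by adjacent differences. A minimal Python sketch on the insurance-fee outputs (variable names are ours):

```python
outputs = [2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300]

# Substep (1a): sort the output values in ascending order.
ys = sorted(outputs)

# Substep (1b): differences between adjacent sorted values.
diffs = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
print(diffs)  # [100, 100, 300, 100, 100, 500, 100]
```

The resulting list is exactly the difference sequence used in the next substep.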
Difference sequence:
100, 100, 300, 100, 100, 500, 100.

Substep (1c): Find the value of similarity between adjacent data. In order to obtain the value of similarity between adjacent data, we convert each distance diff_i to a real number s_i between 0 and 1 according to the following formula [18]:

s_i = 1 − diff_i / (C · σ_s)   if diff_i ≤ C · σ_s,
s_i = 0   otherwise,   (7)

where s_i represents the similarity between y'_i and y'_{i+1}, diff_i is the distance between y'_i and y'_{i+1}, σ_s is the standard deviation of the diff_i's, and C is a control parameter deciding the shape of the membership functions of similarity. A larger C causes a greater similarity.

Fig. 5. A triangle membership function.

Substep (1d): Group the data according to a similarity threshold α:

if s_i < α, then divide the two adjacent data into different groups;
else put them into the same group.

After the above operation, we can obtain the result formed as (y'_i, R_j), meaning that the ith output datum is clustered into R_j, where R_j means the jth produced fuzzy region.

Example 4. Assume α is set at 0.8. The training examples are then grouped as follows:
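Eq. (7) and the subsequent α-threshold grouping can be sketched as follows. The value C = 5.0 and the use of the population standard deviation are our choices for illustration, since the paper does not fix them at this point:

```python
import statistics

def similarities(diffs, C=5.0):
    """Eq. (7): s_i = 1 - diff_i / (C * sigma_s) if diff_i <= C * sigma_s, else 0.
    sigma_s is the standard deviation of the diff_i's; C = 5.0 is assumed."""
    limit = C * statistics.pstdev(diffs)
    return [1 - d / limit if d <= limit else 0.0 for d in diffs]

def group(ys, s, alpha):
    """Substep (1d): start a new cluster wherever similarity drops below alpha."""
    groups = [[ys[0]]]
    for i in range(1, len(ys)):
        if s[i - 1] < alpha:
            groups.append([ys[i]])   # low similarity: split here
        else:
            groups[-1].append(ys[i])
    return groups

ys = [2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300]
diffs = [100, 100, 300, 100, 100, 500, 100]
s = similarities(diffs)
print(group(ys, s, 0.8))
```

With these assumed settings the differences of 300 and 500 fall below the 0.8 threshold, so the eight outputs split into three clusters: [2000, 2100, 2200], [2500, 2600, 2700] and [3200, 3300].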
The two boundary instances in a group then have larger similarities, causing the membership function formed to be flatter. Each group is then given a triangular membership function via the following two procedures.

Procedure 1: Find the central vertex point h_j. If y'_i, y'_{i+1}, ..., y'_k belong to the jth group, then the central vertex point h_j of this group is defined as the similarity-weighted center of the group's output values (Eq. (8)).

Procedure 2: Determine the memberships of y'_i and y'_k. The minimum similarity in the group is chosen as the membership value of the two boundary points y'_i and y'_k. Restated, the following formula is used to calculate μ_j(y'_i) and μ_j(y'_k), where μ_j represents the membership of belonging to the jth group:

μ_j(y'_i) = μ_j(y'_k) = min(s_i, s_{i+1}, ..., s_{k−1}).   (9)

The vertex b_j = h_j has membership value 1, and the left and right end points a_j and c_j of the triangular membership function are obtained by extending the lines through the vertex and the two boundary points down to membership 0:

a_j = b_j − (b_j − y'_i) / (1 − μ_j(y'_i)),   (10)

c_j = b_j + (y'_k − b_j) / (1 − μ_j(y'_k)).   (11)

Substep (1f): Find the membership value of belonging to the desired group for each instance. From the above membership functions we can get the fuzzy value of each output datum, formed as (y'_i, R_j, μ_ij), meaning that the ith output datum has the fuzzy value μ_ij in the cluster R_j. Each training instance is then transformed as

(x_1, x_2, ..., x_m; (R_1, μ_1), (R_2, μ_2), ..., (R_k, μ_k)).   (12)

Example 6. Each output is transformed as follows:
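Procedures 1-2 and Eqs. (9)-(11) can be sketched as follows. Taking the vertex b_j as the similarity-weighted center of the group's adjacent-pair midpoints is our reading of Eq. (8) and should be treated as an assumption; Eqs. (9)-(11) are implemented as stated:

```python
def cluster_triangle(ys, ss):
    """Triangular membership function (a, b, c) for one output cluster.

    ys: sorted output values in the group; ss: similarities between adjacent
    members (assumed nonempty, with min(ss) < 1).
    """
    mids = [(ys[i] + ys[i + 1]) / 2 for i in range(len(ys) - 1)]
    # Vertex: similarity-weighted center of the group (our stand-in for Eq. (8)).
    b = sum(s * m for s, m in zip(ss, mids)) / sum(ss)
    mu = min(ss)                       # Eq. (9): membership of both boundary points
    a = b - (b - ys[0]) / (1 - mu)     # Eq. (10): left end point at membership 0
    c = b + (ys[-1] - b) / (1 - mu)    # Eq. (11): right end point at membership 0
    return a, b, c

a, b, c = cluster_triangle([2000, 2100, 2200], [0.86, 0.86])
```

The higher the boundary memberships (Eq. (9)), the further the end points a and c are pushed out, which is exactly the "flatter membership function" effect described above.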
Fig. 7. Initial membership functions over the input values a_0, a_1, ..., a_n.
Fig. 8. Initial membership functions of age.
Fig. 9. Initial membership functions of property.
Fig. 10. Initial decision table for the insurance problem.
Each initial membership function is set to be a triangle (a, b, c) with b − a = c − b = the smallest predefined unit. For example, if three values of an attribute are 10, 15 and 20, then the smallest unit is chosen to be 5. Here we let a_0 be the smallest value for the attribute and a_n be the biggest value for the attribute. Initial membership functions for the attribute are viewed as in Fig. 7, where a_i − a_{i−1} = a_{i+1} − a_i = the smallest predefined unit, and R_x means the xth initial region (i = 2, 3, ..., n − 1; x = 1, 2, ..., n).

At first sight, this definition seems unsuitable for problems with small units over big ranges (e.g. 1, 1.1, 1000), since many initial membership functions may exist. But in practical implementation, only the membership functions corresponding to existing attribute values are kept and considered (through an appropriate data structure [14]). The membership functions corresponding to no attribute values are not kept here, since they will be merged in later steps.

Example 7. Let 5 be the smallest predefined unit of age and property. The initial membership functions for age and property are then as shown in Figs. 8 and 9.

Step 3: Construct the initial decision table. In this step we build a multi-dimensional decision table (each dimension represents a corresponding attribute) according to the initial membership functions. Let a cell be defined as the contents of a position in the decision table. Cell(d_1, d_2, ..., d_m) then represents the contents of the position (d_1, d_2, ..., d_i, ..., d_m) in the decision table, where m is the dimension of the decision table and d_i is the position value at the ith dimension. Each cell in the table may be empty, or may contain a fuzzy region (with maximum membership value) of the output data. Again, in practical implementation, only the non-null cells are kept (through an appropriate data structure [14]).

Example 8. The initial decision table for the insurance fee problem is shown in Fig. 10.

Step 4: Simplify the initial decision table. In this step, we simplify the initial decision table to eliminate redundant and unnecessary cells. The five merging operations defined here achieve this purpose.
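Step 2's initial membership functions are uniform triangles whose half-width is the smallest predefined unit. A minimal sketch (the function names are ours, and the evaluated values illustrate the initial triangles only, not the final merged functions):

```python
def triangular(x, a, b, c):
    """Triangle (a, b, c): 0 outside (a, c), rising linearly to 1 at the vertex b."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

def initial_triangles(lo, hi, unit):
    """Step 2: one triangle per grid point a_0, a_0 + unit, ..., a_n,
    with b - a = c - b = the smallest predefined unit."""
    return [(b - unit, b, b + unit) for b in range(lo, hi + unit, unit)]

tris = initial_triangles(20, 80, 5)   # Example 7: unit 5 over ages 20..80
print(len(tris))                      # 13 initial regions
print(triangular(37, 30, 35, 40))     # 0.6
print(triangular(37, 35, 40, 45))     # 0.4
```

An age of 37 thus has nonzero membership in exactly the two triangles whose vertices straddle it, which is what the initial decision table of Step 3 is built from.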
Fig. 11. Final membership function for property.
Fig. 12. The results after Operation 1.
Fig. 14. The results after Operation 3.
Fig. 17. The results after Operations 3 and 4.

For example, in Fig. 17, the three columns age = 1, age = 4 and age = 6 are then merged into two columns.

Fig. 18. An example of Operation 5.

Example 12. The decision table after merge operation 5 is shown in Fig. 19.
Example 13. The membership functions as rebuilt after Step 5 are shown in Figs. 20 and 21, respectively.

Step 6: Derive decision rules from the decision table. In this step, fuzzy if-then rules are derived from the decision table. Each cell Cell(d_1, d_2, ..., d_m) = R_i in the decision table is used to derive a rule:

if input_1 = d_1, input_2 = d_2, ..., and input_m = d_m,
then output = R_i.   (13)

Example 14. After Step 6 we obtain the following rules:

If age is A_1 and property is P_1,
then insurance fee is R_1 (Rule 1),
...

Details are explained below.

Step 1: Convert numeric input values to linguistic terms according to the membership functions derived. We map a numeric input (I_1, I_2, ..., I_m) to its corresponding fuzzy linguistic terms with membership values.

Example 15. Assume a new input datum I(37, 40) (i.e. age = 37, property = 40) is fed into the fuzzy inference process. It is first converted into the following fuzzy terms through the membership functions derived:

μ_{A_1}(I) = 0.55,
μ_{A_2}(I) = 0.3,
...
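The fuzzification in Step 1 of the inference procedure can be sketched as follows. The term labels and triangle parameters below are illustrative stand-ins, not the membership functions actually derived in the paper:

```python
def triangular(x, a, b, c):
    """Membership of x in the triangle (a, b, c)."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

# Hypothetical final terms for one attribute (labels and numbers are ours).
AGE_TERMS = {"A1": (0, 30, 45), "A2": (30, 50, 80)}

def fuzzify(value, terms):
    """Step 1: map a numeric input to the linguistic terms it belongs to,
    together with its membership values."""
    memberships = {}
    for label, (a, b, c) in terms.items():
        mu = triangular(value, a, b, c)
        if mu > 0:
            memberships[label] = mu
    return memberships

print(fuzzify(37, AGE_TERMS))  # e.g. {'A1': 0.533..., 'A2': 0.35}
```

The resulting linguistic terms and grades are then matched against the antecedents of the derived fuzzy rules, as in Example 15.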
Fig. 22. Membership functions derived for the Iris Data: (a) sepal length, (b) sepal width, (c) petal length, (d) petal width.
The proposed learning algorithm was run on the training set to induce fuzzy classification rules and membership functions. The rules and membership functions derived were then tested on the test set to measure the percentage of correct predictions. In each run, 50% of the Iris Data were selected at random for training, and the remaining 50% of the data were used for testing.

In the original data order, the derived membership functions of the four attributes are shown in Fig. 22 and the eight derived fuzzy inference rules are shown in Table 2.

From Fig. 22, it is easily seen that the numbers of membership functions for the attributes sepal length and sepal width are one, showing that these two attributes are useless in classifying the Iris Data. Also, the initial membership functions of the attribute petal length were finally merged into only three ranges, and the initial membership functions of the attribute petal width were finally merged into only four ranges, showing that a small number of membership functions are enough for a good reasoning result.

Experiments were then made to verify the accuracy of the fuzzy learning algorithm. For each kind ...

Table 3
The average accuracy of the fuzzy learning algorithm for the Iris problem

Setosa   Versicolor   Virginica   Average   Number of rules
100%     94%          92.72%      95.57%    6.21

8. Conclusions

In this paper, we have proposed a general learning method for automatically deriving membership functions and fuzzy if-then rules from a set of given training examples. The proposed approach can significantly reduce the time and effort needed to develop a fuzzy expert system. Based on the membership functions and the fuzzy rules derived, a corresponding fuzzy inference procedure to process inputs was also applied. Using the Iris Data, we found our model gives a rational result, few rules, and high performance.

Acknowledgements

The authors would like to thank the anonymous referees for their very constructive comments.

References

[1] H.R. Berenji, Fuzzy logic controller, in: R.R. Yager and L.A. Zadeh, Eds., An Introduction to Fuzzy Logic Applications in Intelligent Systems (Kluwer Academic Publishers, Dordrecht, 1992).
[2] B.G. Buchanan and E.H. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project (Addison-Wesley, Reading, MA, 1984).
[3] D.G. Burkhardt and P.P. Bonissone, Automated fuzzy knowledge base generation and tuning, IEEE Internat. Conf. on Fuzzy Systems (San Diego, 1992) 179-188.
[4] R. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (1936) 179-188.
[5] I. Graham and P.L. Jones, Expert Systems: Knowledge, Uncertainty and Decision (Chapman and Hall, Boston, 1988) 117-158.
[6] K. Hattori and Y. Torii, Effective algorithms for the nearest neighbor method in the clustering problem, Pattern Recognition 26 (1993) 741-746.
[7] S. Isaka and A.V. Sebald, An optimization approach for fuzzy controllers design, IEEE Trans. Systems Man Cybernet. 22 (1992) 1469-1473.
[8] O. Itoh, H. Migita and A. Miyamoto, A method of design and adjustment of fuzzy control rules based on operation know-how, Proc. 3rd IEEE Conf. on Fuzzy Systems (Orlando, FL, 1994) 492-497.
[9] A. Kandel, Fuzzy Expert Systems (CRC Press, Boca Raton, FL, 1992) 8-19.
[10] M.H. Kim, J.H. Lee and Y.J. Lee, Analysis of fuzzy operators for high quality information retrieval, Inform. Processing Lett. 46 (1993) 251-256.
[11] G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information (Prentice Hall, Englewood Cliffs, NJ, 1992) 4-14.
[12] Y. Kodratoff and R.S. Michalski, Machine Learning: An Artificial Intelligence Approach, Vol. 3 (Morgan Kaufmann, San Mateo, CA, 1990).
[13] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, Part I and Part II, IEEE Trans. Systems Man Cybernet. 20 (1990) 404-435.
[14] C.Y. Lee, Automatic acquisition of fuzzy knowledge, Master Thesis, Chung-Hua Polytechnic Institute, Hsinchu, Taiwan, 1995.
[15] E.H. Mamdani, Applications of fuzzy algorithms for control of simple dynamic plant, Proc. IEE 121 (1974) 1585-1588.
[16] R.S. Michalski, J.G. Carbonell and T.M. Mitchell, Machine Learning: An Artificial Intelligence Approach, Vol. 1 (Morgan Kaufmann, Los Altos, CA, 1983).
[17] R.S. Michalski, J.G. Carbonell and T.M. Mitchell, Machine Learning: An Artificial Intelligence Approach, Vol. 2 (Morgan Kaufmann, Los Altos, CA, 1984).
[18] H. Nomura, I. Hayashi and N. Wakami, A learning method of fuzzy inference rules by descent method, IEEE Internat. Conf. on Fuzzy Systems (San Diego, 1992) 203-210.
[19] G. Riley, Expert Systems: Principles and Programming (PWS-Kent, Boston, 1989) 1-59.
[20] J.C. Schlimmer, Database consistency via inductive learning, Proc. 8th Internat. Workshop on Machine Learning (Morgan Kaufmann, San Mateo, CA, 1991).
[21] G. Shafer and R. Logan, Implementing Dempster's rule for hierarchical evidence, Artificial Intelligence 33 (1987) 271-298.
[22] T. Takagi and M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Systems Man Cybernet. 15 (1985) 116-132.
[23] L.X. Wang and J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Systems Man Cybernet. 22 (1992) 1414-1427.
[24] R. Weber, Fuzzy-ID3: a class of methods for automatic knowledge acquisition, Proc. 2nd Internat. Conf. on Fuzzy Logic and Neural Networks (Iizuka, Japan, 1992) 265-268.
[25] Y. Yuan and M.J. Shaw, Induction of fuzzy decision trees, Fuzzy Sets and Systems 69 (1995) 125-139.
[26] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338-353.
[27] L.A. Zadeh, Fuzzy logic, IEEE Computer 21 (1988) 83-93.
[28] H.J. Zimmermann, Fuzzy Sets, Decision Making and Expert Systems (Kluwer Academic Publishers, Boston, 1987).
[29] H.J. Zimmermann, Fuzzy Set Theory and its Applications (Kluwer Academic Publishers, Boston, 1991).