Tuesday, September 18, 2018
Benefits and Limitations of Expert Systems
Some problems, such as scheduling manufacturing, cannot be adequately dealt with using mathematical algorithms, but they can be handled intelligently using expert systems. Expert systems also allow experts to be experts: because expert systems are built with the help of domain experts, those experts are freed to focus on the more difficult problems of their speciality.
This, in turn, will produce solutions to new problems and widen the range of problems which can be solved. Although expert systems lack the robust and general intelligence of human beings, they can provide benefits to organisations if their limitations are well understood; only certain classes of problems can be solved using expert systems.
Many expert systems require large, lengthy and expensive development efforts. Moreover, expert systems lack the breadth of knowledge and the understanding of fundamental principles of a human expert. Their knowledge bases are quite narrow, shallow and brittle.
In fast-moving fields such as medicine or computer science, keeping the knowledge base up to date is a critical problem. Expert systems can only represent limited forms of IF-THEN knowledge, the kind that exists primarily in textbooks. There are no adequate representations for deep causal models or temporal trends, and expert systems cannot yet replicate knowledge which is intuitive, based on analogy, or based on common sense.
Contrary to early promises, expert systems do best at automating lower-end clerical functions. They can provide electronic checklists for lower-level employees in service bureaucracies such as banking, insurance, sales and welfare agencies.
Managerial problems generally involve drawing facts and interpretations from divergent sources, evaluating those facts and comparing one interpretation with another; they do not reduce to simple analysis or classification, so expert systems have limited applicability to them. Expert systems typically perform very limited tasks which a professional could complete in a few minutes or hours, and hiring or training more experts may be less expensive than building an expert system.
Thursday, September 6, 2018
Introduction to Reasoning with Uncertain Knowledge:
Reasoning probably brings to mind logic puzzles, but it is something we do every day of our lives. Reasoning in AI is the process by which we use the knowledge we have to draw conclusions or infer something new about a domain of interest. It is a necessary part of what we call “intelligence”; without the ability to reason we are doing little more than a lookup when we use information.
In fact, this is the difference between a standard database system and a knowledge base. Both hold information which can be accessed in various ways, but the database, unlike the knowledge base in an expert system, has no reasoning facilities and can therefore answer only limited, specific questions.
What are the types of reasoning we come across? How do we know what to expect when we go on a train journey? What do we think when our friend is annoyed with us? How do we know what will happen if our car has a flat battery? Whether we are aware of it or not, we will use a number of different methods of reasoning, depending on the problem we are considering and the information that we have before us.
The three everyday situations mentioned above illustrate three key types of reasoning which we use. In the first case we know what to expect on a train journey because of our experience of numerous other train journeys. We infer that the new journey will share common features with the examples.
The first case is an example of induction, which can be summarised as generalisation from cases seen to infer information about cases unseen. We use it frequently in learning about the world around us: every crow we see is black, therefore we infer that all crows are black. If we think about it, such reasoning is unreliable; we can never prove our inferences to be true, we can only prove them to be false. Take the crows again.
To prove that all crows are black we would have to confirm that all crows which exist, have existed or will exist are black. This is obviously not possible. However, to disprove the statement, all we need is to produce a single crow which is white or pink.
So at best we can amass evidence to support our belief that all crows are black. In spite of its unreliability inductive reasoning is very useful and is the basis of much of our learning. It is used particularly in machine learning.
The second example we considered was working out why a friend is annoyed with us, in other words trying to find an explanation for our friend’s behaviour. It may be that this particular friend is a stickler for punctuality and we are a few minutes late to our rendezvous. We may therefore infer that our friend’s anger is caused by our being late.
This is abduction: the process of reasoning back from something to the state or event which caused it. Of course this too is unreliable; our friend may be angry for some other reason (perhaps we had promised to telephone him before coming but did not). Abduction can be used in cases where the knowledge is incomplete; it provides a “best guess” given the available evidence.
The third problem is usually solved by deduction: we have knowledge about cars such as “if the battery is flat the headlights won’t work”; we know the battery is flat so we can infer that the lights won’t work. This is the reasoning of standard logic.
Indeed, we would express our car problem in terms of logic given that:
a = the battery is flat, b = the lights won’t work, and the axioms
∀x: a(x) → b(x)
a(my car)
we can deduce b(my car): the lights of my car won’t work.
However, we cannot deduce the inverse: if we know b, we cannot deduce a, i.e. that the battery of my car is flat because the lights of my car won’t work. This is not permitted in standard logic. If the lights don’t work we may use abduction to derive this explanation, but it could be wrong; there may be another explanation for the light failure (for example, a bulb may have blown or the battery connections may be loose).
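To make the distinction concrete, here is a minimal Python sketch (not from the original text; the function names are invented for illustration). The rule a → b licenses deducing b from a, while going from b back to a is only an abductive guess:

def deduce_lights_fail(battery_flat):
    # Deduction: a(x) -> b(x); knowing a (flat battery) lets us conclude b.
    return battery_flat

def abduce_battery_flat(lights_fail):
    # Abduction: from b we can only guess a; other causes remain possible.
    if lights_fail:
        return "battery may be flat (or a bulb blew, or a connection is loose)"
    return "no evidence that the battery is flat"

print(deduce_lights_fail(True))   # True: b follows deductively from a
print(abduce_battery_flat(True))  # only a best guess, not a proof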
Deduction is probably the most familiar form of explicit reasoning. It can be defined as the process of deriving the logically necessary conclusion from the initial premises.
For example:
Elephants are bigger than dogs
Dogs are bigger than mice
Therefore
Elephants are bigger than mice
However, it should be noted that deduction is concerned with logical validity, not actual truth.
Consider the following example; given the facts, can we reach the conclusion by deduction?
Some dogs are greyhounds
The greyhounds run fast
Therefore
Some dogs run fast.
The answer is no. We cannot make this deduction because we do not know that all greyhounds are dogs; the fast greyhounds may be the ones which are not dogs. That is of course nonsensical in terms of what we know (or, more accurately, have induced) about the real world, but it is perfectly consistent with the premises given. We should therefore be cautious: deduction does not always correspond to natural human (common sense) reasoning.
Fuzzy Reasoning
Probabilistic reasoning and reasoning with certainty factors deal with uncertainty using principles from probability to extend the scope of standard logics. An alternative approach is to change the properties of the logic itself. Fuzzy sets and fuzzy logic do just that.
In classical set theory an item, say a, is either a member of set A or it is not. So a meal at a restaurant is either expensive or not expensive, and a value must be provided to delimit set membership. Clearly, however, this is not the way we think in real life. While some sets are clearly defined (a piece of fruit is either an orange or not an orange), others are not (qualities such as size, speed and price are relative).
Fuzzy set theory extends classical set theory to accommodate the notion of degree of set membership. Each item is associated with a value between 0 and 1, where 0 indicates that it is not a member of the set and 1 indicates that it is definitely a member. Values in between indicate a degree of partial membership.
For example, although in classical logic we may agree to include both the Honda and the Maruti in the set fast(car), we may wish to indicate that one is faster than the other. This is possible in fuzzy set theory.
Here the value in brackets is the degree of set membership. Fuzzy logic is similar in that it attaches a measure of truth to facts: a predicate P is given a value between 0 and 1 (as in fuzzy sets).
So the predicate fast(car) can be represented as:
fast(Honda 1.5) = 0.9
Standard logic operators such as AND, OR and NOT still apply; they are usually interpreted as the minimum, the maximum and the complement of the membership values respectively (A AND B = min(A, B), A OR B = max(A, B), NOT A = 1 − A).
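As an illustration, here is a small Python sketch of fuzzy membership and these operators. The 0.9 membership for the Honda follows the example above; the Maruti value and the economical(car) set are invented purely for illustration:

fast = {"Honda": 0.9, "Maruti": 0.7}        # degrees of membership in fast(car)
economical = {"Honda": 0.5, "Maruti": 0.8}  # hypothetical degrees in economical(car)

def fuzzy_and(a, b):
    return min(a, b)   # AND -> minimum

def fuzzy_or(a, b):
    return max(a, b)   # OR -> maximum

def fuzzy_not(a):
    return 1.0 - a     # NOT -> complement

car = "Maruti"
print(fuzzy_and(fast[car], economical[car]))      # 0.7: fast AND economical
print(fuzzy_or(fast[car], economical[car]))       # 0.8: fast OR economical
print(round(fuzzy_not(fast[car]), 2))             # 0.3: NOT fast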
Dempster-Shafer Theory:
The Dempster-Shafer theory deals with the distinction between uncertainty and ignorance. Rather than computing the probability of a proposition, it computes the probability that the evidence supports the proposition. This measure of belief is called a belief function, Bel(x).
In the Bayesian network technique, the degree of belief assigned to a proposition given the evidence is a single point value, whereas in Dempster-Shafer theory we consider sets of propositions and to each set we assign an interval [Belief, Plausibility] in which the degree of belief must lie. We use a probability density function m defined over all subsets (including ɸ) of an exhaustive universe of mutually exclusive hypotheses Θ, called the frame of discernment. Belief (written Bel) measures the strength of the evidence in favour of a set of propositions. It ranges from 0 (no evidence) to 1 (certainty).
Plausibility (Pl) is given by
Pl(S) = 1 − Bel(∼S)
where S is a set of propositions.
In particular, if we have certain evidence in favour of ∼S, then Bel(∼S) will be 1, Pl(S) will be zero, and Bel(S) will also be zero.
The belief-plausibility interval defined above measures not only our level of belief in some set of propositions but also the amount of information we have. This can be illustrated with the help of an example.
Suppose a person of doubtful (shady) character approaches you with a bet of Rs. 500 that heads will come up on the next flip of his coin. You suspect that the coin may not be fair, but you have no evidence either way about its fairness or unfairness. Then, as per Dempster-Shafer theory, with no evidence whatsoever about the coin:
Bel(Heads) = 0
Bel(∼Heads) = 0
That is, with no evidence at all, Dempster-Shafer theory assigns zero belief to both outcomes, reflecting complete ignorance rather than committing to point probabilities.
Now suppose you consult an expert on coins, and he asserts with 90% certainty that the coin is fair, so you become 90% sure that P(Heads) = 0.5.
Now D-S theory gives Bel(Heads) = 0.9 × 0.5 = 0.45
and similarly Bel(∼Heads) = 0.45
There is still a 10-point gap which is not accounted for by the evidence. Dempster’s rule shows how to combine evidence to give new values for belief, and Shafer’s work extends this into a complete computational model.
Since Dempster-Shafer theory deals not with point probabilities but with probability intervals, the width of the interval can be an aid in deciding when we need to acquire more evidence.
In the present example, before acquiring the expert’s testimony the probability interval for Heads is [0, 1]; it narrows to [0.45, 0.55] once the expert’s testimony about the fairness of the coin is received.
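A minimal Python sketch of this calculation (written for illustration, not from the original text):

def belief_interval(bel_s, bel_not_s):
    # Return [Bel(S), Pl(S)], where Pl(S) = 1 - Bel(~S).
    return (bel_s, 1.0 - bel_not_s)

# No evidence at all about the coin: total ignorance.
print(belief_interval(0.0, 0.0))               # (0.0, 1.0)

# Expert asserts fairness with certainty 0.9, so P(Heads) = 0.5.
bel_heads = 0.9 * 0.5                          # 0.45
bel_tails = 0.9 * 0.5                          # 0.45
print(belief_interval(bel_heads, bel_tails))   # (0.45, 0.55), up to floating-point rounding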
However, there are no clear guidelines on how to do this, and there is no precise interpretation of what the width of an interval means. For example, knowing whether the coin is fair would have a significant impact on the belief that it will come up heads, and detecting an asymmetric weight would have an impact on the belief that the coin is fair.
Consider another example:
A diagnosis problem, treated as a case of an exhaustive universe of mutually exclusive hypotheses. This universe is the frame of discernment, written Θ, and here it consists of the set {Alg, Flu, Col, Pne}, where:
Alg – Allergy
Flu – Flu
Col – Cold
Pne – Pneumonia
The probability density function m is defined not just for the elements of Θ but for all of its subsets (including ɸ and Θ itself); m(p) is the amount of belief currently assigned to the subset p. When there is no prior evidence, all of the belief is assigned to Θ itself, i.e. m(Θ) = 1.0.
But when evidence (at a level of 0.6) indicates that the correct diagnosis is in the set {Flu, Col, Pne},
then m is updated as:
{Flu, Col, Pne} : 0.6
Θ : 0.4
That is, belief 0.6 is now assigned to the set of diagnoses {Flu, Col, Pne}, while the remainder of the belief still resides in the larger set Θ.
Thus, in order to be able to use m, belief and plausibility in Dempster-Shafer theory, we define functions which enable us to combine m’s obtained from multiple sources of evidence.
Our goal is to attach some measure of belief, m, to the various subsets Z of Θ; m is sometimes called the probability density function for the subsets of Θ. Realistically, not all evidence directly supports individual elements of Θ. In fact, evidence most often supports different subsets Z of Θ.
In addition, since the elements of Θ are assumed mutually exclusive, evidence in favour of some may have an effect on our belief in others. In a purely Bayesian system, we address both of these situations by listing all the combinations of conditional probabilities. In Dempster-Shafer theory, we handle these interactions by directly manipulating the sets of hypotheses.
There are 2^n subsets of Θ, and we must assign m so that the sum of the m values over all subsets of Θ is 1. Although dealing with 2^n values may appear intractable, it usually turns out that many of the subsets never need to be considered because they have no significance in the problem domain, so their associated m value is simply zero.
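As a rough illustration of such a combination function, here is a Python sketch of Dempster’s rule of combination for the diagnosis example above. The second piece of evidence, m2, is invented purely for illustration; mass functions are written as dictionaries from frozensets of hypotheses to masses:

from itertools import product

THETA = frozenset({"Alg", "Flu", "Col", "Pne"})

def combine(m1, m2):
    # Dempster's rule: the combined mass of a subset Z is the sum of
    # m1(X) * m2(Y) over all pairs with X intersect Y = Z, normalised by
    # (1 - conflict), where conflict is the mass falling on the empty set.
    combined = {}
    conflict = 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            combined[z] = combined.get(z, 0.0) + mx * my
        else:
            conflict += mx * my
    return {z: v / (1.0 - conflict) for z, v in combined.items()}

# Evidence 1: the diagnosis example above -- {Flu, Col, Pne} at 0.6, rest on Theta.
m1 = {frozenset({"Flu", "Col", "Pne"}): 0.6, THETA: 0.4}
# Evidence 2 (hypothetical): support for {Flu, Pne} at 0.7, rest on Theta.
m2 = {frozenset({"Flu", "Pne"}): 0.7, THETA: 0.3}

for subset, mass in combine(m1, m2).items():
    print(sorted(subset), round(mass, 3))

Combining the 0.6/0.4 assignment with this hypothetical second piece of evidence concentrates most of the mass (0.70) on {Flu, Pne}, with the remainder on {Flu, Col, Pne} and Θ.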
D-S theory is an example of an algebra supporting the use of subjective probabilities in reasoning, as compared with the objective probabilities of Bayes. In subjective probability theory, we build a reasoning algebra, often by relaxing some of the constraints of Bayes. It is sometimes felt that subjective probabilities better reflect human expert reasoning.
So we conclude our discussion of uncertain reasoning by saying that Dempster-Shafer theory allows us to combine:
(i) Multiple sources of evidence for a single hypothesis.
(ii) Multiple sources of evidence for different hypotheses.
Ways of Dealing with Uncertain Knowledge in Reasoning:
We looked at knowledge and considered how different knowledge representation schemes allow us to reason. Recall, for example that standard or classical logics allow us to infer new information from the facts and rules which we have. Such reasoning is useful in that it allows us to store and utilise information efficiently (we do not have to store everything).
However, such reasoning assumes that the knowledge available is complete (or can be inferred) and correct and that it is consistent. Knowledge added to such systems never makes previous knowledge invalid. Each new piece of information simply adds to the knowledge.
This is monotonic reasoning. Monotonic reasoning can be useful in complex knowledge bases since it is not necessary to check consistency (as required in expert systems) when adding knowledge or to store information relating to the truth of knowledge. It therefore saves time and storage.
However, if knowledge is incomplete or changing, an alternative reasoning system is required. There are a number of ways of dealing with uncertainty.
We shall consider some of them, briefly:
1. Dempster-Shafer theory
2. Fuzzy reasoning.
MYCIN EXPERT SYSTEM
Historically, the MYCIN system played a major role in stimulating research interest in rule-based expert systems.
MYCIN is an expert system for diagnosing and recommending treatment of bacterial infections of the blood (such as meningitis and bacteremia). It was developed at Stanford University in California, USA, in the 1970s, and has become a template for many similar rule based systems. It is intended to support clinicians in the early diagnosis and treatment of meningitis, which can be fatal if not treated in time.
However, the laboratory tests for these conditions take several days to be completed, so doctors (and therefore MYCIN) have to make decisions with incomplete information associated with medical knowledge. MYCIN incorporated a calculus of uncertainty called certainty factors which seemed (at that time) to fit well with how doctors assessed the impact of evidence on the diagnosis.
This system was able to perform as well as some experts and considerably better than junior doctors. A consultation with MYCIN begins with requests for routine information such as age and medical history, progressing to more specific questions as required.
MYCIN’s expertise lies in the domain of bacterial infections. Its pool of knowledge consists of approximately 500 antecedent-consequent rules, implemented in LISP, which give MYCIN the ability to recognise about 100 causes of bacterial infections. Physicians usually begin antibiotic treatment for patients who have bacterial infections without knowing exactly which organism is the culprit.
There is no time to wait for definitive laboratory culture evidence, which accumulates too slowly. For the desperately sick, therapy must begin at once, not two days later.
This requirement leaves two choices:
i. The physician can either prescribe a broad-spectrum drug which covers all possibilities, or
ii. He can prescribe a better, disease-specific drug.
MYCIN helps the physician to prescribe disease-specific drugs. MYCIN informs itself about particular cases by requesting information from the physician about a patient’s symptoms, general condition, history, and laboratory-test results. At each point, the question MYCIN asks is determined by MYCIN’s current hypothesis and the answers to all previous questions. Thus, the questions start as though taken from a checklist, but they then vary as evidence builds.
Here is an excerpt from a sample session:
1. What is the patient’s name?
Arun Sehgal
2. Male or Female?
Male
3. Age?
55
4. Have you obtained positive cultures indicating general type?
Yes.
5. What type of infection is it?
Primary bacteremia.
6. When did symptoms first appear?
May 5, 2006
7. Let’s call the most recent positive culture C1. From what site was C1 taken?
From the blood.
8. When?
May 9, 2006
9. Let’s call the first significant organism from this culture U1. Do you know the identity of U1?
No.
10. Is U1 a rod or a coccus or something else?
Rod.
11. What is the gram stain of U1?
Gram-negative.
12. Have there been positive cultures from sites which are not associated with the infections about which you are seeking advice?
No.
13. Did U1 grow aerobically?
Yes.
When MYCIN is satisfied that it has a reasonably good grasp of the situation, MYCIN announces its diagnosis:
My recommendation will be based on the opinion that the identity of U1 may be:
1. Pseudomonas-aeruginosa
2. Klebsiella-pneumoniae.
3. E.coli
4. Bacteroides-fragilis.
5. Enterobacter.
6. Proteus-nonmirabilis.
After asking a few more questions about the patient, MYCIN announces its recommendation:
1. On a scale of 0 to 4, where the higher number indicates greater severity, how would you rate the patient’s degree of sickness?
2. Does the patient have a clinically significant allergic reaction to any antimicrobial agent?
3. How much does the patient weigh?
4. My first recommendation is as follows: to cover for items 1, 2, 3, 5 and 6, give gentamycin using a dose of 119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days. Modify dose in renal failure. Also, to cover for item 4, give clindamycin using a dose of 595 mg (8.5 mg/kg) q6h IV [or IM] for 14 days. If diarrhea or other gastrointestinal symptoms develop, check for pseudomembranous colitis.
It has the following organisational features:
i. Knowledge Representation:
In the form of production rules implemented in LISP. The knowledge base consists of rules of the form: IF condition1 and … and conditionN hold THEN draw conclusion1 and … and conclusionM, encoded in LISP data structures.
ii. Reasoning:
MYCIN reasons with both backward and forward chaining and uses certainty factors to handle uncertain information. It uses backward chaining to discover which organisms are present, then forward chaining to reason from the organisms to a treatment regime (a highly simplified sketch is given after this list).
iii. Heuristics:
When the general category of infection has been established, MYCIN examines each candidate diagnosis in a depth-first manner. Heuristics are used to limit the search, including checking all the premises of a candidate rule to see if any one of them is known to be false.
iv. Dialogue/Explanation:
The dialogue is computer controlled, with MYCIN driving the consultation by asking questions. Explanations are generated by tracing back through the rules which have fired; both “how?” and “why?” explanations are supported.
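As promised above, here is a highly simplified Python sketch of a production rule with a certainty factor and a single backward-chaining step. The rule content and the names are invented for illustration; real MYCIN rules are far richer and are written in LISP:

RULES = [
    {"if": ["gram_negative", "rod", "aerobic"],   # antecedents (premises)
     "then": "enterobacteriaceae",                # consequent (hypothesis)
     "cf": 0.8},                                  # certainty factor of the rule
]

KNOWN = {"gram_negative", "rod", "aerobic"}       # findings gathered from the physician

def backward_chain(goal):
    # Look for rules concluding the goal whose premises are all established.
    best = 0.0
    for rule in RULES:
        if rule["then"] == goal and all(p in KNOWN for p in rule["if"]):
            best = max(best, rule["cf"])
    return best

print(backward_chain("enterobacteriaceae"))       # 0.8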
CERTAINTY FACTOR EXAMPLE
Let’s consider a concrete example (of a home-lawn) in which:
S: sprinkler was left on last night
W: grass is wet in the morning
R: it rained last night
We can write MYCIN-style rules which describe predictive relationships among these three events:
R1: If the sprinkler was left on last night
then there is suggestive evidence (0.9) that
the grass will be wet this morning
Taken alone, R1 may accurately describe the world. But now consider a second rule:
R2: If the grass is wet this morning
then there is suggestive evidence (0.8) that it rained last night
Taken alone, R2 makes sense when rain is the most common source of water on the grass. But if the two rules are applied together, using MYCIN’s rule for chaining,
we get:
MB[W, S] = 0.9 {sprinkler suggests wet}
MB[R, S] = 0.8 × 0.9 = 0.72 {wet suggests rain}
In other words, we believe that it rained because we believe the sprinkler was left on. We get this despite the fact that if the sprinkler is known to have been left on and to be the cause of the grass being wet, then there is actually almost no evidence for rain (because the wet grass has already been explained by the sprinkler).
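In code the unwanted chain looks like this (a minimal sketch using the CFs of R1 and R2 above; the chaining rule simply scales a rule’s CF by the belief in its premise):

cf_wet_given_sprinkler = 0.9      # R1: sprinkler -> wet grass
cf_rain_given_wet = 0.8           # R2: wet grass -> rain

mb_wet = cf_wet_given_sprinkler * 1.0     # sprinkler known to have been left on
mb_rain = cf_rain_given_wet * mb_wet      # 0.8 * 0.9 = 0.72

print(mb_wet, round(mb_rain, 2))  # 0.9 0.72: belief in rain manufactured from the sprinkler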
One of the major advantages of the modularity of the MYCIN rule system is that it allows us to consider individual antecedent/consequent relationships independently of others. In particular, it lets us talk about the implications of a proposition without going back and considering the evidence that supported it.
Unfortunately, this example shows that there is a danger in this approach whenever the justifications of a belief are important to determining its consequences. In this case, we need to know why we believe the grass is wet (because we observed it to be wet as opposed to because we know the sprinkler was on) in order to determine whether the wet grass is evidence for it having just rained.
A word of caution: this example illustrates a specific rule structure which almost always causes trouble and should be avoided. Rule R1 describes a causal relationship (the sprinkler causes the grass to be wet). Rule R2, although it looks the same, actually describes an inverted causal relationship: wet grass is caused by rain and is therefore used as evidence for its cause.
We can legitimately form a chain of evidence running from a cause to its effects, but an effect should not then be used to argue back to a cause (or to a sibling symptom) without new information. To avoid this problem, many rule-based systems either limit their rules to one of the two structures or clearly partition the two kinds so that they cannot interfere with each other. Bayesian networks suggest a systematic solution to this problem.
We can summarise this discussion of certainty factors and rule-based systems (which will be fully appreciated only after studying expert systems) as follows. The approach makes strong independence assumptions which make it relatively easy to use; at the same time, those assumptions create dangers if rules are not written carefully so that important dependencies are captured.
The approach can serve as the basis of practical application programs. It did so in MYCIN, and it has also done so in a broad array of other systems built on the EMYCIN platform, a generalisation (often called a shell) of MYCIN with all the domain-specific rules stripped out. One reason this framework is useful, despite its limitations, is that in an otherwise robust system the exact numbers which are used do not appear to matter very much.
The other reason is that the rules were carefully designed to avoid the major pitfalls we have just described. One other interesting thing about this approach is that it appears to mimic quite well the way people manipulate certainties.
CERTAINTY FACTORS IN THE MYCIN SYSTEM
Bayesian reasoning assumes information is available regarding the statistical probabilities of certain events occurring. This makes it difficult to operate in many domains. Certainty factors are a compromise on pure Bayesian reasoning.
The approach has been used successfully, most notably in the MYCIN expert system. MYCIN is a medical diagnostic system which diagnoses bacterial infections of the blood and prescribes drugs for treatment. Here we present it as an example of probabilistic reasoning. Its knowledge is represented in rule form and each rule has an associated certainty factor.
For example, a MYCIN rule looks something like this. If:
(a) The gram stain of the organism is gram negative,
(b) The morphology of the organism is rod, and
(c) The aerobicity of the organism is anaerobic, then there is suggestive evidence (0.5) that the identity of the organism is Bacteroides.
Or
If:
(a) The stain of the organism is gram-positive,
(b) The morphology of the organism is coccus,
(c) The growth conformation of the organism is clumps, then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.
This knowledge, in the form of rules, is represented internally in an easy-to-manipulate LISP list structure:
Premise:
($AND (SAME CNTXT GRAM GRAMPOS)
      (SAME CNTXT MORPH COCCUS)
      (SAME CNTXT CONFORM CLUMPS))
Action:
(CONCLUDE CNTXT IDENT STAPHYLOCOCCUS TALLY 0.7)
The interpretation can be postponed till the reader becomes familiar with LISP.
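For readers not yet comfortable with LISP, here is a rough Python rendering of the same rule (the data-structure layout is invented for illustration; it is not how MYCIN actually stores rules):

rule = {
    "premise": {"gram": "grampos", "morph": "coccus", "conform": "clumps"},
    "action": ("ident", "staphylococcus", 0.7),    # conclude identity with CF 0.7
}

def applies(rule, context):
    # $AND of SAME tests: every premise attribute must match the current context.
    return all(context.get(attr) == value for attr, value in rule["premise"].items())

patient = {"gram": "grampos", "morph": "coccus", "conform": "clumps"}
if applies(rule, patient):
    attribute, value, cf = rule["action"]
    print("conclude", attribute, "=", value, "with CF", cf)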
MYCIN uses these rules to reason backward to the clinical data available from its goal of finding significant disease-causing organisms. Once it finds the identities of such organisms, it then attempts to select a therapy by which the disease(s) may be treated.
In order to understand how MYCIN exploits uncertain information, we need answers to two questions:
“What do certainty factors mean?” and “How does MYCIN combine the estimates of uncertainty in each of its rules to produce a final estimate of the certainty of its conclusions?” A further question that we need to answer, given our observations about the intractability of pure Bayesian reasoning, is, “What compromises does the MYCIN technique make and what risks are associated with these compromises?” We answer all these questions now.
A certainty factor (CF [h, e]) is defined in terms of two components:
1. MB [h, e] − a measure (between 0 and 1) of belief in hypothesis h given the evidence e. MB measures the extent to which the evidence supports the hypothesis. It is zero if the evidence fails to support the hypothesis.
2. MD [h, e] − a measure (between 0 and 1) of disbelief in hypothesis h given the evidence e. MD measures the extent to which the evidence supports the negation of the hypothesis. It is zero if the evidence supports the hypothesis.
From these two measures, we can define the certainty factor as:
CF [h, e] = MB[h, e] – MD[h, e]
Since any particular piece of evidence either supports or denies a hypothesis − (but not both), and since each MYCIN rule corresponds to one piece of evidence (although it may be a compound piece of evidence), a single number suffices for each rule to define both the MB and MD and thus the CF.
The CF’s of MYCIN’S rules are provided by the experts who write the rules. They reflect the expert’s assessments of the strength of the evidence in support of the hypothesis. As MYCIN reasons, however, these CF’s need to be combined to reflect the operation of multiple pieces of evidence and multiple rules applied to a problem. Fig. 7.4, illustrates three combination scenarios which we need to consider.
In Fig. 7.4(a), several rules all provide evidence which relates to a single hypothesis. In Fig. 7.4(b), we need to consider our belief in a collection of several propositions taken together. In Fig. 7.4(c), the output of one rule provides the input to another.
What formulas should be used to perform these combinations?
Before we answer that question, we need first to describe some properties which we would like the combining functions to satisfy:
1. Since the order in which evidence is collected is arbitrary, the combining functions should be commutative and associative.
2. Until certainty is reached, additional confirming evidence should increase MB (and similarly for disconfirming evidence and MD).
3. If uncertain inferences are chained together, then the result should be less certain than either of the inferences alone.
Having accepted the desirability of these properties, let’s first consider the scenario in Fig. 7.4 (a), in which several pieces of evidence are combined to determine the CF of one hypothesis.
The measures of belief and disbelief in a hypothesis h, given two observations s1 and s2, are computed from:
MB[h, s1 ∧ s2] = 0 if MD[h, s1 ∧ s2] = 1
MB[h, s1 ∧ s2] = MB[h, s1] + MB[h, s2] × (1 − MB[h, s1]) otherwise
MD[h, s1 ∧ s2] = 0 if MB[h, s1 ∧ s2] = 1
MD[h, s1 ∧ s2] = MD[h, s1] + MD[h, s2] × (1 − MD[h, s1]) otherwise
One way to state these formulas in English is that the measure of belief in h is 0 if h is disbelieved with certainty. Otherwise, the measure of belief in h given two observations, is the measure of belief given only one observation plus some increment for the second observation. This increment is computed by first taking the difference between 1 (certainty) and the belief given only the first observation.
This difference is the most which can be added by the second observation. The difference is then scaled by the belief in h given only the second observation. A corresponding explanation can be given, then, for the formula for computing disbelief. From MB and MD, CF can be computed. However, if several sources of corroborating evidence are pooled, the absolute value of CF will increase. If conflicting evidence is introduced, the absolute value of CF will decrease.
Example:
Suppose we make an initial observation s1, corresponding to Fig. 7.4(a), which confirms our belief in h with MB[h, s1] = 0.3. Then MD[h, s1] = 0 and CF[h, s1] = 0.3. Now we make a second observation s2 which also confirms h, with MB[h, s2] = 0.2. Applying the combination formula, MB[h, s1 ∧ s2] = 0.3 + 0.2 × (1 − 0.3) = 0.44, MD[h, s1 ∧ s2] = 0, and so CF[h, s1 ∧ s2] = 0.44.
We can see from this example how slight confirmatory evidence can accumulate to produce increasingly larger certainty factors.
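A small Python sketch of this accumulation (confirming evidence only, so MD stays 0; the third observation is added purely for illustration):

def combine_mb(mb1, mb2):
    # MB[h, s1 AND s2] = MB[h, s1] + MB[h, s2] * (1 - MB[h, s1])
    return mb1 + mb2 * (1.0 - mb1)

mb = 0.0
for observation_mb in (0.3, 0.2, 0.2):     # the two observations above, plus one more
    mb = combine_mb(mb, observation_mb)
    print(round(mb, 3))                     # 0.3, 0.44, 0.552: belief keeps accumulating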
Next let’s consider the scenario of Fig. 7.4(b), in which we need to compute the certainty factor of a combination of hypotheses. In particular, this is necessary when we need to know the certainty factor of a rule antecedent which contains several clauses (as, for example, in the staphylococcus rule given above). The combination certainty factor can be computed from its MB and MD.
MYCIN uses the following formulas for the MB of the conjunction and of the disjunction of two hypotheses:
MB[h1 ∧ h2, e] = min(MB[h1, e], MB[h2, e])
MB[h1 ∨ h2, e] = max(MB[h1, e], MB[h2, e])
MD can be computed analogously.
Finally, we need to consider the scenario in Fig. 7.4(c), in which rules are chained together with the result that the uncertain outcome of one rule provides the input to another. The solution to this problem will also handle the case in which we must assign a measure of uncertainty to initial inputs.
This could, for example, happen in situations where the evidence is the outcome of an experiment or a laboratory test whose results are not completely accurate. In such a case, the certainty factor of the hypothesis must take into account both the strength with which the evidence suggests the hypothesis and the level of confidence in the evidence.
MYCIN provides a chaining rule which is defined as follows. Let MB'[h, s] be the measure of belief in h given that we are absolutely sure of the validity of s. Let e be the evidence which led us to believe in s (for example, the actual reading of the laboratory instruments or the results of applying other rules).
Then the belief in h is scaled by the confidence in s:
MB[h, s] = MB′[h, s] × max(0, CF[s, e])
Since initial CFs in MYCIN are estimates which are given by experts who write the rules, it is not really necessary to state a more precise definition of what a CF means than the one we have already given.
The original work did, however, provide one by defining MB (which can be thought of as a proportionate decrease in disbelief in h as a result of e) as:
MB[h, e] = 1 if P(h) = 1
MB[h, e] = (max[P(h|e), P(h)] − P(h)) / (1 − P(h)) otherwise
It turns out that these definitions are incompatible with a Bayesian view of conditional probability. Small changes to them, however, make them compatible.
In particular, we can redefine MB as:
The definition of MD should also be changed similarly.
With this reinterpretation, there ceases to be any fundamental conflict between MYCIN’s techniques and those suggested by Bayesian statistics; the difficulty is simply that pure Bayesian statistics usually leads to intractable systems. In a MYCIN rule, the CF represents the contribution of that individual rule to MYCIN’s belief in the hypothesis; in a way, it represents a conditional probability.
In Bayesian terms, however, P(H|E) describes the conditional probability of H given that E is the only evidence available (joint probabilities are needed when several pieces of evidence interact). Pure Bayesian reasoning is therefore more rigorous but rarely tractable, whereas MYCIN’s certainty factors remain workable in practice.
The MYCIN formulas for all three combination scenarios of Fig. 7.4, make the assumption that all rules are independent. The burden of guaranteeing independence (at least to the extent that it matters) is on the rule writer. Each of the combination scenarios is vulnerable when this independence assumption is violated.
This can be analysed by reconsidering the scenario in Fig. 7.4(a). Our example rule has three antecedents with a single CF rather than three separate rules; this makes the combination rules unnecessary. The rule writer did this because the three antecedents are not independent.
To see how much difference MYCIN’s independence assumption can make, suppose for a moment that we had instead three separate rules and that the CF of each was 0.6. This could happen and still be consistent with the combined CF of 0.7 if the three conditions overlap substantially.
If we apply the MYCIN combination formula to the three separate rules, we get:
MB = 0.6 + 0.6 × (1 − 0.6) = 0.84
MB = 0.84 + 0.6 × (1 − 0.84) = 0.936
This is a substantially different result from the true value of 0.7, as expressed by the expert.
Now let’s consider what happens when the independence assumption is violated in the scenario of Fig. 7.4(c).