<TABLE><TR><TD VALIGN=TOP>
__NOTOC__
<TD> </TD><TD VALIGN=TOP>
Recording Date: 3/9/2010
Location: MIT Cambridge, MA
Slides: [http://files.meetup.com/1336198/semantic-odyssey-2010.pdf PDF]
==What's in a Link, Revisited - William A. Woods==
In this Lotico session William discusses ideas about representing meaning in computer representations, based on his now classic paper "What's in a Link: Foundations for Semantic Networks" (1975) and his review "Meaning and Links" (2007), in a historical context.
William A. Woods is a Distinguished Software Engineer at ITA Software, Cambridge, MA and a member of the Semantic Web Meetup Group. http://parsecraft.com
Foundations for Semantic Networks
<center><youtube>vlgVYRXe68Q </youtube></center>
Watch Video on: [http://vimeo.com/14459415 Vimeo]
==Transcript==
Ok so, Marco asked me to tell you something about this paper and why I wrote it, what was going on at the time, who I was trying to address, and I will also cover some of the things that have followed since. So, before I did the paper, I worked on question answering. This all got started at a seminar at Harvard where {Susumu Kuno}, who became my thesis supervisor, suggested: "How would you do a natural language question answering system for a database?" And I said to myself: "Well, if you're gonna do that, you ought to know something about meaning." So I went off and read a lot of philosophy and took a lot of linguistics courses, and it was really subtle. People had, at the time, syntax a little bit, but meaning was really mysterious. The best I could come up with from the philosophy literature is this quote from [http://en.wikipedia.org/wiki/Rudolf_Carnap Carnap]: "To know the truth condition of a sentence is to know what is asserted by it, in usual terms its 'meaning'." So the philosophers essentially say a term has a meaning based on some ability to decide when it's true and when it's not. That's what meaning is for them, and when they do it, the truth conditions are an abstract mapping from possible worlds onto truth values: instantiations of all possible predicates on all possible objects into truth values.

And I said to myself, how can this infinite thing be represented finitely in a reasoning head? And the only thing I know that can do that sort of thing is an abstract procedure, something like a computer program, a Turing machine, a Post production system, something of that sort. So I proposed a theory of semantics, which I called procedural semantics, where the meaning is defined by abstract procedures for determining reference, verifying facts, computing values (including truth values) and carrying out actions.
And it's built on top of, not set theory or category theory or any sort of things that have subtle complexities in their foundations, but things that we understand pretty concretely like conditional expressions and computation of a value and "if then" and "while loops" and so forth, which I maintain is a much more solid foundation than set theory or category theory or even the foundations of real variables in mathematics. And it provides a principled connection between mental symbols, that this reasoning engine is carrying in its head, and the things that are out there in the world that they actually denote or mean. In fact this is the first semantic theory in history that provides a principled, causal relationship between meanings in the head and concrete objects in the physical world.

And it was practical. I applied it to a task that I was set by a guy who worked for the NASA Manned Spacecraft Center, who had collected all of the first year's work on the lunar rocks into a database. And he could get his [http://en.wikipedia.org/wiki/Fortran Fortran] programmer to write a program to answer a question if some scientist wrote him one, and he wanted to know if he could get his programmer out of that loop. And at the time I had developed a theory of semantics and a system for doing this, so I got a contract and I built a system that actually answered questions on the first year's worth of the Apollo 11 moon rocks. It was demonstrated live at the Second Annual Lunar Science Conference {1971} and it answered questions that people came up to it and typed in. And it did reasonably well at it, it was a very interesting system.
And the interesting thing about it is that its meaning representation language is an extension of the predicate calculus with generalized quantifiers, quantifiers that include imperative operations and actual calculations and non-standard things, not only the I/O operator, but also things like a prime number of objects, or more than a certain number of objects etc. The basic structure was: for some quantifier, governing a variable in some class such that some condition is true, do an action. It just looks like [http://en.wikipedia.org/wiki/ALGOL ALGOL], right?
For some quantified variable in a class such that the condition is true, test some other condition. And there is a TEST function and a PRINTOUT function, and with this process you can hook up to any kind of database no matter how it's structured. You have an interface from your natural language processing to this universal, procedural, operational-based meaning representation language. And then you define the primitives in this language with functions. Functions that can access the database or in extreme cases can go out in a warehouse and count bolts in a bin. So it's an extremely powerful semantic framework.

Ok so, this theory permits a computer to understand in a single, uniform way the meanings of conditions to be tested, questions to be answered and actions to be carried out. And it permits a very general purpose system for language understanding that can be used with lots of different databases and, more importantly, can actually cross databases and produce queries that integrate information from multiple databases in a uniform semantic framework, without having to get those databases into some common paradigm. And as I said it can actually perceive and act on objects in the external world. And it can perceive and reason about these meaning objects themselves, which are procedures, as some kind of program inside the machine's head.

Ok so, there is a semantics, a theory of what things mean, but it doesn't directly address some of the things which you'd like to do with your meaning in the meaning representation. It's quite powerful, it's very expressive, it can compute things, but it doesn't have a bunch of associative things that you and I have in our heads, that let you go from one node to another node and pick up new things.
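The "quantifier, variable, class, condition, action" structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original system's code: the sample records, the field names and the helper names (`for_each`, `check_quantified`, `high_aluminum`, and so on) are all invented for the example. The point it shows is that the quantifier itself is just a procedure, so "every", "more than n" and "a prime number of" fit the same mould.

```python
# Toy stand-in for a sample database; ids, fields and numbers are invented.
SAMPLES = [
    {"id": "rock-a", "type": "basalt", "aluminum": 12.1},
    {"id": "rock-b", "type": "basalt", "aluminum": 7.9},
    {"id": "rock-c", "type": "breccia", "aluminum": 13.4},
]

def for_each(members, condition, action):
    """FOR EVERY x / class : condition(x) ; action(x)."""
    return [action(x) for x in members if condition(x)]

def check_quantified(quantifier, members, condition, test):
    """FOR <quantifier> x / class : condition(x) ; TEST test(x)."""
    selected = [x for x in members if condition(x)]
    hits = sum(1 for x in selected if test(x))
    return quantifier(hits, len(selected))

# Generalized quantifiers are ordinary procedures over (hits, total).
def every(hits, total):
    return hits == total

def more_than(n):
    return lambda hits, total: hits > n

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def a_prime_number_of(hits, total):
    return is_prime(hits)

# Primitives: these are the only functions that touch the "database".
def printout(sample):
    return sample["id"]

def is_basalt(sample):
    return sample["type"] == "basalt"

def anything(sample):
    return True

def high_aluminum(sample):
    return sample["aluminum"] > 10.0

print(for_each(SAMPLES, is_basalt, printout))
print(check_quantified(every, SAMPLES, is_basalt, high_aluminum))
print(check_quantified(a_prime_number_of, SAMPLES, anything, high_aluminum))
```

Swapping the primitives for functions that query a different store would leave the quantifier layer untouched, which is the sense in which the framework is independent of how the database is structured.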
So what I wanted was a system that was comparably well-founded, comparably expressive but also had the associativity that we humans have for following facts from one thing to another and has the capability of supporting the reasoning operations you need to do in a way that's efficient and scales.
So, I looked at Semantic Networks. And a Semantic Network is essentially a network of concepts connected by links used for some kind of associative access. And again I had the same question: what do these links mean and what do they have to do with semantics? And I concluded pretty quickly that they had nothing to do with semantics, but I finally inured myself to calling it that nonetheless because that's what everyone called it. And the kinds of things people were doing at the time, lots of them were psychologists and they were looking to try to mirror the things that the human mind does that are associative. And one example is from Collins and Quillian. Collins is a psychologist who was at [http://en.wikipedia.org/wiki/BBN_Technologies Bolt, Beranek and Newman]; Ross Quillian is ostensibly the guy who coined the phrase 'Semantic Network', although arguably it was used by somebody else { [http://en.wikipedia.org/wiki/Margaret_Masterman Margaret Masterman] at Cambridge University (1961) } in England before he did. But he looked at statements like: a bird is an animal; an animal has skin; a canary is a bird; a bird has feathers, and measured how long it took people to answer questions like this: "Does a canary have feathers?" or "Does a canary have skin?" And his theory was that this is coming from inheritance in the Semantic Network. A canary is a bird, which is an animal, and we know that animals have skin, and birds have feathers. Having feathers is a more specific kind of fact than having skin, and therefore it takes a little longer to find that a canary has skin than it takes to figure out that it has feathers, because the path is longer. And they actually ran psychological experiments and measured times that correlated with these predictions. I don't think the theory was ... well, psychologists have a tendency to jump to the conclusion that if the data is consistent with the theory then the theory is right, without asking whether there are alternative theories that could explain the same data.
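The path-length prediction can be made concrete with a toy inheritance lookup. This is a minimal sketch using only the four facts quoted above; the dictionary layout and the function name `hops_to_property` are assumptions made for illustration, and hop count is only a stand-in for the response time measured in the experiments.

```python
# "is a" links and locally stored properties, taken from the four statements above.
ISA = {"canary": "bird", "bird": "animal"}
PROPERTIES = {"animal": {"skin"}, "bird": {"feathers"}, "canary": set()}

def hops_to_property(concept, prop):
    """Climb the 'is a' chain and count links followed until prop is found."""
    hops = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return hops
        node = ISA.get(node)  # None once we run off the top of the hierarchy
        hops += 1
    return None  # property is not inherited from anywhere

print(hops_to_property("canary", "feathers"))  # 1
print(hops_to_property("canary", "skin"))      # 2
```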
So, at any rate, I saw people all around me using notations that had links and had labels on them, had nodes that had labels on them, and calling them 'Semantic Networks' and doing all kinds of things with them. And I thought they're not doing anything with meaning and you can't really use these representations to do anything unless you become more rigorous about them. At that time there was a conference that was organized down at [http://en.wikipedia.org/wiki/Woods_Hole,_Massachusetts Woods Hole] and I went there and heard people like [http://en.wikipedia.org/wiki/Roger_Schank Roger Schank] and a bunch of other people talking about what they were doing. And I kind of spouted off on all these problems that I saw. And [http://en.wikipedia.org/wiki/Daniel_G._Bobrow Danny Bobrow] took me and holed me up in the basement of Xerox PARC with a Bravo machine and got me to write it all down, and that became the "What's in a Link" paper, which ended up being dedicated to [http://en.wikipedia.org/wiki/Jaime_Carbonell Jaime Carbonell], who was my boss at the time and who unfortunately died tragically; he just passed away while driving down the street. Fortunately he pulled over to the side of the road before he crashed, with his wife in the car. So, the book was dedicated to [http://en.wikipedia.org/wiki/Jaime_Carbonell Jaime], and in some sense this paper was dedicated to him.

So here are some of the things that I saw going on all around me. People said the meaning is just all the concepts that are connected to it in this web. "That's what the meaning is!" Well, there might be some truth to it but it doesn't give you any meat to start with. "Whatever the system does with its input is its semantics!" Again there may be some element of truth to that, but that's not what semantics is really about. "There is no difference between syntax and semantics!" This was one of [http://en.wikipedia.org/wiki/Roger_Schank Roger Schank's] claims.
He illustrated it with the example: "I saw the Grand Canyon flying to New York", which I take as proof that he is wrong. Because if he was right you would just understand what that meant, and nobody would laugh. It's the fact that syntax gives you the wrong model first that makes it fun, that makes it interesting and surprising. And then there is "Semantics is in the eye of the beholder!" This is actually pretty accurate for the Semantic Networks that people are actually using. The semantics is all in the words that you put on the labels. Ross Quillian at one point had a semantic network system implemented in [http://en.wikipedia.org/wiki/LISP LISP] and he decided to remove the labels, and the resulting network, when printed out, was just a giant nest of parentheses. So it's right, the semantics was all in the labels.

So, in the paper I characterised two populations of people that address semantics, and I called one the linguist and the other the philosopher. Somewhat unfair, not everybody is in these centroids, but these are caricatures. The linguist is principally interested in a way to represent the different meanings of a sentence that's fundamentally ambiguous. So they are looking for some notation in which, if a sentence has two readings, I can write down the two readings and say this is this one and this is that one, here is why they are different. And if a sentence has no meaning because it's nonsense of some sort, they want some criterion that lets them say that's nonsense because of this. The philosopher on the other hand is interested in the truth conditions of predicates, which he postulates in his model already as unambiguous, because he has defined them. He is interested in soundness and completeness of logical deduction systems etc. These are two parts of the problem, but even put together they don't cover the whole problem.
So, the linguist is interested in things like: "I saw the man from the park". It has got two possible interpretations: "I was in the park and I saw him from there" and the other is "He is from the park and I saw him in the deli or wherever". They are very fond of the asterisk, which they use to mark a sentence as ungrammatical. And they would say things like "The amoeba hit John with a hammer" is ungrammatical because of semantic features. The instrumental 'with' construction used here requires that its agent be +agent. And amoebas are not marked +agent because they don't have brains. So, they call that semantics, but the machinery they use to do this theory is fundamentally the same kind of theory you would use if it were syntactic. So they are basically using syntactic mechanisms. It legitimately would be called semantic if you actually followed amoeba out to some reference of what amoeba meant. And from your knowledge of real amoebas you can infer that they can't use hammers - except on [http://en.wikipedia.org/wiki/Star_Treck Star Trek], you know, where maybe they're really big and they can. So, I found the interesting thing about these to be: in what context, and what would have to be true in that context, in order for this to be a sensible utterance? And if you're actually constructing that entire model then you are doing something semantic and there would be a semantic feature. But the mechanisms they had in mind didn't go that far.

OK so, the philosophers are interested in reasoning. The philosopher's perspective is: a model consists of a set of assignments of truth values to every possible instantiation of every predicate over a universe of individuals. And this abstract model is like the idealised point mass on a frictionless surface that physicists use. This is the thing that they created in order to prove that the reasoning systems were sound and complete.
So with this as an abstract model you can prove that if I have a reasoning system, a set of rules, a set of steps for deducing things one from another, and I can show that this reasoning system assigns validity only to those things that are true in every possible model, then I've got a good reasoning system. And that was an extremely productive thing to do, and based on that they were able to show that first-order logic is complete but problematic, and set theory has got issues, and actually the foundations of mathematics are not all that foundational, and so a lot came of it.

So what's still missing though, if we go back to the dictionary definition of semantics, is the relationship between symbols and what they denote or mean. And if you think about what the logicians did with their model theory, they gave us a very good account of the semantics of IF THEN, AND, OR and NOT and the quantifiers. But they don't do anything for the semantics of "snow is white". And that requires you to have something else. And the something else I proposed was procedural semantics. Snow is white: you have to know what snow is. OK, if you can build a robot and he can say this is snow, and he says it properly, and he can measure the color spectrum, and can say this is white, and he can do that properly, then he can test whether snow is white or not, and then you've actually got a semantics that can really do that. So the meaning of a proposition is a procedure for determining if it is true, the meaning of an action is the ability to do it and/or tell if it has been done, and as I said this can provide a principled sensorimotor connection to the actual physical world.
Ok, back to links. So, people have used links to represent attribute and value. That's what most of you think of in an RDF network, but there are other things, like links between attributes and some predicate that is supposed to be true of their value. Or relations among your objects etc. So let me walk you through some of the examples from the "What's in a Link" paper. We start with something very simple:
<br><br>
John height 6 feet<br>
hair color brown<br>
occupation scientist<br>
<br>
This is the kind of data you're all accustomed to seeing and thinking about. And then I said, well, what happens if you don't know his height, but you know about his height: he is greater than six feet. A lot of people wouldn't hesitate to just go ahead and put "greater than six feet" into the value slot. But if you do that, you are really doing something totally silly or you've made a huge mental reinterpretation of what your entire system is doing. The things at the end of links are no longer values now. They are predicates that are true of values. And you can think of those values you had before as little identity predicates that say that identity is true of this value. So you can actually make this leap, do it consistently and have a new consistent model. But it's a little more subtle than the one you thought you had. And now it handles all the stuff it handled before plus it can handle these predicates, so you can put in constraints on values that you only know a little bit about.

This is a very important thing when you're collecting data, because we've all had the experience of trying to fill out some form and at some point the options the form gave you didn't include the right one. I used to use a cartoon of the guy who had absolutely no hair looking at the hair color choice on the driver license application. My favorite one was a database that Xerox had of copiers, and the copier database had two addresses. One says where the copier is so the serviceman can go and fix it. And the other one is the billing address, it says where you send the bills so you get the money. Well, they had a copier that was installed on a barge that went up and down the east coast on a regular schedule, oh oh! You'd like to be able to put that into your database. You'd like to be able to tell that story, capture it somehow and express it as a constraint on the value. And you don't always have the real value.
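The reinterpretation of slot fillers as predicates can be sketched as follows. This is a hedged illustration, not notation from the paper: `exactly`, `greater_than`, `consistent` and the attribute name `height_ft` are hypothetical names chosen for the example. A known value becomes an identity predicate, so old data and partial knowledge sit in the same slots.

```python
def exactly(value):
    """Identity predicate: the old 'plain value' case."""
    return lambda candidate: candidate == value

def greater_than(bound):
    """Partial knowledge: we only know a lower bound on the value."""
    return lambda candidate: candidate > bound

# Every slot filler is now a predicate that is true of the (possibly unknown) value.
john = {
    "height_ft": greater_than(6.0),
    "hair_color": exactly("brown"),
    "occupation": exactly("scientist"),
}

def consistent(frame, attribute, candidate):
    """Is this candidate value allowed by what we know about the attribute?"""
    return frame[attribute](candidate)

print(consistent(john, "height_ft", 6.2))       # True
print(consistent(john, "height_ft", 5.9))       # False
print(consistent(john, "hair_color", "brown"))  # True
```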
Ok, what if you don't know either John's or Sue's height but you know about a relationship between the two? Now the transformation I just made doesn't quite do that yet, so now you wanna say something like: height of John is greater than the height of Sue. So now, maybe what you do is you take the link from John named height and you say that thing at the end of that link is not a value, and it's not a predicate, it's a role. It represents the thing that is John's height. And now you can say: oh, here is the height of John, here is the height of Sue, and now I can assert a relationship between those two that says this one is greater than that one. So again we've been able to make a backwards-compatible extension of our semantic model if you're really careful about it, much more subtle than we thought even the time before, but it still works.

But then what do we do with this relationship of "greater" that we just put in there? This is no longer an attribute value, we're asserting a fact by putting a link in. We were sort of asserting facts when we put links in before, but we were thinking of them as attributes; we're thinking of them now as pointers to roles. So with a link like "hit" from John to Mary, we're not really saying that there's a "hit" role on John that's equal to Mary, which is the way we'd have to interpret it if we were using the model we just did on the last slide. That's not what we were intending to say. We're intending to assert that there is a "hit" relationship between John and Mary, just like we asserted that there is a "greater" relationship between the two heights. And sometimes you see people, I saw people, putting both of these kinds of things in the same slide, in the same picture in the paper, with nothing to tell the reader, or certainly not an engine that was trying to look at this, that you've got to treat one of them one way and the other in another way. And then there were case representations.
The linguists were using this to capture the generalization that some things in English, some arguments of the verbs, are characterized with prepositional phrases like "from" and "to", and some of them are characterized in other languages with case inflections that just tag the word but don't have an actual operator, and there is the subject and the object, so they all say the same kind of thing. They're what we call case. It's basically a slot in today's terminology, it's an attribute. So they would represent something like "John sold Mary a book" as an instance of "sell", whose agent is John, whose recipient is Mary and whose patient is a book. Well, now we've used these links not to make assertions but to build up the pieces of a complicated thing. So we've got an instance of "selling" that has three parts: one is an agent, one is a recipient, one is a book. So we can't do this with the system that we just thought about. We have to rethink it. So now you deal with everything by saying it's an instance of an assertion with arguments, and there is a type of the assertion, and values of the arguments. And then you find people using, on the same page, exactly the same structure for the schema that defines the data model constraints on the use of this structure. So the agent of a "sell" is a person, the recipient is a person and the object is a thing.

So I simply went through a sequence of these things and pointed out that what you're thinking about has to change depending on your ambition for what you want to express. And finally I came up with the case of some node that has an ID, has a "superc" link to telephone, and has a "color" link to black. And the question is, is this the description of a black telephone or is this an assertion that telephones are black? And without some additional machinery, some bit, some extra piece of notation somewhere in there, you don't know which way this ought to be interpreted.
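The telephone example can be made concrete by adding the "extra bit" explicitly. The sketch below is an assumption about one possible shape for that bit, not the paper's notation: the `Node` class, its `mode` field and the `interpret` function are invented for illustration. The same `superc` and `color` links read two different ways depending on the flag.

```python
from dataclasses import dataclass

@dataclass
class Node:
    superc: str   # the more general concept this node hangs under
    links: dict   # attribute links, e.g. {"color": "black"}
    mode: str     # the extra bit: "description" or "assertion"

def interpret(node):
    """Render the two readings that the bare link structure cannot tell apart."""
    parts = ", ".join(f"{name}={value}" for name, value in node.links.items())
    if node.mode == "description":
        return f"a {node.superc} with {parts}"
    if node.mode == "assertion":
        return f"every {node.superc} has {parts}"
    raise ValueError(f"unknown mode: {node.mode!r}")

black_phone = Node("telephone", {"color": "black"}, "description")
phones_are_black = Node("telephone", {"color": "black"}, "assertion")

print(interpret(black_phone))       # a telephone with color=black
print(interpret(phones_are_black))  # every telephone has color=black
```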
And I pointed out that you also need to deal with this subtle phenomenon that the philosophers and logicians have come up with, called intension, with an 's' not a 't'. Which is something that stands between the logical predicate or the word that you use and the actual object in the real world that constitutes in some senses its meaning or its denotation, the thing that it means. And the classic example is the Morning Star and the Evening Star, which I think Church described as probably discovered by some Babylonian astronomer to be actually the same physical object in the heavens, but if they meant the same thing you wouldn't have needed an astronomer, a semanticist could have told you. So the point is they mean something different, but they have the same referent. They denote the same thing in the real world, so you have to have these mental entities that have a meaning that's other than the thing that they denote. It has to do with how they carve that thing that they denote out of the world of your perception. So the Morning Star is the one you see really early in the morning and the Evening Star is the one you see really early in the evening, and it's a consequence of the world, not of the logic or the meaning, that they happen to be the same object in a much richer theory.

Okay so, the conclusions from "What's in a Link" were: we need to have an ability to represent at least the kind of functional equivalent of Lunar's generalized quantification, so that we can be expressive. We need to represent propositions without committing to whether they are true or not. We need to represent descriptions of individuals without commitment to saying that they exist, and we need to represent intensional entities like the Morning Star and the Evening Star.
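One way to picture intension in the procedural spirit of the talk is as two different procedures that happen to pick out the same object. This is only a sketch under loud assumptions: the observation records and brightness numbers are invented, and "brightest object at dawn or dusk" is a stand-in for however the concept is really carved out of perception.

```python
# Invented observations; the brightness values are arbitrary illustration numbers.
OBSERVATIONS = [
    {"object": "Venus", "time": "dawn", "brightness": 9},
    {"object": "Sirius", "time": "dawn", "brightness": 5},
    {"object": "Venus", "time": "dusk", "brightness": 9},
    {"object": "Jupiter", "time": "dusk", "brightness": 6},
]

def brightest_at(time):
    """Build an intension: a procedure that finds its referent in a given world."""
    def intension(observations):
        candidates = [o for o in observations if o["time"] == time]
        return max(candidates, key=lambda o: o["brightness"])["object"]
    return intension

morning_star = brightest_at("dawn")
evening_star = brightest_at("dusk")

# Different meanings (different procedures) ...
print(morning_star is evening_star)  # False
# ... but the same referent in this particular world.
print(morning_star(OBSERVATIONS) == evening_star(OBSERVATIONS))  # True
```

With a different list of observations the two procedures could return different objects, which is why their agreement is a fact about the world rather than about the meanings.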
So, the challenge I have been thinking about for a long time since is: is there a way to get the best of both worlds? Can we create a representational system that is like logical deduction and has the kind of rigor and formalness that logical deduction has, but also has the associative retrieval and intuitive character that associative networks have? In the logical deduction case the steps that you do are like rule instantiations. You're consciously applying rules and when you do it you're aware of it, you know you're doing work and you think about it. On the associative side something is happening, likely subconsciously; something is following pointers from thing to thing to thing in your head. Subtle psycho-linguistic experiments can show that something is going on that takes time, and sometimes it takes more and sometimes it takes less, but it's largely below our level of consciousness. We don't know how it works, but it does magic things for us. It pops out the word that fits a crossword puzzle slot. It reminds us of the thing we are trying to remember that took us two minutes to come up with and then it's suddenly there. So I've been trying to figure out ways to replicate that kind of reasoning that we do subconsciously, to supplement the kind that we do deductively, and most of what we do is exactly that subconscious instantaneous kind, and it's the kind of thing we have trouble getting computers to do. They are really good at following rules and chaining them together, better than we are. But they are not as good as we are at this other stuff.
So enter the project called [http://en.wikipedia.org/wiki/KL-ONE KL-ONE] that we started at [http://en.wikipedia.org/wiki/BBN_Technologies Bolt, Beranek and Newman]. The original goal was to try to create the infrastructure for a reasoning agent that would organize and store everything that agent had to know and enable you to find things in it. It had something we called structural inheritance networks that made a distinction between primitive and defined concepts, made a distinction between concepts and roles, had number restrictions and value restrictions, and most importantly and uniquely it had classification algorithms, most specific subsumer algorithms. By knowing enough about the meanings of our notations we could actually reason that this description is more general than that one, and with that reasoning we could organize everything we knew about generality, and we could ask of a given description what are the most specific things that are more general than that description - those are the MSSs, the most specific subsumers - and what are the most general things that are more specific than this one - those are the most general subsumees. And you could organize your entire collection of concepts with this kind of structure, and nicely, it seems, after many years working on this, you can look things up in that kind of structure with the kind of log n behavior that you get with a binary search. This is a partial order, a very complicated partial order, so binary search doesn't work on it. But with this particular kind of partial order you can get something like it that has similar computational properties.

So, the original goals of [http://en.wikipedia.org/wiki/KL-ONE KL-ONE] were to be this organizing structure for a reasoning system. We wanted to efficiently associate things so that you can do associative retrieval. We wanted to support that kind of subconscious reasoning that people do so well. We wanted to support high-level perception, the kind of thing you do when you realize what's going on in the world and how the pieces relate to each other.
We wanted to automatically classify the situation we are in, in terms of all of the things we knew, so that the most specific things we know that apply to the current situation get found efficiently, and we wanted to have inheritance of attributes and rules, all that good stuff. The hypothesis that I made, which I think is accurate, is that most of our reasoning consists of what I call recognize and react. We recognize the situation we are in and immediately brought to mind are the things we know about it: what do I need to do about it? Is it good to eat? Is it dangerous? Do I need to run from it? Is there something I should do about it? Deductive reasoning happens much less often. This kind of reasoning happens constantly, and even when we're doing deductive reasoning we're using this recognize and react to decide what should be the next step in our deductive chain. So it's the fundamental inner loop of reasoning and thinking, and so it's important to be able to recognize these known situations quickly and efficiently. It's something that we've evolved to do well.

So here are some issues. How does a reasoning system find relevant pieces of information when it has millions of things, how does it acquire and organize all those items, how does it integrate new information into the previous knowledge, and how does it use this knowledge structure to impose structure on the situation it finds itself in and perceives in the world? And, again, the answer is taxonomic subsumption: organize the concepts by generality in the conceptual taxonomy, and use this structure to organize all of your rules and all your facts, and use subsumption algorithms to build it, and to assimilate new things into it, and to find things in it when you are using it to function.
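The two classification queries, most specific subsumers and most general subsumees, can be illustrated on a toy taxonomy. This is a deliberately simplified sketch, not the KL-ONE algorithms: concepts are assumed to be plain sets of primitive features, a concept is taken to subsume another when its features are a subset of the other's, and the lookup is a linear scan rather than the sub-linear search described in the talk. All concept and feature names are invented.

```python
# Toy taxonomy: each concept is a set of primitive features (an assumption of this sketch).
TAXONOMY = {
    "thing": frozenset(),
    "animal": frozenset({"animate"}),
    "bird": frozenset({"animate", "feathered"}),
    "pet": frozenset({"animate", "owned"}),
    "canary": frozenset({"animate", "feathered", "yellow", "sings"}),
}

def subsumes(general, specific):
    """general subsumes specific if everything general requires, specific has."""
    return general <= specific

def most_specific_subsumers(description, taxonomy):
    """Concepts more general than description with nothing more specific in between."""
    subsumers = {n: f for n, f in taxonomy.items() if subsumes(f, description)}
    return sorted(n for n, f in subsumers.items()
                  if not any(f < g for g in subsumers.values()))

def most_general_subsumees(description, taxonomy):
    """Concepts more specific than description with nothing more general in between."""
    subsumees = {n: f for n, f in taxonomy.items() if subsumes(description, f)}
    return sorted(n for n, f in subsumees.items()
                  if not any(g < f for g in subsumees.values()))

pet_bird = frozenset({"animate", "feathered", "owned"})
print(most_specific_subsumers(pet_bird, TAXONOMY))   # ['bird', 'pet']
print(most_general_subsumees(frozenset({"animate", "feathered"}), TAXONOMY))  # ['bird']
```

A new description such as `pet_bird` is placed under both `bird` and `pet` without anyone deciding that by hand, which is the automatic classification behaviour the talk describes.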
So for example you would organize all the pattern parts of all your rules into a conceptual taxonomy, classify your current sub-goal into this taxonomy, find the most specific rules that apply to the situation you are in, choose a rule, do what it says, lather, rinse, repeat. So, [http://en.wikipedia.org/wiki/KL-ONE KL-ONE] was unique in the world at its time in having this ability to automatically take a concept and decide where it belongs in the taxonomy and then inherit whatever you would inherit from that point in the taxonomy. In every other semantic network system, frame system, and so forth, you consciously put something at some place in the taxonomy and then it would inherit from where you put it. So, this was a fundamentally new kind of thing, and these two algorithms, the MSS and the MGS, were what it takes to do it.

[http://en.wikipedia.org/wiki/KL-ONE KL-ONE] took a tangent at this point. [http://en.wikipedia.org/wiki/Ronald_J._Brachman Ron Brachman] and [http://en.wikipedia.org/wiki/Hector_Levesque Hector Levesque] decided they wanted to make it declarative instead of procedural, and so they decided to put it on a foundation of [http://en.wikipedia.org/wiki/First-order_logic first-order logic] and use extensional set inclusion as their model of subsumption, and my take is that thereby they inadvertently signed up to do complete first-order logic reasoning. So the thread that took off at that point and ultimately culminated in Terminological Logics and [http://en.wikipedia.org/wiki/Description_logic Description Logics] was: if you really constrain the expressive power of your representation you could get a certain class of inferences that it was complete and sound for. But after people all over the world - logicians love this - proved that this and that sort of expressive power was intractable in this and that way, the upshot was almost everything of interest was intractable and not really feasible.
I was off doing start-ups at this point and not paying a whole lot of attention to what they were doing, but when I went back to look and see what had happened, they had thrown away all these intuitions about how the structure is supposed to support the reasoning and how you're supposed to associate one thing to another, and replaced it with big expansions of complicated logical expressions that themselves were getting big. I had always assumed that the numbers that they were intractable with respect to were the size of the taxonomy, but I found out they were the size of individual concepts. So, I went and thought and rethought the process and wrote a paper in {1991} called "Understanding subsumption and taxonomy: {A framework for progress}" where I revisited the original goals. I wanted an efficient, principled methodology for organizing knowledge and I wanted to handle intensional subsumption, not extensional. That's where they ran into trouble. If you try to guarantee that all the empty sets are equivalent, and everything that's remotely similar to that issue, that the Morning Star and Evening Star are the same, then you run into trouble. But if you go for intensional rather than extensional, you get to define what the intension is and you can define what the criteria for subsumption are, and I defined fairly straightforward criteria for intensional subsumption. Every element of the parent concept subsumes some element of the child, and the necessary conditions for the child entail some set of sufficient conditions for the parent. So that says if you've got one of the children, you know it is underneath the parent. And it turns out you can get tractable algorithms for this. It's not complete in the first-order logic sense of completeness, but that's not what we are looking for. We are looking for the thing that can very quickly do a classification of a situation you're in into a set of most specific subsuming situations. 
It's not, that test doesn't include proving Herbrand's theorem or any of those more complicated, subtle things. It's a very important, fundamental inner-loop operation that we have to do fast, and that's what the goal is here. Okay so, while doing this I noticed an equivalent to the old "What's in a link" stuff. You see people writing down things like birds have wings, people need vitamins, people live in houses, the person broke a window, but when you look at these links, they have an implicit quantifier in them: everybody needs every vitamin, but not every bird has every wing, not every person lives in every house; people typically live in houses, some people don't have houses. The person broke a window is a very specific statement; the others were generics. So I started to catalog and label these things that these links could represent and called them quantificational tags. So AE is the classical "for every, there exists". So birds have wings is: for every bird there are some wings that they have. Person needs vitamins is AA: every person needs every vitamin. Children like candy is typically AA, but it's not always AA. So, you can, in this fashion, characterize that fact. John lives in Chicago is an instance-instance, so for the things at the ends of the link in this case there's no real quantification going on. That's the sort of identity case, and I realized you can think of these as relation-forming operators. So it's kind of an annotation on an underlying semantic relationship. So AE have, as in birds have wings, can be characterized by a lambda expression, as a function of X and Y such that for every x in X there exists a y in Y such that x has y. So you can give a formal definition for each of these quantificational tags, and you can actually use these as annotations on RDF Schema to make manifest the hidden quantificational import that somebody buried in a link name. 
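The idea of a quantificational tag as a relation-forming operator can be sketched as a higher-order function. The tag names AE, AA and II come from the talk; everything else (the function signatures, the `has` relation, and the sample individuals) is an assumption made up for this sketch.

```python
from typing import Callable, List, Set, Tuple

Relation = Callable[[str, str], bool]


def AE(rel: Relation) -> Callable[[List[str], List[str]], bool]:
    # "for every x in X there exists a y in Y such that x rel y"
    return lambda xs, ys: all(any(rel(x, y) for y in ys) for x in xs)


def AA(rel: Relation) -> Callable[[List[str], List[str]], bool]:
    # "every x in X stands in rel to every y in Y"
    return lambda xs, ys: all(rel(x, y) for x in xs for y in ys)


def II(rel: Relation) -> Relation:
    # instance-to-instance: no quantification, just the underlying relation
    return rel


# Hypothetical facts: each bird has its own wing, not every wing.
has_facts: Set[Tuple[str, str]] = {("tweety", "wing1"), ("polly", "wing2")}


def has(x: str, y: str) -> bool:
    return (x, y) in has_facts


birds = ["tweety", "polly"]
wings = ["wing1", "wing2"]

birds_have_wings_AE = AE(has)(birds, wings)  # every bird has some wing
birds_have_wings_AA = AA(has)(birds, wings)  # every bird has every wing
tweety_has_wing1 = II(has)("tweety", "wing1")
```

The underlying relation `has` stays a black box; only the tag changes how the link between the two classes is read.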
And the reason this is important is because if you wanna write a classifier, and the classifier is going to go and decide whether this concept is more general than that one, you don't want it to have to unpack the link and understand its semantics and go figure out if there is a quantifier down in there in order to use it. Okay so the quantificational tags give you a nice clean separation between the stuff that the classifier needs to know, the basic logical quantificational structure, and the domain-specific relationships, which it can still treat as a black box. And it also forms a contract between the person who is entering the data and the way the engine is gonna treat the data later. If the person just creates a link called "has children" and the classifier is gonna have to know what that link means, then the person who's writing the data doesn't necessarily think about what the classifier is doing as he is writing it. The person who is writing the classifier doesn't necessarily know that the person is gonna consistently use data in this link in this particular way, and you can get inconsistencies and bad stuff. If the quantificational tag is out there as an explicit label, then both of them know what they are doing. Okay so, there's a bunch of these; you can distinguish between kinds and instances, and they can answer the "What's in a Link" question. So if I have n12368 and it says it's a modifier instance kind of telephone is modifier color black, that's a description of a black telephone. If it says it's an II name telephone and AI color black, that's an assertion that telephones are black. And these tags tell you which of these links are making assertions and which of these links are putting together pieces of a larger structure. Everything that you are using these links for is laid out now in this tag, so I commend that insight to you when you are thinking about RDF and what you do with it. I won't go into much detail. 
This particular example is subtle, so it may throw you a bit, but the MR relationship, the value restriction, says "a person whose sons are professionals". That's a restriction relationship: all sons are professionals. You have to know it's got that MR relationship in order to figure out that "person whose sons are professionals" subsumes "woman whose children are doctors", even though sons are a kind of children, going what's intuitively the wrong way. In every other case of subsumption the links all go the same way, except this one, or maybe a few others like it, and you have to know that's what the quantifier is doing in order for the classifier to get this right. Okay so, the conjecture is that this kind of subsumption technology can provide efficient knowledge management for large-scale automated reasoning that would be fluent at different levels of generality, which is something that first-order logic is not, and consequently all things like RDF and Semantic Web notations aren't either. It's scalable to large knowledge bases, it can automatically assimilate new information, it can integrate different reasoning paradigms (plausibilistic, probabilistic, deductive, abductive), and it can support this kind of rapid reactive reasoning. My fundamental observation is that the range of open problems is not something that we're gonna solve with just the careful use of a few already well understood things like first-order logic and probabilistic reasoning. We are going to have to forge some new tools to do it. I have a paper on my website called "[http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8640.1987.tb00211.x/abstract Don't blame the tool]" that goes into a great deal of detail on why First-Order Logic isn't enough and isn't what you want, and why you need to create some more stuff. 
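The "wrong way" direction of the value restriction can be made concrete with a small sketch. It assumes a concept of the form "a base whose role-fillers are all of some class" encoded as a triple, and hypothetical ISA tables; none of this is the talk's actual classifier. The point it illustrates is that the role must be compared in the opposite direction from the base and the filler class.

```python
from typing import Dict, Optional, Tuple

# Hypothetical ISA tables for concepts and for roles.
CONCEPT_ISA: Dict[str, str] = {"woman": "person", "doctor": "professional"}
ROLE_ISA: Dict[str, str] = {"son": "child"}

# (base concept, role, class that ALL fillers of the role must belong to)
Restricted = Tuple[str, str, str]


def is_a(specific: Optional[str], general: str, table: Dict[str, str]) -> bool:
    # Walk up the ISA chain from the specific term.
    while specific is not None:
        if specific == general:
            return True
        specific = table.get(specific)
    return False


def subsumes(parent: Restricted, child: Restricted) -> bool:
    p_base, p_role, p_fill = parent
    c_base, c_role, c_fill = child
    return (
        is_a(c_base, p_base, CONCEPT_ISA)      # woman is a person
        and is_a(c_fill, p_fill, CONCEPT_ISA)  # doctors are professionals
        # Reversed direction: the PARENT's role must specialise the child's,
        # because "all children" already covers "all sons".
        and is_a(p_role, c_role, ROLE_ISA)
    )


person_sons_professionals = ("person", "son", "professional")
woman_children_doctors = ("woman", "child", "doctor")

ok = subsumes(person_sons_professionals, woman_children_doctors)
not_ok = subsumes(("person", "child", "professional"), ("woman", "son", "doctor"))
```

If all of a woman's children are doctors, then in particular all her sons are professionals, so the first test succeeds; the second fails because knowing only about her sons says nothing about her daughters.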
And finally there's a really interesting, powerful, practical application of this in a search technology that I developed at Sun. When you combine it with a specific passage retrieval algorithm, it can dramatically improve people's access to online information, but that's another talk.
==Q&A part (to be completed)==
''Sandro Hawke'': So I have a question. You attack First-order logic several times here and I don't understand, because obviously OWL profiles specify certain extremely tractable subsets of first-order logic. So all these performance issues you mention actually do not apply to the Semantic Web.
''William Woods'': Tractable subsets that are efficient don't have to rely on first-order logic for the fact that they do what they do.
''Sandro Hawke'': They don't have to, but they happen to be.
''William Woods'': So I am not, let me qualify, people have interpreted my "What's in a Link" paper as attacking Semantic Networks, too. I am not attacking either of them. My entire research career has been devoted to understanding exactly what the strengths and weaknesses of different tools are. What are they good for, what can they not do, and what are the limitations. So, I am simply pointing out some of the limitations that you have to keep in mind and some of the alternatives that you can deal with. So if you write a reasoning system that is efficient at what you need it to do, the fact that you can embed it in First-order logic is kind of irrelevant. But when you look at what we really want to do, First-order logic isn't enough; we need Higher-order logics of various kinds. We need intensional reasoning operations, which break down if you rely too much on the law of excluded middle; you can get into all kinds of problems. So when [http://en.wikipedia.org/wiki/Drew_McDermott Drew McDermott] wrote his famous paper "..." I wrote the "Don't Blame the Tool" paper. All the arguments for First-Order logic I hear are usually what I call lamppost arguments. I use the analogy of the Sufi tale about the woman who was searching for her coin out on the street in front of her house, and the neighbors were helping, and finally they asked: where did you lose it? Oh, back in the kitchen, she said. But then why are you searching here out on the street? She replied: because it's dark back in the kitchen. So when you say First-order logic is great because we understand it, it's sound, it's complete, it's got all these good properties, fine, but if the problems you are trying to solve aren't there but somewhere else, it's not the tool you want. I am not saying it's not one of the tools you need in your toolbox, but you need some new ones. And if it hasn't been sufficient to do [http://en.wikipedia.org/wiki/Strong_AI Strong AI] it's not the tool's fault. 
You are using the wrong tool for the wrong thing, or it's not the only tool to do the job.
''Sandro Hawke'': I think here is the key thing, nobody I know of is trying to do [http://en.wikipedia.org/wiki/Strong_AI Strong AI]. So [http://en.wikipedia.org/wiki/Web_Ontology_Language OWL] is perfectly fine.
''William Woods'': Yes, well. That's why I wrote the "What's in a Link" paper ... but as a representation, I mean, if you use quantificational tags and graphs you've got a representationally complete space. What you do to reason on it is completely different, and there are all kinds of reasoning operations. You don't reason with a First-order logic, you reason with a whole suite of tools. Some of which are like the Aristotelian syllogisms. I mean he had some of that right. We do a whole lot, most of our reasoning is unsound: "This side of the barn is red". Nobody is that accurate or that careful or meticulous except in science-fiction articles.
=Foundations for Semantic Networks=
William A. Woods<br>
Copyright © 1975 by Academic Press, Inc.
: References
</TD></TR></TABLE>
===Session Classification===
Level: Intermediate-Advanced
Session Type: Research-Technology
[[Category:Event]]
Latest revision as of 22:00, 12 February 2022
Location: MIT Cambridge, MA
Slides: PDF
==What's in a Link, Revisited - William A. Woods==
In this Lotico session William discusses ideas about representing meaning in computer representations based on his now classical paper "What's in a Link: Foundations for Semantic Networks" - 1975 and his review "Meaning and Links" in 2007 in a historical context. William A. Woods is a Distinguished Software Engineer at ITA Software, Cambridge, MA and a member of the Semantic Web Meetup Group. http://parsecraft.com
Watch Video on: Vimeo TranscriptOk so, Marco asked me to tell you something about this paper and why I wrote it, what was going on at the time, who I was trying to address and I will cover also some of the things that have followed since. So, before I did the paper, I worked on question answering. This all got started at a seminar at Harvard where {Susumu Kuno} became my thesis supervisor, suggested: "How would you do a natural language question answering system for a database?" And I said to myself: "Well, if you're gonna do that, you ought to know something about meaning." So I went off and read a lot of philosophy and took a lot of linguistics courses and it was really subtle, people had, at the time, syntax a little bit but meaning was really mysterious. The best I could come up with from the philosophy literature is this quote from Carnap: "To know the truth condition of a sentence is to know what is asserted by it, in usual terms its 'meaning' ." So the philosophers essentially say a term has a meaning based on some ability to decide when it's true and when it's not. That's what meaning is for them and when they do it, the truth conditions are an abstract mapping from possible worlds onto truth values. Instantiations of all possible predicates on all possible objects into truth values. And I said to myself how can this infinite thing be represented finitely in a reasoning head? And the only thing I know that can do that sort of thing is an abstract procedure, something like a computer program, a Turing machine, a Post production system. Something of that sort. So I proposed a theory of semantics, which I called procedural semantics, where the meaning is defined by abstract procedures for determining reference, verifying facts, computing values, including truth values and carrying out actions. 
And it's built on top of, not set theory or category theory or any sort of things that have subtle complexities in their foundations, but things that we understand pretty concretely like conditional expressions and computation of a value and "if then" and "while loops" and so forth, which I maintain is a much more solid foundation than set theory or category theory or even the foundations of real variables in mathematics. And it provides a principled connection between mental symbols, that this reasoning engine is carrying in its head, and the things that are out there in the world that they actually denote or mean. In fact this is the first semantic theory in history that provides a principled, causal relationship between meanings in the head and concrete objects in the physical world. And it was practical. I applied it to a task that I was set by a guy who worked for the NASA Manned Spacecraft Center, who had collected all of the first year's work on the lunar rocks into a database. And he could get his Fortran programmer to write a program to answer a question if some scientist wrote him one, and he wanted to know if he could get his programmer out of that loop. And at the time I had developed a theory of semantics and a system for doing this, so I got a contract and I built a system that actually answered questions on the first year's worth of the Apollo 11 moon rocks. It was demonstrated live at the Second Annual Lunar Science Conference {1971} and it answered questions that people came up to it and typed in. And it did reasonably well at it, it was a very interesting system. And the interesting thing about it is that its meaning representation language is an extension of the predicate calculus with generalized quantifiers, quantifiers that include imperative operations and actual calculations and non-standard things, not only the I/O operator, but also things like a prime number of objects, or more than a certain amount of objects etc. 
The basic structure was: for some quantifier, governing a variable in some class such that some condition is true, do an action. It just looks like ALGOL, right? For some quantified variable in a class such that the condition is true, test some other condition. And there is a TEST function and a PRINTOUT function, and with this process you can hook up to any kind of database no matter how it's structured. You have an interface from your natural language processing to this universal, procedural, operational-based meaning representation language. And then you define the primitives in this language with functions. Functions that can access the database or in extreme cases can go out in a warehouse and count bolts in a bin. So it's an extremely powerful semantic framework. Ok so, this theory permits a computer to understand in a single, uniform way the meanings of conditions to be tested, questions to be answered and actions to be carried out. And it permits a very general purpose system for language understanding that can be used with lots of different databases and more importantly can actually cross databases and produce queries that integrate information from multiple databases in a uniform semantic framework. Without having to get those databases into some common paradigm. And as I said it can actually perceive and act on objects in the external world. And it can perceive and reason about these meaningful, meaning objects themselves, which are procedures, as some kind of program inside the machine's head. Ok so, there is a semantics, a theory of what things mean, but it doesn't directly address some of the things which you'd like to do with your meaning in the meaning representation. It's quite powerful, it's very expressive, it can compute things, but it doesn't have a bunch of associative things that you and I have in our heads. That let you go from one node to another node and pick up new things. 
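The quantified structure described above ("for some quantifier, a variable in a class, such that a condition holds, do an action or test another condition") can be sketched as follows. This is not LUNAR's actual meaning representation language: the Python function names mirror the TEST and PRINTOUT functions mentioned in the talk, but their signatures, the quantifier helpers, and the sample records are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Body = Callable[[Record], bool]
Quantifier = Callable[[List[Record], Body], bool]


def FOR(quantifier: Quantifier, klass: List[Record],
        restriction: Body, body: Body) -> bool:
    # Select the members of the class that satisfy the restriction,
    # then let the quantifier decide how the body is applied to them.
    matching = [x for x in klass if restriction(x)]
    return quantifier(matching, body)


def EVERY(xs: List[Record], body: Body) -> bool:
    return all(body(x) for x in xs)


def SOME(xs: List[Record], body: Body) -> bool:
    return any(body(x) for x in xs)


def AT_LEAST(n: int) -> Quantifier:
    # A generalized quantifier: the body holds for at least n members.
    return lambda xs, body: sum(1 for x in xs if body(x)) >= n


printed: List[str] = []


def PRINTOUT(x: Record) -> bool:
    printed.append(str(x["id"]))  # an action, recorded here instead of printed
    return True


def TEST(condition: bool) -> bool:
    return bool(condition)


# Made-up sample records; a real primitive could query any database instead.
samples: List[Record] = [
    {"id": "S1", "type": "breccia", "aluminum": 12.0},
    {"id": "S2", "type": "basalt", "aluminum": 7.5},
    {"id": "S3", "type": "breccia", "aluminum": 14.2},
]

FOR(EVERY, samples, lambda s: s["type"] == "breccia", PRINTOUT)
any_rich_basalt = FOR(SOME, samples, lambda s: s["type"] == "basalt",
                      lambda s: TEST(s["aluminum"] > 10))
two_rich = FOR(AT_LEAST(2), samples, lambda s: True,
               lambda s: TEST(s["aluminum"] > 10))
```

Because the class, the restriction and the body are just procedures, the same frame works whether the primitives read a table in memory or go count bolts in a bin.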
So what I wanted was a system that was comparably well-founded, comparably expressive but also had the associativity that we humans have for following facts from one thing to another and has the capability of supporting the reasoning operations you need to do in a way that's efficient and scales. So, I looked at Semantic Networks. And a Semantic Network is essentially a network of concepts connected by links used for some kind of associative access. And again I had the same question: what do these links mean and what do they have to do with semantics? And I concluded pretty quickly that they had nothing to do with semantics, but I finally inured myself to calling it that nonetheless because that's what everyone called it. And the kinds of things people were doing at the time were, lots of them were psychologists and they were looking to try to mirror the things that the human mind does that are associative. And one example from Collins and Quillian: Collins is a psychologist that was at Bolt, Beranek and Newman; Ross Quillian is ostensibly the guy who coined the phrase 'Semantic Network', although arguably it was used by somebody else { Margaret Masterman at Cambridge University (1961) } in England before he did. But he looked at statements like: a bird is an animal; an animal has skin; a Canary is a bird; a bird has feathers, and measured how long it took people to answer questions like this: "Does a canary have feathers?" or "Does a canary have skin?" And his theory was that this is coming from inheritance in the Semantic Network. And a Canary is a bird, which is an animal, and we know that animals have skin, and birds have feathers; having feathers is a more specific kind of fact than having skin, and therefore it takes a little longer to find that a Canary has skin than it takes to figure out that it has feathers, because the path is longer. And they actually made psychological experiments and measured the time that correlated with these predictions. 
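The path-length prediction can be illustrated with a toy inheritance lookup over exactly the four statements quoted above. The dictionaries and the function name are assumptions for this sketch; it only counts links followed and makes no claim about real response times.

```python
from typing import Dict, Optional, Set

# "a Canary is a bird; a bird is an animal"
ISA: Dict[str, str] = {"canary": "bird", "bird": "animal"}
# "a bird has feathers; an animal has skin"
HAS: Dict[str, Set[str]] = {"bird": {"feathers"}, "animal": {"skin"}}


def hops_to_property(concept: Optional[str], prop: str) -> Optional[int]:
    """Number of ISA links followed before the property is found, or None."""
    hops = 0
    while concept is not None:
        if prop in HAS.get(concept, set()):
            return hops
        concept = ISA.get(concept)  # climb one level in the network
        hops += 1
    return None


feathers_hops = hops_to_property("canary", "feathers")
skin_hops = hops_to_property("canary", "skin")
```

In this sketch the skin question needs one more hop than the feathers question, which is the ordering the theory predicts for answer times.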
I don't think the theory was ... well, psychologists have a tendency to jump to the conclusion that if the data is consistent with the theory then the theory is right, without asking whether there are alternative theories that could explain the same data. So, at any rate, I saw people all around me using notations that had links that had labels on them, had nodes that had labels on them, and calling them 'Semantic Networks' and doing all kinds of things with them. And I thought they're not doing anything with meaning, and you can't really use these representations to do anything unless you become more rigorous about them. At that time there was a conference that was organized down at Woods Hole and I went there and heard people like Roger Schank and a bunch of other people talking about what they were doing. And I kind of spouted off on all these problems that I saw. And Danny Bobrow took me and holed me up in the basement of Xerox PARC with a Bravo machine and got me to write it all down, and that became the "What's in a Link" paper, which ended up being dedicated to Jaime Carbonell, who was my boss at the time and who unfortunately died tragically, just passed away while driving down the street. Fortunately he pulled over to the side of the road before he crashed with his wife. So, the book was dedicated to Jaime, and in some sense this paper was dedicated to him. So here are some of the things that I saw going on all around me. People said the meaning is just all the concepts that are connected to it in this web. "That's what the meaning is!" Well, there might be some truth to it but it doesn't give you any meat to start with. "Whatever the system does with its input is its semantics!" Again there may be some element of truth to that but that's not what semantics is really about. "There is no difference between syntax and semantics!" This was one of Roger Schank's claims. 
He illustrated it with the example: "I saw the Grand Canyon flying to New York", which I take as proof that he is wrong. Because if he was right you would just understand what that meant, and nobody would laugh. It's the fact that syntax gives you the wrong model first that makes it fun, that makes it interesting and surprising. And then there is "Semantics is in the eye of the beholder!" This is actually pretty accurate for the Semantic Networks that people are actually using. The semantics is all in the words that you put on the labels. Ross Quillian at one point had a semantic network system implemented in LISP and he decided to remove the labels, and the resulting network, when printed out, was just a giant nest of parentheses. So it's right, the semantics was all in the labels. So, in the paper I characterised two populations of people that address semantics, and I called one the linguist and the other the philosopher. Somewhat unfair, not everybody is in these centroids, but these are caricatures. The linguist is principally interested in a way to represent the different meanings of a sentence that's fundamentally ambiguous. So they are looking for some notation in which, if a sentence has two readings, I can write down the two readings and say this is this one and this is that one, here is why they are different. And if a sentence has no meaning because it's nonsense of some sort, they want some criteria that let them say that's nonsense because of this. The philosopher on the other hand is interested in the truth conditions of predicates, which he postulates in his model already as unambiguous, because he has defined them. He is interested in soundness and completeness of logical deduction systems etc. These are two parts of the problem, but even put together they don't cover the whole problem. 
So, the linguist is interested in things like: "I saw the man from the park". It has got two possible interpretations: "I was in the park and I saw him from there" and the other is "He is from the park and I saw him in the deli or wherever". They are very fond of the asterisk, which they use to mark a sentence as ungrammatical. And they would say things like "The amoeba hit John with a hammer" is ungrammatical because of semantic features. The instrumental 'with' construction used here requires that its agent be +agent. And amoebas are not marked +agent because they don't have brains. So, they call that semantics, but the machinery they use to do this theory is fundamentally the same kind of theory you would use if it were syntactic. So they are basically using syntactic mechanisms. It legitimately would be called semantic if you actually followed amoeba out to some reference of what amoeba meant. And from your knowledge of real amoebas you can infer that they can't use hammers - except on Star Trek, you know, where maybe they're really big and they can. So, I found the interesting thing about these to be: in what context, and what would have to be true in that context, in order for this to be a sensible utterance. And if you're actually constructing that entire model then you are doing something semantic and there would be a semantic feature. But the mechanisms they had in mind didn't go that far. OK so, the philosophers are interested in reasoning. The philosopher's perspective is: a model consists of a set of assignments of truth values to every possible instantiation of every predicate over a universe of individuals. And this abstract model is like the idealised point mass on a frictionless surface that physicists use. This is the thing that they created in order to prove that the reasoning systems were sound and complete. 
So with this as an abstract model you can prove that if I have a reasoning system, a set of rules, a set of steps for deducing things, one from another, and I can show that this reasoning system assigns validity only to those things that are true in every possible model, and if I can show it only assigns validity to things that are true in every possible model, then I've got a good reasoning system. And that was an extremely productive thing to do, and based on that they were able to show that first order logic is something complete but problematic, and set theory has got issues, and actually the foundations of mathematics are not all that foundational, and so a lot came of it. So what's still missing though, if we go back to the dictionary definition of semantics, is the relationship between symbols and what they denote or mean. And if you think about what the logicians did with their model theory, they gave us a very good account of the semantics of IF THEN, AND, OR and NOT and the quantifiers. But they don't do anything for the semantics of snow is white. And that requires you to have something else. And the something else I proposed was procedural semantics. Snow is white: you have to know what snow is. OK, if you can build a robot and he can say this is snow, and he says it properly, and he can measure the color spectrum, and can say this is white, and he can do that properly, then he can test whether snow is white or not, and then you've actually got the semantics that can really do that. So the meaning of a proposition is a procedure for determining if it is true, the meaning of an action is the ability to do it and/or tell if it has been done, and as I said this can provide a principled sensorimotor connection to the actual physical world. Ok, back to links. So, people have used links to represent attribute and value. 
That's what most of you think of in an RDF network, but there are other things, like links between attributes and some predicate that's supposed to be true of their value. Or relations in your objects etc. So let me walk you through some of the examples from the "What's in a Link" paper. We start with something very simple:
This is the kind of data you're all accustomed to seeing and thinking about. And then I said well what happens if you don't know his height, but you know about his height: he is greater than six feet. A lot of people wouldn't hesitate to just go ahead and put "greater than six feet" into the value slot. But if you do that, you are really doing something totally silly or you've made a huge mental reinterpretation of what your entire system is doing. The things at the end of links are no longer values now. They are predicates that are true of values. And you can think of those values you had before as little identity predicates that say that identity is true of this value. So you can actually make this leap, do it consistently and have a new consistent model. But it's a little more subtle than the one you thought you had. And now it handles all the stuff it handled before plus it can handle these predicates, so you can put in constraints on values that you only know a little bit about. This is a very important thing when you're collecting data, because we all had the experience trying to fill out some form and at some point the options the form gave you didn't include the right one. I used to use a cartoon of the guy who absolutely had no hair looking at the hair color choice on the driver license application. My favorite one was a database that Xerox had of copiers, and the copier database had two addresses. One says where the copier is so the serviceman can go and fix it. And the other one says the billing address, it says where you send the bills so you get the money. Well they had a copier that was installed on a barge that went up and down on the east coast on a regular schedule, oh oh! You'd like to be able to put that into your database. You like to be able to tell that story, capture it somehow and express it as a constraint on the value. And you don't always have the real value. 
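The reinterpretation described above, where the thing at the end of a link is a predicate true of the value rather than the value itself, can be sketched as below. The helper names and the numbers are assumptions made up for illustration (heights are in inches, so six feet is 72).

```python
from typing import Callable, Dict

Predicate = Callable[[float], bool]


def equals(v: float) -> Predicate:
    # An exactly known value becomes a little identity predicate.
    return lambda x: x == v


def greater_than(v: float) -> Predicate:
    # A partially known value becomes a constraint on the value.
    return lambda x: x > v


# Hypothetical nodes: every "height" link ends in a predicate.
john: Dict[str, Predicate] = {"height": greater_than(72)}  # "greater than six feet"
sue: Dict[str, Predicate] = {"height": equals(65)}         # fully known


def consistent(node: Dict[str, Predicate], attribute: str, candidate: float) -> bool:
    """Could `candidate` be the value of this attribute, given what is stored?"""
    return node[attribute](candidate)


john_could_be_74 = consistent(john, "height", 74)
john_could_be_70 = consistent(john, "height", 70)
sue_is_65 = consistent(sue, "height", 65)
```

The old data still fits, since a plain value is just the special case `equals(v)`, while the barge-mounted copier kind of case can be stored as whatever constraint is actually known.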
Ok, what if you don't know either John's or Sue's height but you know about a relationship between the two? Now the transformation I just made doesn't quite do that yet, so now you wanna say something like: height of John is greater than the height of Sue. So now, maybe what you do is you take the link from John named height and you say that thing at the end of that link is not a value, and it's not a predicate, it's a role. It represents the thing that is John's height. And now you can say: oh here is the height of John, here is the height of Sue, and now I can assert a relationship between those two that says this one is greater than that one. So again we've been able to make a backwards-compatible extension of our semantic model if you're really careful about it, much more subtle than we thought even the time before, but it still works. But then what do we do with this relationship of greater that we just put in there? This is no longer an attribute value, we're asserting a fact by putting a link in. We were sort of asserting facts when we put links in before, but we were thinking of these as attributes; we're thinking of them now as pointers to roles, so we're not really saying that there's a hit role that's equal to Mary on John. That's the way we'd have to interpret it if we were using the model we just did on the last slide. That's not what we were intending to say; we're intending to assert that there is a hit relationship between John and Mary, just like we asserted that there is a greater relationship between the two heights. And sometimes you see people, I saw people, putting both of these kinds of things in the same slide, in the same picture in the paper. With nothing to tell the reader, or certainly not an engine that was trying to look at this, that you gotta treat one of them one way and the other in another way. And then there were case representations. 
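One way to keep the two readings apart is to make roles and assertions different kinds of objects. The sketch below is an assumption-laden illustration, not the notation from the paper: `Role` stands for "the thing that is John's height" and `Assertion` for a stated relationship, whether between roles or between individuals.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Role:
    # "the height of John": a placeholder for a value we may not know
    owner: str
    attribute: str


@dataclass(frozen=True)
class Assertion:
    # a fact asserted to hold among its arguments
    relation: str
    args: Tuple[object, ...]


facts: Set[Assertion] = {
    # height of John is greater than the height of Sue (neither value known)
    Assertion("greater", (Role("John", "height"), Role("Sue", "height"))),
    # John hit Mary: a relationship between individuals, not a "hit" attribute
    Assertion("hit", ("John", "Mary")),
}


def holds(relation: str, *args: object) -> bool:
    return Assertion(relation, tuple(args)) in facts


john_taller = holds("greater", Role("John", "height"), Role("Sue", "height"))
sue_taller = holds("greater", Role("Sue", "height"), Role("John", "height"))
john_hit_mary = holds("hit", "John", "Mary")
```

With the types made explicit, a reader or an engine no longer has to guess which links point at roles and which ones assert facts.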
The linguists were using this to capture the generalization that some things in English, some arguments of the verbs, are characterized with prepositional phrases like "from" and "to", and some of them are characterized in other languages with case inflections that just tag the word but don't have an actual operator, and there is the subject and the object, so they all say the same kind of thing. They're what we call case. It's basically a slot in today's terminology, it's an attribute. So they would represent something like "John sold Mary a book" as an instance of "sell", whose agent is John, whose recipient is Mary and whose patient is a book. Well, now we've used these links not to make assertions but to build up the pieces of a complicated thing. So we've got an instance of "selling" that has three parts: one is an agent, one is a recipient, one is a book. So we can't do this with the system that we just thought about. We have to rethink it. So now you deal with everything by saying it's an instance of an assertion with arguments, and there is a type of the assertion, and values of the arguments. And then you find people using, on the same page, exactly the same structure for the schema that defines the data model constraints on the use of this structure. So the agent of a "sell" is a person, the recipient is a person and the object is a thing. So I simply went through a sequence of these things and pointed out that what you're thinking about has to change depending on your ambition for what you want to express. And finally I came up with the case where some node has an ID, has a "superc" link to telephone, and has a "color" link to black. And the question is, is this the description of a black telephone or is this an assertion that telephones are black? And without some additional machinery, some bit, some extra piece of notation somewhere in there, you don't know which way this ought to be interpreted. 
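The "extra bit" needed to disambiguate the telephone node can be sketched as a flag on each link. The `superc` and `color` link names come from the talk; the `Link` class, the `asserts` flag and the English glosses are assumptions for this sketch and are much cruder than a full set of tags.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    label: str
    target: str
    asserts: bool  # True: states a fact about the kind; False: builds a description


def interpret(links: List[Link]) -> List[str]:
    # The "superc" link names the kind the node is built on.
    kind = next(link.target for link in links if link.label == "superc")
    readings = []
    for link in links:
        if link.label == "superc":
            continue
        if link.asserts:
            readings.append(f"every {kind} has {link.label} {link.target}")
        else:
            readings.append(f"a {kind} whose {link.label} is {link.target}")
    return readings


# The same two links, differing only in the extra bit.
description = [Link("superc", "telephone", False), Link("color", "black", False)]
assertion = [Link("superc", "telephone", False), Link("color", "black", True)]

as_description = interpret(description)
as_assertion = interpret(assertion)
```

Without the flag the two node structures are identical, which is exactly the ambiguity raised above.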
And I pointed out that you also need to deal with this subtle phenomenon that the philosophers and logicians have come up with, called intension, with an 's' not a 't'. It is something that stands between the logical predicate or the word that you use and the actual object in the real world, and that constitutes in some sense its meaning, as opposed to its denotation, the thing that it refers to. And the classical example is the Morning Star and Evening Star, which I think Church described as probably discovered by some Babylonian astronomer to be actually the same physical object in the heavens, but if they meant the same thing you wouldn't have needed an astronomer, a semanticist could have told you. So the point is they mean something different, but they have the same referent. They denote the same thing in the real world, so you have to have these mental entities that have a meaning that's other than the thing that they denote. It has to do with how they carve that thing that they denote out of the world of your perception. So the Morning Star is the one you see really early in the morning and the Evening Star is the one you see really early in the evening, and it's a consequence of the world, not of the logic or the meaning, that they happen to be the same object in a much richer theory. Okay so, the conclusions from "What's in a link" were: we need to have an ability to represent at least the kind of functional equivalent of Lunar's generalized quantification, so that we can be expressive. We need to represent propositions without committing to whether they are true or not. We need to represent descriptions of individuals without commitment to saying that they exist, and we need to represent intensional entities like the Morning Star and the Evening Star. 
So, the challenge I have been thinking about for a long time since is: is there a way to get the best of both worlds? Can we create a representational system that is like logical deduction and has the kind of rigor and formalness that logical deduction has, but also has the associative retrieval and intuitive character that associative networks have? In the logical deduction case the steps that you do are like rule instantiations. You're consciously applying rules and when you do it you're aware of it, you know you're doing work and you think about it. On the associative side something is happening, likely subconsciously; something is following pointers from thing to thing to thing in your head. Subtle psycho-linguistic experiments can show that something is going on that takes time, and sometimes it takes more and sometimes it takes less, but it's largely below our level of consciousness. We don't know how it works, but it does magic things for us. It pops out the word that fits a crossword puzzle slot. It reminds us of the thing we are trying to remember that took us two minutes to come up with and then it's suddenly there. So I've been trying to figure out ways to replicate that kind of reasoning that we do subconsciously, to supplement the kind that we do deductively. Most of what we do is exactly that subconscious instantaneous kind, and it's the kind of thing we have trouble getting computers to do. They are really good at following rules and chaining them together, better than we are. But they are not as good as we are at this other stuff. 
So enter the project called KL-ONE that we started at Bolt, Beranek and Newman. The original goal was to try to create the infrastructure for a reasoning agent that would organize and store everything that agent had to know and enable you to find things in it. It had something we called structural inheritance networks that made a distinction between primitive and defined concepts, made a distinction between concepts and roles, had number restrictions and value restrictions, and most importantly and uniquely it had classification algorithms, most specific subsumer algorithms. By knowing enough about the meanings of our notations we could actually reason that this description is more general than that one, and with that reasoning we could organize everything we knew by generality. We could ask of a given description what are the most specific things that are more general than that description - those are the MSSs, the most specific subsumers - and what are the most general things that are more specific than this one - those are the most general subsumees. You could organize your entire collection of concepts with this kind of structure, and nicely, it seems, after many years working on this, you can look things up in that kind of structure with the kind of log n behavior that you get with a binary search. The nice thing about it is this: this is a partial order, a very complicated partial order, so binary search doesn't work on it. But with this particular kind of partial order you can get something like it that has similar computational properties. So, the original goals of KL-ONE were to be this organizing structure for a reasoning system. We wanted to efficiently associate things so that you can do associative retrieval. We wanted to support that kind of subconscious reasoning that people do so well. We wanted to support high-level perception, the kind of thing you do when you realize what's going on in the world and how the pieces relate to each other. 
We wanted to automatically classify the situation we are in, in terms of all of the things we knew, so that the most specific things we know that apply to the current situation get found efficiently, and we wanted to have inheritance of attributes and rules, all that good stuff. The hypothesis that I made, which I think is accurate, is that most of our reasoning consists of what I call recognize and react. We recognize the situation we are in and immediately brought to mind are the things we know about it: what do I need to do about it? Is it good to eat? Is it dangerous? Do I need to run from it? Is there something I should do about it? Deductive reasoning happens much less often. This kind of reasoning happens constantly, and even when we're doing deductive reasoning we're using this recognize and react to decide what should be the next step in our deductive chain. So it's the fundamental inner loop of reasoning and thinking, and so it's important to be able to recognize these known situations quickly and efficiently. It's something that we've evolved to do well. So here are some issues. How does a reasoning system find relevant pieces of information when it has millions of things, how does it acquire and organize all those items, how does it integrate new information into the previous knowledge, and how does it use this knowledge structure to impose structure on the situation it finds itself in and perceives in the world? And, again, the answer is taxonomic subsumption: organize the concepts by generality in the conceptual taxonomy, use this structure to organize all of your rules and all your facts, and use subsumption algorithms to build it, to assimilate new things into it and to find things in it when you are using it to function. 
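As an editorial illustration of the two lookups described in the talk (most specific subsumers and most general subsumees), here is a minimal Python sketch. It is not KL-ONE's algorithm: it assumes a toy model in which a concept is just a set of required conditions and "more general" means "requires a subset of the conditions". The names `TAXONOMY`, `QUERY` and both functions are hypothetical, and the scan is linear rather than the log n style behavior mentioned in the talk.

```python
from typing import Dict, FrozenSet, List

# Assumption: a concept is modelled as the set of conditions it requires.
Concept = FrozenSet[str]


def subsumes(general: Concept, specific: Concept) -> bool:
    """general subsumes specific if specific requires everything general requires."""
    return general <= specific


def most_specific_subsumers(taxonomy: Dict[str, Concept], query: Concept) -> List[str]:
    """Names of concepts above the query that have no more specific concept above it."""
    above = {name: c for name, c in taxonomy.items() if subsumes(c, query)}
    return sorted(
        name for name, c in above.items()
        # c < other means some other subsumer is strictly more specific than c
        if not any(c < other for other in above.values())
    )


def most_general_subsumees(taxonomy: Dict[str, Concept], query: Concept) -> List[str]:
    """Names of concepts below the query that have no more general concept below it."""
    below = {name: c for name, c in taxonomy.items() if subsumes(query, c)}
    return sorted(
        name for name, c in below.items()
        # other < c means some other subsumee is strictly more general than c
        if not any(other < c for other in below.values())
    )


# Hypothetical toy taxonomy, for illustration only.
TAXONOMY: Dict[str, Concept] = {
    "thing": frozenset(),
    "animal": frozenset({"animate"}),
    "bird": frozenset({"animate", "has_wings"}),
    "pet": frozenset({"animate", "owned"}),
    "pet_parrot": frozenset({"animate", "has_wings", "owned", "talks"}),
}

# A new description to classify: an owned, winged, animate thing.
QUERY: Concept = frozenset({"animate", "has_wings", "owned"})

print(most_specific_subsumers(TAXONOMY, QUERY))
print(most_general_subsumees(TAXONOMY, QUERY))
```

In this toy setting the new description would be filed directly below "bird" and "pet" and directly above "pet_parrot", which is the sense in which a classifier decides where a concept belongs instead of being told.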
So for example you would organize all the pattern parts of all your rules into a conceptual taxonomy, classify your current sub-goal into this taxonomy, find the most specific rules that apply to the situation you are in, choose a rule, do what it says, lather, rinse, repeat. So, KL-ONE was unique in the world at its time in having this ability to automatically take a concept and decide where it belongs in the taxonomy and then inherit whatever you would inherit from that point in the taxonomy. In every other semantic network system, frame system, and so forth, you consciously put something at some place in the taxonomy and then it would inherit from where you put it. So, this was a fundamentally new kind of thing, and these two algorithms, the MSS and the MGS, were what it takes to do it. KL-ONE took a tangent at this point. Ron Brachman and Hector Levesque decided they wanted to make it declarative instead of procedural, and so they decided to put it on a foundation of first-order logic and use extensional set inclusion as their model of subsumption, and my take is that thereby they inadvertently signed up to do complete first-order logic reasoning. So the thread that took off at that point and ultimately culminated in Terminological Logics and Description Logics was: if you really constrain the expressive power of your representation, you could get a certain class of inferences that it was complete and sound with. But after people all over the world - logicians love this - proved that this and that sort of expressive power was intractable in this and that way, the upshot was that almost everything of interest was intractable and not really feasible. 
I was off doing start-ups at this point and not paying a whole lot of attention to what they were doing, but when I went back to look and see what had happened, they had thrown away all these intuitions about how the structure is supposed to support the reasoning and how you're supposed to associate one thing to the other, and replaced it with big expansions of complicated logical expressions that themselves were getting big. I had always assumed that the numbers that they were intractable with respect to were the size of the taxonomy, but I found out they were the size of individual concepts. So, I went and thought and rethought the process and wrote a paper in {1991} called "Understanding subsumption and taxonomy: {A framework for progress}" where I revisited the original goals. I wanted an efficient, principled methodology for organizing knowledge and I wanted to handle intensional subsumption, not extensional. That's where they ran into trouble. If you try to guarantee that all the empty sets are equivalent and everything that's remotely similar to that issue, that the Morning Star and Evening Star are the same, then you run into trouble. But if you go for intensional rather than extensional you get to define what the intension is and you can define what the criteria for subsumption are, and I defined fairly straightforward criteria for intensional subsumption. Every element of the parent concept subsumes some element of the child, and the necessary conditions for the child entail some set of sufficient conditions for the parent. So that says if you've got one of the children, you know it is underneath the parent. And it turns out you can get tractable algorithms for this. It's not complete in the first-order logic sense of completeness, but that's not what we are looking for. We are looking for the thing that can very quickly do a classification of a situation you're in into a set of most specific subsuming situations. 
That test doesn't include proving Herbrand's theorem or any of those more complicated subtle things. It's a very important, fundamental inner-loop operation that we have to do fast and that's what the goal is here. Okay so, while doing this I noticed an equivalent to the old "What's in a link" stuff. You see people writing down things like birds have wings, people need vitamins, people live in houses, the person broke a window, but when you look at these links, they have an implicit quantifier in them. Everybody needs every vitamin, but not every bird has every wing, and not every person lives in every house; people typically live in houses, some people don't have houses. "The person broke a window" is a very specific statement; the others were generics. So I started to catalog and label these things that these links could represent and called them quantificational tags. So AE is the classical "for every, there exists". So "birds have wings" is: for every bird there are some wings that they have. "Person needs vitamins" is AA: every person needs every vitamin. "Children like candy" is typically AA but it's not always AA. So, you can, in this fashion, characterize that fact. "John lives in Chicago" is instance-instance, so for the things at the ends of the link in this case there's no real quantification going on. That's the sort of identity case. And I realized you can think of these as relation-forming operators. So it's kind of an annotation on an underlying semantic relationship. So "AE have", as in birds have wings, can be characterized by the lambda expression: a function of X and Y such that for every x in X there is a y in Y such that x has y. So you can give a formal definition for each of these quantificational tags and you can actually use these as annotations on RDF Schema to make manifest the hidden quantificational import that somebody buried in a link name. 
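As an editorial illustration of tags as relation-forming operators, the sketch below writes AE and AA as Python functions that take an underlying relation between individuals and return a relation between classes. This is a minimal sketch under stated assumptions: classes are modelled as finite lists of individuals, the relation is a plain predicate, the "typically" variants are not modelled, and the data (`HAS`, `BIRDS`, `WINGS`) is made up for the example.

```python
from typing import Callable, Iterable

# Assumption: the underlying semantic relation is a predicate over two individuals.
Relation = Callable[[str, str], bool]
ClassRelation = Callable[[Iterable[str], Iterable[str]], bool]


def AE(rel: Relation) -> ClassRelation:
    """AE tag: for every x in X there exists a y in Y such that rel(x, y)."""
    def lifted(xs: Iterable[str], ys: Iterable[str]) -> bool:
        ys = list(ys)  # reused once per x, so materialise it
        return all(any(rel(x, y) for y in ys) for x in xs)
    return lifted


def AA(rel: Relation) -> ClassRelation:
    """AA tag: for every x in X and every y in Y, rel(x, y)."""
    def lifted(xs: Iterable[str], ys: Iterable[str]) -> bool:
        ys = list(ys)
        return all(rel(x, y) for x in xs for y in ys)
    return lifted


# Hypothetical data: each bird has its own two wings.
HAS = {
    ("tweety", "wing1"), ("tweety", "wing2"),
    ("polly", "wing3"), ("polly", "wing4"),
}
BIRDS = ["tweety", "polly"]
WINGS = ["wing1", "wing2", "wing3", "wing4"]


def has(x: str, y: str) -> bool:
    return (x, y) in HAS


birds_have_wings = AE(has)(BIRDS, WINGS)           # every bird has some wing
every_bird_has_every_wing = AA(has)(BIRDS, WINGS)  # no bird has another bird's wing

print(birds_have_wings, every_bird_has_every_wing)
```

The point of the design is that `has` stays a black box: a classifier only needs to see which operator (AE or AA) wraps it.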
And the reason this is important is because if you wanna write a classifier and the classifier is going to go and decide whether this concept is more general than that one, you don't want it to have to unpack the link and understand its semantics and go figure out if there is a quantifier down in there in order to use it. Okay, so the quantificational tags give you a nice clean separation between the stuff that the classifier needs to know, the basic logical quantificational structure, and the domain-specific relationships, which it can still treat as a black box. And it also forms a contract between the person who is entering the data and the way the engine is gonna treat the data later. If the person just creates a link called "has children" and the classifier is gonna have to know what that link means, then the person who's writing the data doesn't necessarily think about what the classifier is doing as he is writing it. The person who is writing the classifier doesn't necessarily know that the person is gonna consistently use data in this link in this particular way, and you can get inconsistencies and bad stuff. If the quantificational tag is out there as an explicit label then both of them know what they are doing. Okay so, there's a bunch of these. You can distinguish between kinds and instances and they can answer the "What's in a Link" question. So if I have n12368 and it says it's a modifier instance kind of telephone is modifier color black, that's a description of a black telephone. If it says it's an II name telephone and AI color black, that's an assertion that telephones are black. And these tags tell you which of these links are making assertions and which of these links are putting together pieces of a larger structure. Everything that you are using these links for is laid out now in this tag, so I commend that insight to you when you are thinking about RDF and what you do with it. I won't go into much detail. 
This particular example is subtle, so it may throw you a bit, but the MR relationship, the value restriction, says "a person whose sons are professionals" - that's a restriction relationship, all sons are professionals. You have to know it's got that MR relationship in order to figure out that "person whose sons are professionals" subsumes "woman whose children are doctors", even though sons are a kind of children, going what's intuitively the wrong way. In every other case of subsumption the links all go the same way, except this one, or maybe a few others like it, and you have to know that's what the quantifier is doing in order for the classifier to get this right. Okay so, the conjecture is that this kind of subsumption technology can provide efficient knowledge management for large-scale automated reasoning that would be fluent at different levels of generality, which is something that first-order logic is not, and consequently all things like RDF and Semantic Web notations aren't either. It's scalable to large knowledge bases, it can automatically assimilate new information, it can integrate different reasoning paradigms - plausibilistic, probabilistic, deductive, abductive - and it can support this kind of rapid reactive reasoning. My fundamental observation is that the range of open problems is not something that we're gonna solve with just the careful use of a few already well understood things like first-order logic and probabilistic reasoning. We are going to have to forge some new tools to do it. I have a paper on my website called "Don't blame the tool" that goes into a great deal of detail on why First-Order Logic isn't enough and isn't what you want, and why you need to create some more stuff. And finally there's a really interesting, powerful, practical application of this in a search technology that I developed at Sun. When you combine it with a specific passage retrieval algorithm, it can dramatically improve people's access to online information, but that's another talk. 
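As an editorial illustration of the value-restriction example from the talk ("person whose sons are professionals" subsumes "woman whose children are doctors"), here is a minimal Python sketch. It is an assumption-laden toy, not the classifier described in the talk: a description is reduced to a base concept plus one "all fillers of this role are of this kind" restriction, and the hierarchies `CONCEPT_PARENT` and `ROLE_PARENT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical toy hierarchies, written as child -> parent.
CONCEPT_PARENT: Dict[str, str] = {"woman": "person", "doctor": "professional"}
ROLE_PARENT: Dict[str, str] = {"son": "child"}


def is_a(name: str, ancestor: str, parents: Dict[str, str]) -> bool:
    """True if name equals ancestor or reaches it by following parent links."""
    current: Optional[str] = name
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


@dataclass(frozen=True)
class Restricted:
    """A base concept with one value restriction: all <role> fillers are <filler>."""
    base: str
    role: str
    filler: str


def subsumes(general: Restricted, specific: Restricted) -> bool:
    return (
        # the base concept and the filler go the usual way: specific is a kind of general
        is_a(specific.base, general.base, CONCEPT_PARENT)
        and is_a(specific.filler, general.filler, CONCEPT_PARENT)
        # the role goes the other way: the general description may only
        # restrict a role that is a kind of the role the specific one restricts
        and is_a(general.role, specific.role, ROLE_PARENT)
    )


sons_professionals = Restricted("person", "son", "professional")
children_doctors = Restricted("woman", "child", "doctor")

print(subsumes(sons_professionals, children_doctors))
print(subsumes(children_doctors, sons_professionals))
```

The reversed role test is the "intuitively wrong way" direction: if all of a woman's children are doctors, then in particular all of her sons are, and doctors are professionals.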
Q&A part (to be completed)

Sandro Hawke: So I have a question. You attack First-order logic several times here and I don't understand, because obviously OWL profiles specify certain extremely tractable subsets of first-order logic. So all these performance issues you mention actually do not apply to the Semantic Web.

William Woods: Tractable subsets that are efficient don't have to rely on first-order logic for the fact that they do what they do.

Sandro Hawke: They don't have to but they happen to be.

William Woods: So I am not, let me qualify, people have interpreted my "What's in a Link" paper as attacking Semantic Networks, too. I am not attacking either of them. My entire research career has been devoted to understanding exactly what the strengths and weaknesses of different tools are. What are they good for, what can they not do and what are the limitations. So, I am simply pointing out some of the limitations that you have to keep in mind and some of the alternatives that you can deal with. So if you write a reasoning system that is efficient at what you need it to do, the fact that you can embed it in First-order logic is kind of irrelevant. But when you look at what we really want to do, First-order logic isn't enough; we need Higher-order logics of various kinds. We need intensional reasoning operations, which break down if you rely too much on the law of excluded middle; you can get into all kinds of problems. So when Drew McDermott wrote his famous paper "..." I wrote the "Don't Blame the Tool" paper. All the arguments for First-Order logic I hear are usually what I call lamppost arguments. I use the analogy of the Sufi tale about the woman who is searching for her coin out on the street in front of her house, and the neighbors were helping, and finally they were asking: where did you lose it? Oh, back in the kitchen, she said. But then why are you searching here out on the street? She replied: because it's dark back in the kitchen. 
So when you say First-order logic is great because we understand it, it's sound, it's complete, it's got all these good properties, fine, but if the problems you are trying to solve aren't there but somewhere else, it's not the tool you want. I am not saying it's not one of the tools you need in your toolbox, but you need some new ones. And if it hasn't been sufficient to do Strong AI it's not the tool's fault. You are using the wrong tool for the wrong thing or it's not the only tool to do the job.

Sandro Hawke: I think here is the key thing, nobody I know of is trying to do Strong AI. So OWL is perfectly fine.

William Woods: Yes, well. That's why I wrote the "What's in a Link" paper ... but as a representation, I mean, if you use quantificational tags and graphs you've got a representationally complete space. What you do to reason on it is completely different and there are all kinds of reasoning operations. You don't reason with a First-order logic, you reason with a whole suite of tools. Some of which are like the Aristotelian syllogisms. I mean he had some of that right. We do a whole lot, most of our reasoning is unsound: "This side of the barn is red". Nobody is that accurate or that careful or meticulous except in science-fiction articles.

Foundations for Semantic Networks

William A. Woods

Copyright © 1975 by Academic Press, Inc.
I. Introduction
II. What is Semantics?
III. Semantics and Semantic Networks
IV. Problems in Knowledge Representation
V. Conclusion
|
Session Classification
Level: Intermediate-Advanced
Session Type: Research-Technology