>> We will proceed to part 4, which is the representation of higher levels of knowledge. We've been discussing how simple data -- the patient's diseases, the patient's medicines, the patient's lab values -- are represented. What I'd like to do is spend a few minutes on more complex kinds of information, and on some of the strategies that are being designed to represent that information. Before doing so, let's briefly look at an overview of what I would call, for want of a better term, levels of understanding. At the simplest level, there's data. It could be the data 1.6; we don't know what that means. 1.6 could be a lab test, it could be the length of time the patient was in clinic, and there's virtually no utility to that piece of data. So additional information needs to be connected to that data. We could say, for example, that the data 1.6 is the creatinine level, creatinine being a test of kidney function. And there is utility to "creatinine is 1.6", or to a creatinine value of 1.6 in a table. That gives us some indication of kidney function. At a higher level, beyond information, which perhaps we could call knowledge, we could say that if the creatinine is greater than a particular value, then the patient has kidney failure. Here the utility is assessing the function, and we could say that the machine has knowledge, certainly at a level of complexity above the informational level that creatinine equals 1.6. This is obviously expressed as a rule. And in fact, if we combine the information that the creatinine is 1.6 with the knowledge that if creatinine is greater than 1.3 then kidney failure, we actually know a great deal about the status of the patient, about their medical information. And lastly, again for want of a better term, we could say that the machine would have wisdom if there were a rule saying: if kidney failure, then give a particular medicine.
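As a rough illustration of these levels, here is a minimal Python sketch. It is not taken from the lecture slides: the names `Observation`, `has_kidney_failure`, and `recommend` are hypothetical, the 1.3 threshold is simply the value used in the spoken example, and the recommended action is a placeholder because no specific medicine is named.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the lecture's example rule (creatinine greater than 1.3).
# Assumption: a real system would take this from a guideline, with units.
CREATININE_THRESHOLD = 1.3


@dataclass
class Observation:
    """Information: a bare value plus the context that gives it meaning."""
    test_name: str
    value: float


def has_kidney_failure(obs: Observation) -> bool:
    """Knowledge: a rule that interprets the information."""
    return obs.test_name == "creatinine" and obs.value > CREATININE_THRESHOLD


def recommend(obs: Observation) -> Optional[str]:
    """'Wisdom': a rule that turns the interpretation into an action."""
    if has_kidney_failure(obs):
        # Placeholder text; the lecture does not name a medicine.
        return "give a particular medicine"
    return None


data = 1.6                               # data: a number with no meaning yet
info = Observation("creatinine", data)   # information: the number in context

print(has_kidney_failure(info))  # True, since 1.6 > 1.3
print(recommend(info))           # give a particular medicine
```

Each level only adds something on top of the one below it: the number alone, the number with its label, a rule about the labeled number, and a rule about what to do with the conclusion.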
The utility, of course, is that this would be a decision support tool, and we could use this tool to make sure that patients are treated in a manner such that they achieve the recommended guidelines. So we have different levels of knowledge, and the most complex, at least to date, I think, are the ones that include guidelines. There the rules are far more complicated, because everything underneath has to be defined too: if kidney failure, then medicine. What's the definition of kidney failure? Creatinine greater than 1.3. And what is the creatinine? It is a measure of kidney function. So with this in mind, let's look a little bit at guidelines and ask how they are represented. Clinical practice guidelines, as you probably already know, are systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances. This is the definition of the Institute of Medicine. Generally, guidelines are developed by associations that represent the cutting edge of thought in a particular discipline. There's the National Kidney Foundation, there's the American Society of [inaudible], the kidney bodies, and they will come up with guidelines. Now the trick is encoding the guideline so that it can be used by a computer to help the physician determine whether or not the patient has reached that guideline. Remember our diabetes decision support tool. There was a guideline in there: if the hemoglobin A1C was above a particular value, then the conclusion by the computer was that the patient had not achieved the guideline. How are these guidelines represented? They can be complex. We have a little general understanding of it from our prior examples, and one such way of representing a guideline is the formal Arden Syntax, which is a language for writing and sharing task-specific health knowledge. There are different modules.
There might be a module for abnormal kidney function, there might be a module for adequate diabetes care. And the module contains very specific elements. It's got to define the context: what's gonna evoke the response. In the cases that we discussed, for the diabetes decision support it was the hemoglobin A1C level being above a certain value, and for kidney function it was the creatinine being above a certain value. There needs to be some logic, some general rules. We've seen those already. We need a recommendation for appropriate action. And we need to know where this data comes from. Where are the entities mapped to in the clinical repository? Is it a table of the patient's drugs? So here's a typical example of a guideline. You can see the title at the top, second line: title, alert on low hematocrit. That's the blood count, and the guideline would be to maintain the patient's blood count above a certain level. The Arden Syntax is a formal way of writing the rules so everybody can understand them clearly. The purpose is actually included: [inaudible] provider of new or worsening anemia. And we see the knowledge that is necessary. It's gonna be data-driven, it's gonna look at a piece of data. And the data that we're gonna be looking at is the blood count, not surprisingly, since this is an alert on a low blood count. We're gonna need to know the current hematocrit, and we're gonna need to know the previous hematocrit. And what is gonna prompt the routine is the actual measurement of the blood count. Somebody has a blood measurement, and it's gonna prompt the set of rules to be invoked. And the logic is simple.
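To make the elements of such a module concrete, here is a minimal Python sketch. It is not the formal syntax itself, only an analogy under stated assumptions: the `Module` class, the `run` function, the event name, and the repository locations in `data_sources` are all hypothetical. The thresholds (a fall of 5 from the previous value, or a value below 30) and the alert message are the ones this lecture gives for the low-hematocrit example, and reading "less than the previous value by 5" as "a fall of 5 or more" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Readings = Dict[str, Optional[float]]


def low_hematocrit_logic(readings: Readings) -> bool:
    """Logic: true if the hematocrit is low or has fallen sharply."""
    current = readings.get("hematocrit")
    previous = readings.get("previous_hematocrit")
    if current is None:
        return False
    if current < 30:  # low irrespective of the previous value
        return True
    # Assumption: "less than the previous value by 5" means a fall of >= 5.
    return previous is not None and previous - current >= 5


@dataclass(frozen=True)
class Module:
    title: str
    purpose: str
    evoke_on: str                     # context: the event that triggers the module
    data_sources: Dict[str, str]      # where each entity lives in the repository
    logic: Callable[[Readings], bool]
    action: str                       # message written when the logic is true


LOW_HEMATOCRIT = Module(
    title="Alert on low hematocrit",
    purpose="Flag new or worsening anemia",
    evoke_on="hematocrit_measured",   # hypothetical event name
    data_sources={                    # hypothetical repository locations
        "hematocrit": "lab_results: latest hematocrit",
        "previous_hematocrit": "lab_results: prior hematocrit",
    },
    logic=low_hematocrit_logic,
    action="The patient's hematocrit is low or falling rapidly.",
)


def run(module: Module, event: str, readings: Readings) -> Optional[str]:
    """Return the module's message if the event evokes it and the logic is true."""
    if event != module.evoke_on:
        return None
    return module.action if module.logic(readings) else None


print(run(LOW_HEMATOCRIT, "hematocrit_measured",
          {"hematocrit": 36.0, "previous_hematocrit": 42.0}))
```

Keeping the trigger, the data mapping, the logic, and the action as separate fields is what lets another institution reuse the rule and change only the data mapping to point at its own repository.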
If the hematocrit is less than the previous value by 5, or less than 30 irrespective of the previous value, then conclude true, which means that the patient's anemia has worsened. And there could be an action, in this case to write that the patient's hematocrit is low or falling rapidly. So it's a very simple set of rules, and the point of this is that there's a standard way of writing these rules. I was a little bit sloppy in the rules that I wrote in my CPOE, or rather my decision support tool for diabetes. But this is a more formalized way, and you can see that institutions could easily adopt this set of rules and program the logic of their analysis into their computer system to make sure that the same format is applied to all patients. I'd like to end by saying there are challenges. There are things that we can't represent right now. We can't represent graphics. It's just the way you can't ask Google to find every picture where a person has a small nose. The picture's not represented that way. There's no machine that can go in and look at the pixels; it doesn't know how to look at the nose. It could look at the captions, where the caption might say the patient has a small nose. And so to that extent, for our current analysis, storage, and retrieval of important information in graphics, we look at the report. The ECG tracing could have a report that says the patient has a long QT syndrome. That is of course not the same as the machine actually observing a long QT syndrome in the graphics. So there's a lot of work to be done in graphics, and right now the best that we can do in graphics is to represent what is being said in the reports. There are also sophisticated tests where we don't have a number.
We have text that describes the outcome. For example, someone could have an exercise stress test, and we could extract terms, again like the graphics example above: developed chest pain after 3 minutes. But that's the best that we can do. And again, there are many sophisticated tests where there's no way to represent the outcome except by looking at the report. Time stamps are often another hurdle. The time stamps are often accurate for labs, but for conditions and notes they are historically inaccurate. The record might say the past medical history of diabetes dates from last week instead of 25 years ago, and time obviously is gonna be something that we'll want to know about. We'd like to know if the patient has had diabetes for 27 years or for 2 weeks. So these are just some of the challenges that we face.
>> So we're finished with our talk on knowledge representation. We've covered several areas. We've discussed representation of the medical data: labs and diseases. We've discussed terminologies. Some are extremely narrow in focus, like the drug terminologies; some are much broader and have a polyhierarchy [phonetic], like the [inaudible]. We talked a little bit about how we can represent unstructured data. The doctor's notes are free text, natural language. We can make sense out of it, but we need natural language processing. And lastly, we discussed representation of higher levels of knowledge, such as guidelines, which are important in achieving the best outcomes for the patient, but need to be computable and need to be consistent. Our learning objectives today, hopefully, you will have achieved. You'll be able to define knowledge representation. You'll be able to describe the different data of the patient record that need to be represented.
You should be able to describe how information is stored and used by computer programs to benefit healthcare. I think you have a couple of examples of that now. You can define terminology and provide examples of the most commonly used ones, talk a little bit about the extra benefit of UMLS [phonetic] and [inaudible], and hopefully discuss the role of NLP in extracting important information. I'd like to thank many people, but these colleagues here were instrumental in sharing their slides with me. Many of the slides you saw today came from their slide decks, and I'm greatly appreciative of that. Thank you very much, and if there are any questions, please don't hesitate to email me at Herbert.chase@dbmi.columbia.edu [phonetic]. Thank you.