>> [Inaudible] for diabetes mellitus, and the terminology, ICD-9, the International Classification of Diseases, version 9. We also saw the Medical [inaudible] Dictionary, which stores the drug codes. But let's dig a little deeper into terminologies. First, why do we care about terminology? As mentioned, if a medical concept can be given a term, coded with a specific number assigned to it, that allows a lot of important actions to be carried out by computers. For example, we can reuse data: we can see how many patients have diabetes mellitus, or how many have a hemoglobin A1C above or below a certain level. We can also transfer information. If we send the record of a patient here to Mount Sinai Hospital, the [inaudible] code is included, and Mount Sinai Hospital knows exactly what is wrong with that patient, whereas without the coding, it would be impossible.
There are some general classes of terms, and we've discussed them already. There are diagnoses, such as diabetes mellitus and kidney disease. There are diagnostic procedures, such as chest x-rays and electrocardiograms. There are lab tests: we have seen the terms and codes for hemoglobin A1C, and obviously there are codes for all the other lab tests, such as creatinine, which we've seen before. And we've talked a bit about medications. Those are clearly important terms that should be included in some terminology that we can refer to.
To come back to our definitions: a term is a word or phrase that is associated with a specific concept or meaning. As we discussed in the prior section, a term cannot have two meanings, and two terms cannot mean the same thing. It's very important that in the best terminologies, the controlled terminologies, a term represents one specific concept or meaning. A terminology is a set of these terms, and, again, it's intended to convey information unambiguously.
A vocabulary is a set of terms with definitions, and we'll see that UMLS and SNOMED have definitions assigned to their terms, which is very helpful for a variety of actions. And lastly, a controlled vocabulary, just to reiterate, is one in which all terms have unambiguous, non-redundant definitions. Unfortunately, not all the terminologies that we use are unambiguous, but we'll discuss that in a few minutes. The terminologies that we'll briefly discuss are ICD-9, the International Classification of Diseases, of which there are several versions, and CPT and NDC. These are all [inaudible] abbreviations that have no meaning to you yet, but we're certainly going to drill into them. I've already mentioned SNOMED and UMLS, and we'll talk about those in a second.
Let's talk, again, about ICD-9. We discussed that it's a strict hierarchy, and a code determines its place in the hierarchy. So 240.4 is clearly a child [inaudible] 240, and as we saw in the prior example, there's diabetes, and then a child of diabetes might be diabetes with kidney complications. There's also a fascinating and annoying classification called NEC, not elsewhere classified. This is the situation where there's no specific code to represent the condition in the diagnostic statement: the note is specific, but the coding system is not specific enough. I'm going to show you a great example of that in a second. Then there's the opposite case: what if the coder is reading through the chart and can't figure out whether it's diabetes with or without a complication, or some other disease where there's just not enough information to assign a more specific code? That is referred to as NOS, not otherwise specified. I recognize that this may seem a little obscure at this point, but just hang in there. You will see these NECs and NOSs in the ICD-9 coding hierarchy. I think a good example of not elsewhere classified is the hepatitis story, and I'd like to think about this in terms of the [inaudible] discovery.
Initially there were hepatitis A and B, and each had an ICD-9 code, and there were patients with hepatitis that was neither A nor B. We literally called them non-A, non-B hepatitis. The ICD-9 code for non-A, non-B was listed as NEC, not elsewhere classified. So there was a code for A, there was a code for B, and if it wasn't A or B, the patient was assigned the code for not elsewhere classified. I have a table following this in a second that will make this a lot easier to understand. Non-A, non-B was eventually identified as hepatitis C, so now there were hepatitis A, hepatitis B, and hepatitis C. They all had codes, but there were still patients with hepatitis that was not A, B, or C, and, once again, they were coded as not elsewhere classified. We can see this in the development of the hepatitis ICD-9 codes in these two tables. On the left we see the era before hepatitis C was identified. Hepatitis A had a code, 70.1. Hepatitis B had a code, 70.3. And non-A, non-B was the catch-all, 70.5, hepatitis not elsewhere classified. Then the following year, hepatitis C was identified and became a full-fledged, code-carrying member of the ICD-9 hierarchy. Now hepatitis C was coded as 70.4, but there were still people with hepatitis that was non-A, non-B, non-C, and so they were given, again, the NEC categorization. My only point here is that these terminologies are dynamic, not static. The ICD terminologies are constantly growing, and that's exemplified in this next slide, where you can see the history of ICD. Starting with version 1 in 1900, there were 179 terms in the hierarchy. Now there are over 10,000. There are new discoveries all the time, and the terminologies must take this into account, so they are constantly being updated.
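The two ICD-9 ideas above, that a code's position in the hierarchy can be read from the code itself and that NEC acts as a catch-all that shrinks as new codes are added, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, the "drop the last digit" rule is a simplification of how ICD-9 codes nest, and the hepatitis codes are the ones quoted in the lecture, not looked up in an official code table.

```python
from typing import Dict, List, Optional


def parent_code(code: str) -> Optional[str]:
    """Return the parent of an ICD-9-style code, or None for a top-level category.

    Simplifying assumption: the parent is found by dropping the last digit
    after the decimal point (240.4 -> 240).
    """
    if "." not in code:
        return None
    category, detail = code.split(".", 1)
    if len(detail) <= 1:
        return category
    return f"{category}.{detail[:-1]}"


def ancestors(code: str) -> List[str]:
    """Walk up the strict hierarchy from a code to its top-level category."""
    chain = []
    parent = parent_code(code)
    while parent is not None:
        chain.append(parent)
        parent = parent_code(parent)
    return chain


# Hepatitis codes as quoted in the lecture, before and after hepatitis C
# received its own code. NEC is the catch-all in both eras.
HEPATITIS_BEFORE: Dict[str, str] = {"A": "70.1", "B": "70.3"}
HEPATITIS_AFTER: Dict[str, str] = {"A": "70.1", "B": "70.3", "C": "70.4"}
NEC_CODE = "70.5"


def assign_hepatitis_code(virus_type: str, table: Dict[str, str]) -> str:
    """Use the specific code if the table has one, otherwise fall back to NEC."""
    return table.get(virus_type, NEC_CODE)


if __name__ == "__main__":
    print(parent_code("240.4"))
    print(assign_hepatitis_code("C", HEPATITIS_BEFORE))
    print(assign_hepatitis_code("C", HEPATITIS_AFTER))
```

With the earlier table a hepatitis C patient falls into the NEC code; once the table is updated the same patient gets a specific code, which is the sense in which the terminology is dynamic.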
Let's briefly talk about some of the other terminologies that you're going to see in your work with the electronic record. There's CPT, Current Procedural Terminology. It was developed and is owned by the AMA, and it costs money. It's required for procedural reimbursement, so if you do a procedure, remove something, take a picture of something, then in order to bill for it you need to assign a CPT code. It's also used in evaluation and management coding. If you say, "I saw the patient, and I did this level of work, and I did this level of management," there are codes for that. We're not going to delve into that, but I just wanted you to be aware of it. Here are some CPT examples. You can see there's a code for unilateral mammography. Not surprisingly, there's a separate code for bilateral mammography, which has to be billed at a different level. There's also screening mammography, bilateral, two-view film study of each breast. So they're very specific. Digitization of the film is a different code again. This is the terminology that is routinely used for procedural billing.
Then there are diagnosis-related groups, DRGs as we refer to them. They're used to determine how much Medicare pays the hospital. The idea is that patients with similar histories, maybe they have pneumonia and need antibiotics and x-rays, are clinically similar and are expected to use the same level of hospital resources. Again, this was developed for reimbursement for specific kinds of clinical conditions. You can imagine that if you took a bunch of patients and looked at their ICD-9 codes and their CPT codes, that is, what their diseases were and what their procedures and treatments were, they would divide up into quite predictable groups. There are 500 groups, and the patients in each group are expected to use hospital resources in a similar manner. So, some examples of DRGs, for example for pneumonia.
Seventy-five is a respiratory disease with a major chest operating procedure, but no major complication or [inaudible]. Seventy-six is the same, but with minor complication and [inaudible] morbidity. And seventy-seven is similar, with other respiratory system operating procedures. You can see that the level of care, and presumably the resources utilized, is different for each of these groups, and that was the intention of the DRGs.
Let's go to another terminology, this one for drugs: the National Drug Codes. For this example we chose the statins, which are used to lower lipids, and you can see that the codes go to an extreme level of granularity. [Inaudible] calcium has a different code than [inaudible]. [Inaudible] calcium in 20 mg tablets has a different code from the 40 mg tablets or the 10 mg tablets. This is not unexpected, of course, but it is key. A pharmacy would need to access or implement these codes, and a decision support tool might want to look at them too, since it would want to be able to distinguish between the 10 and the 20 mg tablets.
There are potential drug ontologies that would be useful. These are not available, but I thought you might be interested in hearing about two different approaches, because drugs can be classified in one of two different ways. Go back to the ontology that we discussed with the lab tests, and consider an ontology, again, that shows relationships where a drug is a member of a class. Go to the very bottom, the children on the left side. Captopril is an angiotensin-converting enzyme inhibitor, which is in that box, and that is an anti-hypertensive. So we can look at drugs by asking, what are all the drugs that are used for blood pressure control, the anti-hypertensives? The other one there is the beta blocker. [Inaudible] We could also look at how drugs are classified chemically or physiologically. There are adrenergic blockers. There are two kinds of adrenergic blockers.
There's the alpha blocker and the beta blocker, and the beta blockers we see here are the same drugs that were anti-hypertensives, now classified as adrenergic blockers. I think this gives you some idea of the complexity of hierarchies. On one hand, these are ontologies, and it's pretty straightforward that a drug is a member of a class, which is a member of a broader class. That's pretty strict. But you see here that there are two completely different parents. One is the indication classification. The other is the drug class.
There are also terminologies for lab tests: LOINC, Logical Observation Identifiers Names and Codes. That's a mouthful, and it helps in exchanging results. Again, if you're at Mount Sinai, and we're transferring data to you about a blood test our patient had, you'd want the information communicated as unambiguously as possible. The structure of a LOINC entry is pretty specific. The coding information includes the component, that is, what is being measured, for example urea. It includes the property of what's being measured: is it length or mass or volume or units? What is the time interval? What's the system: was the sample blood or urine? It gives the scale: is it quantitative, like 4.2, or qualitative, like 3 plus, or narrative? And lastly, the method: what procedure was used to make the measurement? So here's a typical LOINC communication. In the top portion, glucose was measured 3 hours after 100 grams of glucose were given to the patient. This is a classic glucose tolerance test. Scroll down a little bit to the coagulation thrombin induced time. This is a clotting test. The next one is creatine kinase, to rule out a heart attack. These are all codes, you see, before and after letters. They are there to communicate exactly what this specific lab test was, and you see there's a lot more information here than just the code for the actual test.
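The six parts of a LOINC name listed above (component, property, time interval, system, scale, method) map naturally onto a small record type. The following Python sketch is only meant to show why naming all six parts makes two sites agree on which test was done. The class name, field names, and the plain-English field values are assumptions made for this illustration; they are not LOINC's official abbreviations or codes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LabTestName:
    """One lab test described by the six axes named in the lecture."""
    component: str          # what is being measured
    measured_property: str  # e.g. mass, volume, length
    timing: str             # time interval of the observation
    system: str             # specimen, e.g. blood or urine
    scale: str              # quantitative, qualitative, or narrative
    method: str = ""        # how the measurement was made, if it matters

    def fully_specified_name(self) -> str:
        """Join the six parts into one colon-separated string."""
        return ":".join([
            self.component, self.measured_property, self.timing,
            self.system, self.scale, self.method,
        ])


# Illustrative values for the glucose tolerance test described above.
sent_by_us = LabTestName(
    component="glucose, 3 hours after 100 g glucose",
    measured_property="mass concentration",
    timing="point in time",
    system="serum or plasma",
    scale="quantitative",
)

# The receiving hospital builds the same description independently.
received_at_sinai = replace(sent_by_us)

# Changing a single axis (here the specimen) describes a different test.
urine_version = replace(sent_by_us, system="urine")

if __name__ == "__main__":
    print(sent_by_us == received_at_sinai)
    print(sent_by_us == urine_version)
    print(sent_by_us.fully_specified_name())
```

The design point is that equality is defined over all six parts at once, so "glucose" alone is never enough to say two results are comparable.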
There are nursing terminologies, and there are several of them, with different areas of focus. For example, the [inaudible] system has coding information for diagnoses and judgments. There are ones for interventions, outcomes, family functioning, and goals: "patient mood will stabilize," for example. That's quite interesting. And, again, this reflects the fact that terminologies serve different purposes.
I'd like to spend two minutes talking about MeSH. That's another terminology, Medical Subject Headings, M-e-S-H. It's a way to list, in hierarchical fashion, all known medical conditions. For example, let's take the simple concept of pneumonia. There is a MeSH heading, D011014, called pneumonia, and this is used by PubMed to help users find articles in the medical literature. On the left side we see three different articles: an article on pneumonia, an article on diagnosis of serious lung infection, and an article on influenza, which, as we know, causes pneumonia. Now, what's interesting is that only one of those three articles actually has the word pneumonia in it. But a human will assign the code 11014, pneumonia, to all three articles, because the human knows that the influenza discussion and the diagnosis of serious lung infection both relate to pneumonia. As a result of this code, a PubMed search for pneumonia will find all three articles, even though two of them did not mention pneumonia. So that's just another terminology, and a very powerful one. It drills down as well; it gets more granular. Pneumonia has children such as bacterial pneumonia, 18410, and bacterial pneumonia has its own children, such as pneumococcal pneumonia and Legionnaires' disease. They don't always make sense, because [inaudible] pneumonia is not actually bacterial pneumonia, nor is mycoplasma, strictly speaking, but it's close enough. These terms are extremely useful in categorizing articles in MEDLINE.
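The difference between searching for a word and searching for an assigned heading, as in the three-article example above, can be sketched as follows. This is a toy illustration, not how PubMed is implemented: the article titles are invented, the heading IDs are the ones quoted in the lecture (with the "D" prefix assumed for the second), and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Heading hierarchy: pneumonia has bacterial pneumonia as a child.
MESH_CHILDREN: Dict[str, List[str]] = {"D011014": ["D018410"]}

# Invented articles; the "mesh" set holds headings assigned by a human indexer.
ARTICLES = [
    {"title": "Pneumonia in adults", "mesh": {"D011014"}},
    {"title": "Diagnosis of serious lung infection", "mesh": {"D011014"}},
    {"title": "Complications of seasonal influenza", "mesh": {"D011014"}},
    {"title": "Treating bacterial infection of the lung", "mesh": {"D018410"}},
]


def text_search(word: str) -> List[str]:
    """Find articles whose title literally contains the word."""
    return [a["title"] for a in ARTICLES if word.lower() in a["title"].lower()]


def descendants(heading: str) -> Set[str]:
    """Collect every heading below the given one in the hierarchy."""
    found: Set[str] = set()
    stack = [heading]
    while stack:
        for child in MESH_CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def heading_search(heading: str) -> List[str]:
    """Find articles indexed with the heading or any of its descendants."""
    wanted = {heading} | descendants(heading)
    return [a["title"] for a in ARTICLES if a["mesh"] & wanted]


if __name__ == "__main__":
    print(text_search("pneumonia"))
    print(heading_search("D011014"))
```

The word search returns only the article that happens to use the word, while the heading search also returns the articles a human indexed under pneumonia and, because the hierarchy is followed downward, the one indexed under the more granular child heading.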
Now there are two terminologies that are a little sloppier, in that they're poly-hierarchies and certainly not ontologies. We're going to discuss that in just a second, but they're what I would refer to as terminology plus. You get a little more than just the words. You might get a definition, and that's helpful. Remember the MeSH terms that I just discussed with the PubMed search: there's no definition. It doesn't say what pneumonia is. So it would be nice to have a terminology with definitions. It would also be very helpful to know what category the term belongs to, what we call its semantic type. Is it a disease or a syndrome, or is it a sign or symptom? Those are very different things. Or the semantic type could be a drug. It could be a procedure. So these terminologies can have semantic types, which, again, are assigned by humans, just to make that clear, and they offer the user much more than just the name of the term. There are also, and I'll show an example in a second, relationships between terms, and a terminology such as SNOMED or UMLS can provide some information about those relationships. For example, there could be a relationship between a drug and a disease: a drug treats a disease. Actually, now that we're thinking of it, there can be another relationship, which is that a disease is caused by a drug. There are certainly circumstances where a disease is actually caused by a drug. So let's talk a little bit about SNOMED and the UMLS. SNOMED stands for Systematized Nomenclature of Medicine. The version that is primarily used is SNOMED CT, Clinical Terms, and we'll discuss that.
It's a systematically organized, computer-processable collection of medical terms, and it covers all the things you'd think were important: diseases and findings, procedures, microorganisms such as bacteria and viruses, pharmaceuticals, and so on. As I've alluded to, and actually discussed a couple of times, we have these terminologies so we can index, store, retrieve, and aggregate clinical data across specialties and sites of care. By using these terminologies, we'll be able to communicate better with each other, and we can use them to organize the medical record. [Background noise] That's really the whole point of this discussion today. If you have a browser open, you might just go search SNOMED. It's very powerful and it's actually quite a lot of fun. But the first thing to note about SNOMED is that it's not a strict hierarchy. A strict hierarchy would be something like this, with no other arrows: cholera and meningitis are infectious diseases, and tuberculosis is a lung disease. Well, we know that's not the whole truth. We know that tuberculosis is both an infectious disease and a lung disease, and we see here a characteristic of SNOMED: it is not an ontology. It is a poly-hierarchy, which has great value, because we could ask the computer, "What is tuberculosis?" and it would point to a lung disease and an infectious disease. That would be very helpful. Or we could ask, "What are three common infectious diseases?" and the answer would be cholera, meningitis, and tuberculosis. Or, in another query, we could ask, "What are some lung diseases?" and tuberculosis would show up. The poly-hierarchy has great utility. And SNOMED is interesting in that, just like the drug ontology I mentioned before, where a drug could be categorized either by its indication or by its class, many of the conditions in SNOMED can be placed in different hierarchies.
For example, diabetes mellitus could be categorized as a disorder. Of course, it is a disorder, but it could also be categorized as a clinical finding, if you look at the two arrows there. We see the same concept mapped to two different hierarchies, clinical finding and disorder. Now I'm using the SNOMED website here. We can look at the different hierarchies and see what diabetes points up to. First let's look at the hierarchy for disorder, and we see here, quite interestingly, that again it's a poly-hierarchy. Diabetes mellitus is a disorder of glucose metabolism. Yes. And it is a disorder of the endocrine pancreas. That is also true. The benefit is that one could say to the computer, "Find all the patients with disorders of glucose metabolism," and it would find diabetes. Then, in a completely separate query, unrelated to glucose metabolism, I could say, "I'm taking out the pancreas; what disorders might result?" Diabetes mellitus would also be found in that search. There's another way to organize the hierarchy, in addition to "diabetes mellitus is a disorder of glucose metabolism." It could be "has finding site": where is the site that this disease comes from? Diabetes clearly has its finding site in the pancreas. So, again, with the example of taking out the pancreas and asking what diseases I might encounter as a result, we find diabetes mellitus. So SNOMED is a poly-hierarchy, and it is really useful because of the ability to formulate queries that answer different questions and to find the parents and children of a particular condition in different hierarchies, depending on what the query is. Now let's go on to the UMLS, with just a little history here.
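As a short aside before the UMLS history, the SNOMED queries just described (one concept with several parents, plus a separate "has finding site" relationship) can be sketched with two small dictionaries. This is a toy model using only the concepts named in the lecture; the dictionary names and function names are assumptions for illustration, and real SNOMED CT content is identified by numeric concept IDs rather than by these strings.

```python
from typing import Dict, List

# "is a" links. A concept may list more than one parent: that is the poly-hierarchy.
IS_A: Dict[str, List[str]] = {
    "cholera": ["infectious disease"],
    "meningitis": ["infectious disease"],
    "tuberculosis": ["infectious disease", "lung disease"],
    "diabetes mellitus": [
        "disorder of glucose metabolism",
        "disorder of endocrine pancreas",
    ],
}

# A second kind of relationship, independent of the "is a" links.
FINDING_SITE: Dict[str, str] = {"diabetes mellitus": "pancreas"}


def parents(concept: str) -> List[str]:
    """Answer 'what is X?' by listing every parent of the concept."""
    return IS_A.get(concept, [])


def children(parent: str) -> List[str]:
    """Answer 'what are some Ys?' by listing every concept with that parent."""
    return sorted(c for c, ps in IS_A.items() if parent in ps)


def with_finding_site(site: str) -> List[str]:
    """Answer 'what can go wrong at this site?' using the finding-site links."""
    return sorted(c for c, s in FINDING_SITE.items() if s == site)


if __name__ == "__main__":
    print(parents("tuberculosis"))
    print(children("infectious disease"))
    print(children("lung disease"))
    print(with_finding_site("pancreas"))
```

Tuberculosis shows up under both infectious disease and lung disease, and diabetes mellitus is reachable from the glucose metabolism query, the endocrine pancreas query, and the finding-site query, which is the practical payoff of allowing more than one parent.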
Donald Lindberg in 1993 started developing the UMLS with his colleagues at the National Library of Medicine, and in his little blurb he wrote, "The purpose is to improve the ability of computer programs to understand the biomedical meaning in user inquiries and to use this understanding to retrieve and integrate relevant machine-readable information for users." So what's both fabulous and nerve-racking about the UMLS is that it's basically all the controlled vocabularies that I've just discussed put into one big terminology, with relationships established among them. It provides a mapping structure among the vocabularies and allows translation. I will be showing an example of that later on, when we discuss how you could use the UMLS to convert from one terminology to another. It's also a great thesaurus and ontology of biomedical concepts, and you can find synonyms. I think you'll need to see some examples. So let's go to the UMLS website and use the Metathesaurus, upper left, second from the left, next to Home. It's a search engine; it's going to search the UMLS, and I'll show you what the output is in just a second. Let's type in diabetes mellitus and hit OK, and we get a very lengthy output. I've just cut and pasted a couple of key pieces here. We see that when we typed in diabetes mellitus, we got a CUI, a concept unique identifier. That is, for every unique concept there is a number, a code, a C-U-I as we call them, or "cui" as some people say. For diabetes mellitus, the CUI is 11849. We also find a definition here in the UMLS: a heterogeneous group of disorders characterized by hyperglycemia and glucose intolerance. And we also see that there's a semantic type assigned to diabetes. What is the type? It's disease or syndrome. Now, we saw before that in SNOMED it was also a finding. There are various examples where there's overlap like this, and it's not actually a contradiction.
There are other terms in the UMLS for diabetes that will call it a finding, but for this output here, and I don't know if you can see it below the definition, we see that the definition is from MeSH. The UMLS is really one big, giant semantic network where things are related to each other, and we have all these semantic types. I mentioned disease or syndrome for diabetes mellitus, and the key is that there are relationships between the semantic types. For example, there could be a simple relationship, on the lower portion of this slide [inaudible], that a disease or syndrome, such as diabetes mellitus, is a pathologic function of the pancreas, and pathologic function is itself a semantic type. Go down to the last one, pathologic function: glucose intolerance occurs in the disease diabetes. These relationships can be exploited to obtain additional information from a record, provided by the terminology. So the point here is that there are relationships that humans have coded between the various semantic types. There are approximately 135 semantic types in the UMLS. I've mentioned a couple of them, such as anatomic abnormality, antibiotic, general abnormality, disease or syndrome, and finding. And there are many relationships. Here are some; there are 54 relationships. "Affects" is obviously a key one. "Associated with": diabetes is associated with kidney disease. Diabetes complicates heart disease. Diabetes and kidney disease commonly co-occur. These relationships are very useful. This is a map, and I don't know how easy it is to see, but you can see all these relationships and the way things are related in the semantic network. For example, we see that injury or poisoning can relate to physiological function, or it can relate to anatomical structure. And, again, these can be exploited for understanding the knowledge represented in the electronic health record.
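The idea of concepts that carry a semantic type, connected by human-coded relationships, can be sketched as a small set of subject, relation, object triples. Everything here is a simplified illustration built from the examples in the lecture: the relation labels are written informally rather than taken from the official list of UMLS relationships, concepts are keyed by name instead of by CUI, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Semantic type assigned to each concept (assigned by humans in the real UMLS).
SEMANTIC_TYPE: Dict[str, str] = {
    "diabetes mellitus": "Disease or Syndrome",
    "kidney disease": "Disease or Syndrome",
    "heart disease": "Disease or Syndrome",
    "glucose intolerance": "Pathologic Function",
}

# (subject, relation, object) triples taken from the examples in the lecture.
RELATIONS: List[Tuple[str, str, str]] = [
    ("diabetes mellitus", "associated_with", "kidney disease"),
    ("diabetes mellitus", "complicates", "heart disease"),
    ("glucose intolerance", "occurs_in", "diabetes mellitus"),
]


def objects_of(subject: str, relation: str) -> List[str]:
    """What does this concept point to through the given relation?"""
    return [o for s, r, o in RELATIONS if s == subject and r == relation]


def subjects_of(relation: str, obj: str) -> List[str]:
    """Which concepts point to this one through the given relation?"""
    return [s for s, r, o in RELATIONS if r == relation and o == obj]


def related_by_type(concept: str, semantic_type: str) -> List[str]:
    """All concepts of a given semantic type linked to the concept in either direction."""
    linked = {o for s, _, o in RELATIONS if s == concept}
    linked |= {s for s, _, o in RELATIONS if o == concept}
    return sorted(c for c in linked if SEMANTIC_TYPE.get(c) == semantic_type)


if __name__ == "__main__":
    print(objects_of("diabetes mellitus", "associated_with"))
    print(subjects_of("occurs_in", "diabetes mellitus"))
    print(related_by_type("diabetes mellitus", "Disease or Syndrome"))
```

Starting from a record that only says "diabetes mellitus," a program could follow these links to suggest related conditions worth checking, which is the kind of additional information the lecture says the terminology can provide.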
This is another example: the hierarchy for the semantic types, and there is also a hierarchy for the relations between the semantic types. I think this is easier to understand. At the very top is biological function, and below it are physiological function and pathological function, and as I mentioned, these should be obvious. There's poly-hierarchy here too, and there's a limit to how specific it gets; there's a lot of crossover here.
Just to sum up, I've mentioned most of the terminologies, and you can see that they have their strengths and weaknesses. ICD-9, at the top: first, it costs money. It's good for diagnosis. It's good for other tests and diagnostics. It's good for therapy. But it's not great for the other things we mentioned, the physical and the past medical history, the history [inaudible]. You've got to look at the key down below. DRGs, diagnosis-related groups, are great for other tests, but they're not good for much else. The drug terminology is great for drugs and, of course, not useful for anything else. LOINC is great for labs and some diagnostic tests, but not much more. SNOMED, MeSH, and UMLS are much broader, but they have their own problems as well. They're not strict hierarchies. With a poly-hierarchy it's at times difficult to resolve ambiguity. But these are the main terminologies that we use, and we're going to discuss a little more later on about how we can use the UMLS to our advantage. This ends Part 2, and we will go on to Part 3, Knowledge and Representation of Unstructured Data.