Chapter 10. Ontologies

Practically speaking, an ontology is just a fancy word for a dictionary. More specifically, an ontology is a way of structuring knowledge by coding complex concepts into simpler terms. Beyond that, ontologies vary widely in the complexity given to structuring the relationships between terms. Some people use the term ontology in healthcare to refer only to systems that are capable of deeply modeling clinical information. Others use the term to refer generally to all types of abstract healthcare data sets. Sometimes, people speak about ontologies in terms of codes, code sets, or coding processes. This chapter will cover the important sources of clinical coding systems, or ontologies, of several types.

The basic problem that clinical ontologies seek to address is the difficulty that automated processes have with synonyms. Heart attack, myocardial infarction, and the acronyms MI or AMI can all be used to describe the same event, and cardiac arrest is often used loosely as if it meant the same thing. Having multiple terms for the same thing is difficult if you want to fully automate any clinical information process. An ontology solves this problem by noting that the terms heart attack, myocardial infarction, MI, and AMI really are the same thing, while cardiac arrest and cardiac arrhythmia are related terms, but not synonyms for those concepts.

Traditionally, the academic study of ontologies has been of interest to philosophers, computer scientists, and cognitive scientists, who are deeply concerned with the mechanisms by which humans encode knowledge. We will mostly ignore the high-brow, but interesting, philosophical issues with ontology unless they specifically impact some aspect of the practical use of ontologies in healthcare. If you are already familiar with the concepts of ontology, you might be somewhat offended by the way typical medical ontologies ignore simple, obvious principles that the science of ontology provides. Most medical ontologies are either irredeemably poor as knowledge representation schemes, or so consistently abused in practice that they might as well be. Moreover, no chapter on medical ontologies could begin without two admissions: the subject of ontologies is complex enough to merit a book of its own, and the licensing of medical ontologies is so convoluted and inconsistent that this chapter certainly should not be taken as anything close to legal advice.

A Throw-Away Ontology

An easy way to get the basic concepts behind an ontology is to make up a silly, throw-away ontology, so that we can quickly understand the concepts involved. Of course, we will have to make up some utterly false health “facts” to go along with our discussion. We begin with the premise that foot size and type is critical to overall health. Once we accept this assertion, it becomes obvious that we need a way to clearly talk about foot size and type without ambiguity.

Almost all ontologies begin with elements in the form of definitions. Instead of defining only words, like dictionaries do, we will define codes and phrases. For our example, we create a series of codes to capture what big feet really look like, and our foot size ontology begins like this:

Big Feet

Over size 15

Little Feet

Under size 6

Normal Feet

Between 6 and 15

Our ontology starts out informal, just a working consensus among us foot-health scientists. For this reason, we have limited ourselves to defining phrases. Now, at least informally, we all know what we mean when someone says “Big Feet” in a foot-health diagnosis. But then someone points out that there are clinical issues related to having a shoe size over 20, and that the European portion of our community uses a different standard, based on European shoe sizes, with somewhat different definitions of foot size. But the European foot-health ontology also defines the terms “Big Feet” and “Little Feet.” How do we know when we are using the European ontology, versus the one that we are developing?

This problem is what computer scientists call a namespace collision: two things with different meanings, but the same name, occupying roughly the same knowledge domain. If unchecked, this would lead to tremendous confusion, as one phrase might have two clinical meanings depending on who wrote it. This is a semantic error, and when people say “semantic interoperability” they mean transferring health data without this and other semantic errors. To fix this, we need to always specify which ontology we are using when we say “Big Feet.”

We also recognize that we will need to make ongoing changes to the ontology as the science of foot health progresses. It is possible that we might need to redefine what the term “Big Feet” means when we discuss them today, versus what it means in the future.

To solve these problems we name our silly ontology and begin versioning it.

Fred’s Fake Foot (FFF) ontology version 2.0
Really Big Feet

Over size 20

Big Feet

Over size 15

Little Feet

Under size 6

Normal Feet

Between 6 and 15

There is, of course, a competing ontology with the European alternative.

Silly Shoe Science (SSS) ontology version 1.0
Enormous Feet

Size 40 and above

Big Feet

Size 30 and above

Standard Feet

Between 10 and 30

Little Feet

Under 10

Hairy European Feet

Hairy feet that Europeans often shave, causing blisters
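In software, the namespace-and-version fix amounts to qualifying every term with the ontology it came from. A minimal sketch in Python, using the made-up FFF and SSS terms from our example (everything here is, of course, fictional):

```python
# Each definition is keyed by (ontology, version, term), so "Big Feet"
# in FFF 2.0 and "Big Feet" in SSS 1.0 coexist without colliding.
definitions = {
    ("FFF", "2.0", "Big Feet"): "Over size 15",
    ("FFF", "2.0", "Little Feet"): "Under size 6",
    ("SSS", "1.0", "Big Feet"): "Size 30 and above",
    ("SSS", "1.0", "Little Feet"): "Under 10",
}

def define(ontology, version, term):
    """Look up a term only within an explicit namespace and version."""
    return definitions[(ontology, version, term)]

# The same phrase resolves to different definitions in each namespace.
print(define("FFF", "2.0", "Big Feet"))  # Over size 15
print(define("SSS", "1.0", "Big Feet"))  # Size 30 and above
```

Because the version is part of the key, a future FFF 3.0 redefinition of “Big Feet” would simply be a new entry rather than a silent change of meaning.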

As the fake foot science progresses, it becomes clear that there will need to be codes for the finger toe (that toe right beside your big toe). As it turns out, in our made-up world, having a finger toe longer than your big toe is clinically relevant. This makes some sense, because having a longer second toe is pretty freaky, even in the real world. So we extend our ontology to include two more “codes.”

Freaky Second Toe

A second toe that sticks out farther than the big toe

Normal Second Toe

The second toe is smaller than the big toe

Next, we decide that we need to create codes and groups of codes so that we can leverage computers to group patients with “strange feet.”

Fred’s Fake Foot (FFF) ontology version 3.0
9999 – Strange Feet Codes
9999A – Really Big Feet

Over size 20

9999B – Freaky Second Toe

A second toe that sticks out farther than the big toe.

1111 – Normal Feet Codes
1111A – Big Feet

Over size 15

1111B – Little Feet

Under size 6

1111C – Normal Feet

Between 6 and 15

1111X – Normal Second Toe

Second toe is smaller than the big one

Using the added codes and code families, we can quickly perform correlations between other health conditions and “strange feet.” So far this ontology is a simple tree structure. More often than not, medical ontologies are too complex to encode as trees. As they mature, they almost always turn into a web. In this ontology, for instance, it becomes obvious that we need a more formal definition for “feet.” We will call these codes “core” codes, and so we will prepend them with a C.

C0001 – Foot:

The part of your body that touches the ground when you stand.

C0002 – Feet:

The plural of foot.

C0003 – Toe:

Things that stick out the end of a foot (C0002)

Now our ontology is a graph, illustrated in Figure 10-1.

Figure 10-1. The example foot ontology as a graph
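A tree can be stored as nested groups, but once definitions cite other codes, the natural representation is a set of nodes and edges. A toy sketch of the core codes as a graph, where an edge exists wherever one definition mentions another code (in this tiny example there is only one such cross-reference):

```python
# Core codes from the example; a definition may cite other code ids.
codes = {
    "C0001": "The part of your body that touches the ground when you stand.",
    "C0002": "The plural of foot.",
    "C0003": "Things that stick out the end of a foot (C0002)",
}

def references(code):
    """Edges: every other code id mentioned in this code's definition."""
    text = codes[code]
    return [other for other in codes if other != code and other in text]

print(references("C0003"))  # ['C0002']
```

Real ontologies store these relationships explicitly rather than scraping them out of definition text, but the resulting structure is the same: a graph, not a tree.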

Of course, because your authors have invested upwards of 10 minutes coming up with these ontologies, we want to protect the value of our labor and ensure that we have funds available to continue the important efforts of fake foot science. Therefore we publish the current versions of the ontologies under a license that requires individuals to pay a small fee for the right to use the copyrighted ontologies to communicate.

Learning from Our Example

At this point, we can now use our ridiculous ontology to begin discussing the real-world issues with ontologies.

First, ontologies are implicitly namespaces. The same terms can, and often do, have slightly different clinical meanings in different ontologies. Do not presume that two terms in English (or other languages either) are being used the same way in two different ontologies.

When healthcare information encoded in one ontology needs to be used with healthcare information encoded in another ontology, we must perform an ontology mapping. This mapping process is rarely perfect and should never be fully trusted. There are many ontologies, and efforts to create overarching maps between them are called meta-ontologies. These meta-ontologies are necessarily clinically “lossy,” because subtle distinctions between ontologies are often lost in large ontology mapping efforts.
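As a concrete illustration of why mapping is lossy, consider a hypothetical map from FFF to SSS codes. FFF distinguishes “Really Big Feet” from “Big Feet,” but the two systems’ size thresholds do not line up, so a cautious mapper might send both to the same SSS term, and the distinction cannot survive a round trip:

```python
# Hypothetical FFF -> SSS term map. Two distinct FFF concepts
# collapse into one SSS concept, so the mapping is lossy.
fff_to_sss = {
    "Really Big Feet": "Big Feet",
    "Big Feet": "Big Feet",
    "Normal Feet": "Standard Feet",
    "Little Feet": "Little Feet",
}

def is_lossy(mapping):
    """A mapping is lossy if two source concepts share a target."""
    return len(set(mapping.values())) < len(mapping)

print(is_lossy(fff_to_sss))  # True
```

Scale this up to tens of thousands of codes per ontology and the clinical subtleties lost in mapping become a serious problem.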

We can also see, from our example, that health information coded with ontologies is often time-stamped. Foot size changes over time, so a person’s foot code at age 6 is not likely to be his or her foot code later on.

Many of the useful ontologies have strict licensing agreements; they are owned by the organizations that develop them. Often, these organizations require payment for the use of the ontology. This creates a profit motive regarding the use of ontologies and as we have seen elsewhere in healthcare, strange incentives have strange results. If possible, proprietary ontologies should be avoided. Sadly, it is rarely possible to avoid them.

Versioning matters in ontologies. Data encoded with a code set in one year might have a different meaning when interpreted using the same ontology in a later version. Sometimes versions are backward compatible, but major revisions of ontologies should probably be considered a completely new ontology, in the sense that mapping is required in order to make the transition from one version of the codes to another.

Ontologies sometimes have significant geographic variations, for legitimate reasons. Different parts of the world often have different encoding needs. Almost all internationally used ontologies have national subversions designed to address issues found only in given regions. Copyright and trademark law vary drastically between nations, and the rules for ontologies change with them. Most of this chapter will presume that the context is U.S. copyright and patent law, which is usually the worst-case scenario for practical health IT purposes.

Any ontology is based on some kind of scientific process. This example is, of course, entirely made up, but real medical science is often just as arbitrary. It is very difficult, for instance, to change the vocabulary of medical science for anatomy. Some body parts are named in Latin, many are based on the names of famous doctors who first conducted surgery on an area, and almost none of the scientific names for anatomy correspond exactly to natural language. The elbow and knee, for instance, are great summary terms for those joints, but those comfortable terms are rarely exact enough for clinical decision purposes. Remember just because your healthcare provider is talking with you about your “knee” pain does not mean that is how he or she is thinking about it. Your provider might discuss your “knee” pain but really be thinking “patellar tendonitis.”

Good healthcare ontologies let clinicians make inferences on the clinical data sets encoded in them. For instance if we had 100 patients encoded with the FFF code 9999A Really Big Feet, we could infer that all of those patients also had 1111A Big Feet. This is a silly ontology that lets us make silly inferences, but useful and profound ontologies can help us make useful and profound inferences. The quality of inferences created by an ontology is the measure of the depth and clinical relevance of that ontology. Some people do not call a given data set an ontology unless it is capable of helping to make some kind of inferences.
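The Really-Big-Feet-implies-Big-Feet inference can be automated once the subsumption relationship is recorded in machine-readable form. A sketch, with the single implication from our fictional FFF ontology hard-coded (a real ontology would declare many such relationships):

```python
# Subsumption declared in the (fictional) FFF ontology: a code on
# the left implies every code in the set on the right.
implies = {
    "9999A": {"1111A"},  # Really Big Feet (over 20) implies Big Feet (over 15)
}

def expand(patient_codes):
    """Add every code implied by a patient's recorded codes."""
    expanded = set(patient_codes)
    for code in patient_codes:
        expanded |= implies.get(code, set())
    return expanded

print(sorted(expand({"9999A"})))  # ['1111A', '9999A']
```

With this expansion step, a query for all patients with 1111A Big Feet automatically includes the 100 patients coded 9999A, which is exactly the kind of inference a deeper clinical ontology enables at scale.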

Because medical language is fuzzy, and constantly changing, ontologies are fuzzy, too. Most ontologies are designed by committees of experts, and as any information systems engineer will agree, design by committee rarely leads to the most effective designs. This fuzziness, and the fundamentally arbitrary process of assigning otherwise random numbers to clinical knowledge, leads us to one conclusion: ontologies are mostly arbitrary. The degree to which they are not arbitrary is the degree to which the ontology as a whole correctly models the state of medical science and the degree to which the ontology is leveraged in communication. It hardly matters if the model is better without adoption. Like the metric versus English measurement systems in the United States, being better does not guarantee adoption.

Lastly, it is important to note that at any given point in time, even the best clinical ontology is going to be a snapshot of medical science at that given time. Medical science suffers both from mere wrongness and from societal bias. Countless procedures that are now known to be clinically useless or harmful could be easily encoded in the ontologies available when the procedure was popular. Bias in the healthcare community presents an even more troubling problem.

The Diagnostic and Statistical Manual of Mental Disorders (DSM) is the primary ontology of mental illness used in the United States. This mental health ontology was started by the American Psychiatric Association (APA) in 1952. In that edition of the DSM, homosexuality is listed as a mental illness. In 1957 Evelyn Hooker released the first extensive study that showed evidence that homosexuals were no less functional than heterosexuals. Over the next 15 years, more and more evidence would be released that homosexuality did not appear to impair the ability of a person to function in society. In 1973, under considerable protest, the APA removed homosexuality as a disease from the DSM ontology. It would take until 1986 to have it removed entirely.

Our example ontology is ridiculous, which in itself serves a purpose. A clinical ontology can be no better than the state of medical science that it reflects. Silly science in, silly ontology out. Remember that a clinical ontology is not the truth, just a reflection of scientific consensus. For the record, the second toe being longer than the big toe is called “Morton’s toe” and it can actually have clinical relevance in some cases. Usually, it is just a normal variation and means nothing.

CPT Codes, Sermo, and CMS

Sermo is generally regarded as “Facebook for doctors.” Although it is not clear that “Facebook for doctors” is even a useful idea, it is certainly true that Sermo is an extremely popular community and probably the largest doctor-only social network.

In May 2007, Sermo and the American Medical Association (AMA), the largest and most powerful medical association in the United States, announced that Sermo would become the official social media site for the AMA. As you can imagine, this was pretty big news.

In 2009 that relationship imploded.

A major part of the fallout was a blog post from the CEO of Sermo, Daniel Palestrant, regarding Current Procedural Terminology (CPT) codes. CPT is an ontology for medical procedures that the AMA developed and maintains. If you have read Chapter 3, you know that CPT procedure codes are the central ontology that doctors and hospitals use to get paid in the United States.

The Sermo CEO wrote, in part:

The current CPT coding system represents a collusion of convenience between the business side of the AMA and the insurance companies … at the expense of physicians and patients. Perhaps most galling, thousands of physicians work on the CPT codes, for which they receive no compensation, while the AMA generates millions of dollars in revenue. Clearly this presents a massive conflict of interest as the AMA is supposed to be advocating for physicians, yet it receives the majority of its revenues from the very same insurance companies that the rest of the physicians increasingly find themselves facing off against in the deepening healthcare debate.

Even more interesting than Dr. Palestrant’s comment are the hundreds of Sermo posts from doctors commenting on the issue of CPT codes.

In 1983 the predecessor to CMS, the government agency that runs Medicare, dictated that providers who submitted bills to Medicare would need to code procedures in CPT and diagnoses in ICD. This decision made CPT the de facto ontology standard for medical billing. In the AMA’s view, every provider who encodes information using CPT codes owes it money: any healthcare provider can license the right to use CPT codes by purchasing that right from the AMA. You can visit the AMA website and purchase the right to use CPT codes for about $300 per year for one user; bulk purchases can be cheaper. As the Sermo CEO points out, the revenue the AMA gets from licensing CPT codes outpaces income from membership dues, especially because membership in the AMA has been dwindling for years.

It might seem somewhat unfair that the government enforces the use of a proprietary billing ontology. Forcing all providers in the United States to use CPT codes is something like forcing all doctors to submit codes in Microsoft Word document format. Why should a private organization be the arbitrary beneficiary of a government policy decision without any opportunity for competition and alternatives?

You would not be the first person to feel that way. In 1997 in a landmark case, Practice Management Information Corp v. The American Medical Association, Practice Management Information (PMI) sought to show that because the federal government enforced the use of the CPT ontology, they should not be required to pay for the license to use the CPT copyrights. PMI lost the case. The judges ruled that because the government had chosen to give AMA a monopoly, rather than the AMA working to create one, there was no justification for invalidating the AMA copyrights.

It is not clear, to begin with, that ontologies can be the subject of copyright. Arbitrary numbering systems, at least, are not copyrightable. Feist v. Rural, the famous court case regarding the copyrightability of the numbers in a phone book, established that mere facts, or mere associations between arbitrary numbers and simple terms, are not copyrightable. Before Feist v. Rural, a rule called the “sweat of the brow” doctrine applied to U.S. copyright: if it took effort to generate a work, it was copyrightable.

Even before Feist v. Rural, mere information was not copyrightable. A mere fact, like “the sky is usually blue,” cannot be subject to the protection of copyright. However, before Feist v. Rural, collections of facts were copyrightable. In Feist v. Rural, the court found that a phone book was essentially a collection of arbitrary facts, and therefore could not be copyrighted. After Feist v. Rural, the new standard was creativity. Mere facts, or even collections of facts, cannot be copyrighted. Creative ways of connecting, ordering, and categorizing facts can be copyrighted.

Medical procedures are simple facts. The way that the AMA encodes them is creative and therefore copyrightable. This is similar to the way a food recipe copyright works. A recipe is a set of instructions that is not copyrightable, but the words, sentences, and layout of recipes can be copyrighted. If you exactly copy a recipe from a cookbook and republish it, you might be violating copyright. But if you replicate the process described in a recipe in your own words, those words are not subject to the original author’s copyright. Obviously just because the AMA owns an ontology for common medical procedures does not mean that they can lay claim to ownership of the procedures or alternative descriptions of them.

A procedure ontology is part mere fact (the procedures themselves) as well as creative work (the way in which the facts are encoded). This is an important distinction, because it deeply influences how ontology mapping can occur.

The other fact to highlight at this stage is that lawyers do not agree on what is or should be copyrightable. The old joke applies: give two lawyers a legal question, and you will get at least three legal opinions. The copyrightability of ontologies is far from a closed question; as lawyers disagree about the issue and as knowledge science evolves, the courts will create further opinions. Perhaps most important, this discussion applies strongly only to an ontology that is primarily used in the United States. International health IT efforts are usually free to choose among inexpensive ontologies that require little or no licensing fees.

Returning to PMI v. AMA, the court ruled in favor of the AMA in part because a medical ontology essentially has three components. The first component is an arbitrary code, which, by itself, is not copyrightable. The second is a short term like “Big Feet” that is close enough to a fact that it might not be copyrightable. The association between the short term and the numeric code, like the contents of a phone book, would not be enough to create a copyrightable work. However, if you include the third component, the longer text description, then the combination of the three would probably be copyrightable. Moreover, the useful grouping of codes into blocks of relevantly connected codes is probably copyrightable. To review:

9999B – not copyrightable
Freaky Second Toe – not copyrightable
9999B – Freaky Second Toe – probably not copyrightable
9999B – Freaky Second Toe: A second toe that sticks out farther than the big toe – copyrightable.
9999A, 9999B, 9999C, 9999D … 9999(x) all having to do with foot issues – copyrightable.

Should the AMA be able to own and license copyright to the CPT? Should CMS enforce the use of a proprietary ontology for medical billing? Is the Sermo CEO right that the AMA is protecting its CPT monopoly instead of advocating in the interests of the medical profession? Is this outline of the copyrightability of ontologies correct or is it missing subtle copyright issues? All of these questions are both fascinating and irrelevant to our current discussion of health IT.

What is relevant is that the AMA views the CPT ontology as an “intellectual property” (we use that term ironically) asset that it ferociously protects. A health IT vendor (PMI) went up against them in court and lost. As Dr. Palestrant notes, recent legislation supports and extends the CPT monopoly for claims transactions. Specifically, CMS is authorized under HIPAA to dictate what coding standards are used for medical billing. This regulatory power does not merely extend to claims submitted to Medicare. Bills to private insurance must also use the ontologies that CMS chooses. If healthcare providers want to use CPT codes in health IT software, and they must in order to legally bill third parties for healthcare services, they must license them from the AMA to avoid a legal fight.

Generally, health ontologies face difficulties with regard to licensing. Especially when more than one ontology is involved, the licensing implications can be very difficult to parse out. The Ontology Metadata Vocabulary (OMV) is an effort to make licensing and other “metadata” regarding ontologies clearer by providing a standardized ontology metadata format.

CPT licensing is not a trivial cost. A large hospital system might pay the AMA hundreds of thousands of dollars a year to license CPT codes. Small providers regularly ignore the AMA’s licensing requirements (much like running pirated copies of software) and hope to slide under the radar of the AMA’s enforcement. This can backfire drastically and be an expensive mistake for a small practice. Unfortunately, this means that you, as the deployer of health IT systems, might be the first person to point out that the practice owes thousands of dollars a year in licensing fees to the AMA. Hopefully, this section provides you with the ammunition you need to convince a small healthcare practice to “go legit” regarding AMA licensing fees.

However, it is important to understand that although the AMA might believe that they have strong copyright protection over the CPT medical procedure ontology, they cannot have copyright to the mere facts of medical procedures. Moreover, the CPT terminology system is so focused on medical billing it is widely regarded as clinically impoverished.

The most significant impoverishment of CPT relates to its interaction with ICD codes. Before we further discuss the limitations of CPT codes for clinical purposes, we must discuss ICD codes.

International Classification of Diseases (ICD)

How do people die? In 1891, the International Statistical Institute wanted to formally answer that question and retire the numerous individual efforts that had pervaded this statistical inquiry for the preceding 200 years. Thus began the oldest formal medical ontology development process that is still in common use today.

ICD is a disease ontology. Disease here is used broadly to mean anything with clinical implications that results from illness or injury, or that is merely different about an individual. That is not the normal definition of disease, so most people refer to ICD codes as diagnosis codes, which is more accurate.

Today, the World Health Organization (WHO) maintains the ICD ontology. Older versions of the ontology are in the public domain, and the current version can be used freely, without cost.

The vanilla ICD database is not actually used in the United States; rather, its cousin, the ICD-CM ontology, is. The CM stands for “Clinical Modification,” but it probably should stand for “U.S.,” because the clinical modifications are actually intended to support disease concepts for U.S. medical billing. The ICD-CM ontology is maintained by the Centers for Disease Control and Prevention (CDC).

Claims data is composed of CPT procedure codes justified by ICD diagnosis codes. Together, the two ontologies used in medical claims transactions form a “billing ontology” that applies only to the United States.

E-patient-Dave-gate

As we mentioned in Chapter 6, the “e” in e-patient does not primarily mean “electronic patient,” but “engaged patient” or “empowered patient.” The e-patient movement advocates for doctors and patients to abandon paternalistic notions of healthcare. According to e-patients, all patients should take a more proactive role in their own healthcare, and doctors and nurses should encourage this new empowered role.

One of the most vocal and famous members of the e-patient community is Dave deBronkart, who is better known as e-patient Dave. Dave used Internet research and collaboration with other patients to find a life-saving treatment for his metastasized kidney cancer. That amazing experience is not actually what made Dave famous.

In April 2009 Dave blogged about his experiences when he automatically imported his Beth Israel Deaconess records into Google Health, a personal health record. On April 13, 2009, the Boston Globe put Dave’s story on the front page. Dave rocketed to international fame overnight.

Here is, in part, what Dave wrote about the contents of his Google Health record:

The really fun stuff, though, is that some of the conditions transmitted are things I’ve never had: aortic aneurysm and mets to the brain or spine.

So what the heck??

I’ve been discussing this with the docs in the back room here, and they quickly figured out what was going on before I confirmed it: the system transmitted insurance billing codes to Google Health, not doctors’ diagnoses. And as those in the know are well aware, in our system today, insurance billing codes bear no resemblance to reality.

(I don’t want to get into the whole thing right now, but basically if a doc needs to bill insurance for something and the list of billing codes doesn’t happen to include exactly what your condition is, they cram it into something else so the stupid system will accept it.) (And, btw, everyone in the business is apparently accustomed to the system being stupid, so it’s no surprise that nobody can tell whether things are making any sense: nobody counts on the data to be meaningful in the first place.)

E-patient Dave had stumbled on the first lesson of healthcare ontologies. CPT plus ICD claims data is mostly useless for clinical purposes.

This episode, which is nicknamed “e-patient Dave-gate” by health IT industry insiders, represented the first time that mainstream media recognized that claims data, given back to patients, creates lots of confusion. In the end, Beth Israel decided to stop sending any claims data to Google Health.

E-patient Dave was educated and intelligent enough to recognize that the “diagnosis” codes were not actually conditions that he had. Instead this occurred because of one of the dangerous interactions between ICD and CPT codes. Most of the reason for the “CM” part of the ICD code in the United States is that ICD is used to justify procedures. But many procedures are done as part of the diagnostic process. The simplest example is gall bladder removal. Often, patients have stomach pain that might be caused by the gall bladder, which can have disease states that are difficult to detect using modern scanning techniques. Each year many patients have their gall bladder removed without any evidence that the gall bladder is the source of the stomach pain. A surgeon never opens a patient up, looks at the gall bladder and says “it looks fine,” and then sews the patient back up. If gall bladder surgery is initiated, the gall bladder is usually removed, even if it “looks fine.”

The removal of a healthy gall bladder is often just a part of a diagnostic process. But when that procedure is billed to the insurance company, the diagnosis code that is given is probably ICD-CM 575.6, which stands for “unspecified disorder of the gall bladder.” There is no code for “might be an unspecified disorder of the gall bladder.” Many patients have the code 575.6 on their healthcare records; it served to notify a health insurance provider of the reason for a gall bladder surgery, but in fact the patient never had gall bladder disease at all, a fact that was only determined as the result of the surgery in question. The surgery must be paid for, hence the existence of the nebulous 575.6 ICD code, with a clinically unreliable meaning.

This is what had happened to e-patient Dave. He had gained access to his claims history, which is not the same thing as his healthcare record.

It is possible to use claims data to make determinations about clinical issues. Often, industry professionals can look at claims data for a single patient and infer what the healthcare record might look like. If they saw gall bladder removal, but then further treatment for stomach pain, they might infer correctly “gall bladder was never the problem.” Further, data aggregation techniques can be used to mine large amounts of claims data for useful information. Insurance companies and government payers hire claims data analysis experts to mine terabytes of claims data for patterns. This process is a well-established health IT subindustry. It is surprisingly simple to detect and correctly infer clinically relevant information from such analysis, but no true health IT expert would ever presume deep clinical meaning in claims data alone.
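The gall bladder inference described above is exactly the kind of rule a claims analyst might encode. A deliberately simplified sketch (the procedure names are stand-ins for real CPT/ICD codes, and the rule is illustrative, not real analytic logic): if a gall bladder removal is followed by further stomach-pain claims, flag that the gall bladder was probably never the problem.

```python
# A patient's claims history as (date, procedure) pairs, oldest first.
claims = [
    ("2010-03-01", "gall bladder removal"),
    ("2010-06-15", "stomach pain treatment"),
    ("2010-09-20", "stomach pain treatment"),
]

def gall_bladder_probably_fine(history):
    """Infer: removal followed by continued stomach-pain treatment
    suggests the gall bladder was never the source of the pain."""
    procedures = [proc for _, proc in history]
    if "gall bladder removal" not in procedures:
        return False
    after = procedures[procedures.index("gall bladder removal") + 1:]
    return "stomach pain treatment" in after

print(gall_bladder_probably_fine(claims))  # True
```

Real claims-mining systems apply thousands of such rules, plus statistical methods, across terabytes of data, but the inference is always probabilistic: claims data hints at the clinical story without stating it.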

CPT, combined with ICD-CM, was simply not meant for creating clinical records of care. Together, they form a “billing ontology” that is only relevant and useful in the contorted third-party payer system in the United States.

Generally, CPT is regarded as so billing-oriented as to be clinically invalid as a procedure recording system. When used to justify CPT codes, ICD-CM is regarded with similar disdain. However, ICD alone, used without intention to bill, can be a clinically valid diagnosis ontology.

When evaluating patient data coded in any ontology it is critical to understand both what the codes were made for and what they are used for.

Crosswalks and ICD Versions

The HIPAA mandated version of CPT is CPT-4. The current container version for electronic health claims is X12 4010A. The current version of ICD in the United States is ICD-9-CM.

Things change.

CMS determines which versions of both file formats and ontologies healthcare claims in the United States must use under HIPAA.

According to the current schedule, the container for electronic claims will change from X12 4010A to X12 5010 on January 1, 2012. More important, in October 2013, the current CMS schedule requires a change from ICD-9-CM to ICD-10-CM. Both of these changes will be extremely traumatic for the healthcare industry. Historically, CMS has frequently made last-minute extensions of deadlines like this, to accommodate the difficulty that many healthcare providers have shifting standards.

Transitions like this create cottage industries of technologists who provide software or services to translate from the previous standard to the current one. Most health IT vendors will provide patches and mechanisms to enable their software systems to support coding in these standards, and many vendors already fully support both.

There are two major processes, besides simply supporting the underlying standards, that health IT professionals can look forward to.

The first is the practical problem of moving clinicians from a diagnostic ontology that has close to 20,000 codes to one that has closer to 200,000 codes. Physicians receive no training in medical school or in residency in the business of medicine, or the science of coding (the kinds of things discussed in this chapter). For the most part, clinicians rely on substantial mental shortcuts in the current billing process. Many times, a clinician checks a box on a paper form indicating what procedure was performed and which diagnosis justifies it, and very often, they check the same 20 or 30 options, day in and day out. If that form were simply upgraded to provide the same coverage using ICD-10, it would have to be the size of a poster. Obviously, that is not going to work, and clinicians are going to need to learn to code diagnoses almost from scratch.

The second is the conversion of existing data. Until both the software and the clinicians understand ICD-10, ICD-9 coded data will continue to be generated. That ICD-9 data will need to be converted into ICD-10 coded data in a partially automatic process. The basis of that process is something called a crosswalk or map. A crosswalk is a set of links between equivalent concepts in two ontologies. UMLS, which is discussed later, is essentially a merger of several crosswalks. For instance, both ICD and SNOMED CT are ontologies that can describe diagnoses accurately. It is possible to use an automated process, using a crosswalk as a map, to convert data coded in ICD to data coded in SNOMED. For conversion purposes, ICD-9 and ICD-10 might as well be totally different ontologies.

Thankfully, CMS makes this process much easier by providing very specific instructions regarding specific mappings in the form of general equivalence mapping (GEM) files available from the CMS ICD-10 website.

A substantial portion of the time, one ICD-9 code maps to one ICD-10 code. However, sometimes, one ICD-9 code could map to one of several ICD-10 codes (there is a reason there are more codes). In this case, determining which ICD-10 code is appropriate might require a quick review of a particular patient, or perhaps a working understanding of standard operating procedure for a particular facility.

Occasionally, several ICD-9 codes collapse primarily to one ICD-10 code. Most of the time, this occurs because a single concept has been completely remapped in ICD-10, and requires one core code, with additional codes to recapture meaning that is bundled in a single code in ICD-9.

The structure of the GEM files can be used to automatically detect which codes will need manual intervention.
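As a minimal sketch of that detection step, the code below assumes the published GEM layout of three whitespace-separated columns (source code, target code, and a five-digit flag field in which the first digit marks an approximate match and the third marks a combination entry); the sample codes are invented for illustration:

```python
from collections import defaultdict

def load_gem(lines):
    """Parse GEM rows: source code, target code, five-digit flag field."""
    mappings = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        source, target, flags = parts
        mappings[source].append((target, flags))
    return mappings

def needs_review(mappings):
    """Source codes that cannot be converted fully automatically."""
    flagged = set()
    for source, targets in mappings.items():
        if len(targets) > 1:
            flagged.add(source)       # one-to-many: a human must choose
        for _, flags in targets:
            if flags[0] == "1" or flags[2] == "1":
                flagged.add(source)   # approximate match or combination entry
    return flagged

# Hypothetical sample rows in the GEM layout
sample = [
    "0030  A0100  00000",   # clean one-to-one map
    "0031  A0221  10000",   # approximate match...
    "0031  A0222  10000",   # ...with two candidate targets
]
print(needs_review(load_gem(sample)))  # {'0031'}
```

Codes that survive this filter can be converted in bulk; everything else goes into a work queue for a coder to resolve.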

Converting to ICD-10 will dramatically improve the capacity of CMS and other payers to understand what clinical conditions are being treated in particular patients. This will improve the richness and clinical reliability of claims data to a certain extent, which is good news for payers. However, patients and clinicians might not see much benefit from this change, especially immediately.

The transition from ICD-9 to ICD-10 will be painful as long as ICD-9 codes are being regularly converted to ICD-10 as a normal part of the billing workflow. Your organization should spend as little time as possible up-converting to ICD-10 codes or down-converting to ICD-9 codes. Unfortunately, the period when both ontologies are in use will be greatly extended by the fact that payers will also need to upgrade software to support ICD-10. It is possible that clinicians will have to create systems to support semiautomatic translation between the two ontologies for months or even years.

As a health IT specialist, you should become familiar with the resources and manuals associated with the CMS GEM files well in advance of your organization’s migration requirements.

Other Claims Codes

CPT codes are entirely under the control of the AMA. CMS has been developing its own additional code set that is used alongside CPT codes to communicate other claim data. This code set is called the Healthcare Common Procedure Coding System (HCPCS). There are three levels of HCPCS, but the first level is actually identical to CPT-4, the second level is what is normally thought of as HCPCS, and the third level has been retired. For the most part, the levels of HCPCS serve only to confuse; when someone says HCPCS, they normally mean codes that are not included in CPT-4.

CMS, under power granted by HIPAA, mandates ICD and CPT/HCPCS for most medical claims transactions. It also mandates Code on Dental Procedures and Nomenclature (CDT) for dental procedures, and the National Drug Code (NDC) for drug descriptors.

CDT is largely equivalent to CPT codes, except instead of being owned, maintained, and licensed by the AMA, it is shepherded by the American Dental Association.

The NDC is maintained by the FDA and is a simple coded list of medications.

There are several other minor code sets that are required for very specialized medical claims transactions. These are documented on the CMS website.

Drug Databases

NDC is probably the simplest drug database in existence. By itself, any mere list of drugs is pretty useless clinically. To be useful, a drug database should include as many of the following relationships between drugs as possible:

  • Drug-drug interactions

  • Drug-disease interactions

  • Drug-food interactions

  • Drug-treatment interactions

Of these, the drug-drug interactions are generally the largest interactions data set, and people often say “drug-drug interactions” loosely to mean the interaction data in a drug database as a whole. If it is possible that a drug could interact with anything in a patient’s clinical environment, the drug database should include that information somewhere. For common interaction patterns, the drug database should enable automated querying to check against interactions.

Modern drug databases also frequently encode what pharmacists call ADMET, which stands for:

Absorption/Administration

How the medication gets in

Distribution

Where it ends up in the body

Metabolism

How it gets processed, and how fast

Excretion/Elimination

How it leaves

Toxicity

How much can kill or hurt people

The source data for drug databases worldwide is typically the U.S. government Department of Veterans Affairs’ National Drug File (NDF). The VA hires many pharmacists to maintain a drug database. Those pharmacists add new drugs to the drug file, remove outdated and unavailable drugs, and maintain drug-ingredient interactions data. This data is merged with information on drugs that comes from the U.S. Pharmacopeia, which includes information regarding both drug and food ingredients, and lists from the FDA and the National Library of Medicine (NLM). All drugs approved for use in the United States engage in a back-and-forth process between the USP, FDA, and NLM, but the VA database contains drug data in a way that is designed to be used in a clinical environment.

Maintaining a drug database is thankless, difficult work that requires tremendous patience and attention to detail. Any given medication, illegal substance, or food will typically include several component substances. Not all drug-drug interactions are bad, and often new drugs are actually the combination of two previously separate drugs that are known to work well together. As a result, a drug database must concern itself with the fundamental ingredients of pills, injections, patches, sprays, and so on.

When any two drugs are prescribed together, a drug database must analyze all of the subcomponents of the drugs to determine if there is an interaction, and this process could potentially consider tens of combinations. When a patient has 15 or 20 medications, which is more common than you might think, drug interaction checking turns into a very complicated process. More advanced drug database integrations will also consider other components of a patient’s health information, including conditions, treatments, and diet to ensure that other types of interactions are not occurring.
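The core of that pairwise, ingredient-level check can be sketched in a few lines. The drug names, ingredient lists, and the single interaction below are all invented for illustration; a real system would draw them from a licensed drug database:

```python
from itertools import combinations

# Hypothetical ingredient lists and interaction data -- a real system
# would pull these from a licensed, regularly updated drug database.
INGREDIENTS = {
    "CombopillXR": {"lisinopril", "hydrochlorothiazide"},
    "PainAwayPM":  {"acetaminophen", "diphenhydramine"},
    "KPlusTabs":   {"potassium chloride"},
}
INTERACTIONS = {
    frozenset({"lisinopril", "potassium chloride"}): "major: hyperkalemia risk",
}

def check_regimen(drugs):
    """Check every pair of prescribed drugs at the ingredient level."""
    alerts = []
    for a, b in combinations(drugs, 2):
        for ing_a in INGREDIENTS[a]:
            for ing_b in INGREDIENTS[b]:
                hit = INTERACTIONS.get(frozenset({ing_a, ing_b}))
                if hit:
                    alerts.append((a, b, hit))
    return alerts

print(check_regimen(["CombopillXR", "PainAwayPM", "KPlusTabs"]))
# [('CombopillXR', 'KPlusTabs', 'major: hyperkalemia risk')]
```

Note that the number of ingredient pairs grows quadratically with the number of medications, which is why a 20-drug regimen makes this an expensive check.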

Not all interactions are created equal. Some interactions serve to make drugs more potent, and can sometimes be intentionally prescribed. Some interactions are weak, and in different clinical situations can be tolerated. Even major potential interactions might be acceptable in extreme circumstances.

Drug formats must be constantly updated. Often drug manufacturers will produce several doses of a drug, for instance, a 5 mg, 10 mg, 50 mg, and 200 mg pill. The next year, a manufacturer might retire the 50 mg pill and introduce a 20 mg pill.

The VA drug database is often biased toward the VA formulary. A formulary is a list of drugs that is available for purchase from a given health insurance provider or payer. The VA is typically its own payer, and as a result has its own list of which medications it typically offers. The VA, as a government agency, is focused on providing the information services that are valuable to the veterans in its care. However, as a government agency it is required to release software and data that it develops under the FOIA. Usually, for information that is requested by FOIA frequently, the VA simply posts the data on its website for download.

Several drug database providers download the drug database from the VA and maintain proprietary drug databases. To differing degrees, the drug database providers either use the VA data as a cross-check to their own independent drug research, or rely on the VA data as the core of their drug database product. The VA drug database is regarded as a correct but “messy” source of drug and drug interaction data. Most drug database providers ensure that the VA data is updated with information relevant to other formularies, and is otherwise clean and usable data, and then republish the data. Others use the VA database as a check against their own internal drug research process. Still others use the core ingredient data to make translations to other languages, where drugs are marketed under different names and potentially in different standard amounts.

The leading provider of proprietary drug data is First DataBank, which is owned by the Hearst Corporation. First DataBank sells a drug database to clinicians, usually through EHR vendors, and also engages in all kinds of drug-related data gathering and drug data analysis. It sells data to pharmacies, insurance companies, EHR vendors, and so on. It is deeply involved in the process for setting the prices for different drugs. There have been several lawsuits accusing First DataBank and the drug distributor McKesson of cooperating to artificially inflate the cost of drugs in the United States. Like much of healthcare in the United States, the lack of transparent pricing for medications makes establishing a fair market very difficult.

Although First DataBank is by far the largest provider of drug data, there are several other drug database providers including Micromedex, MediSpan, Gold Standard Alchemy, and Multum. Each of these companies provides its own proprietary drug ontology, using a different coding scheme.

Thankfully, there is a meta-ontology called RxNorm that serves to map the different coding schemes of the various drug database providers. RxNorm is maintained by the NLM at the National Institutes of Health (NIH). Well-designed EHR systems can support multiple drug database sources and use RxNorm to reconcile differences between them.

RxNorm is the first component of the Unified Medical Language System (UMLS), which is a broad meta-ontology that unifies many different ontologies that we will discuss later.

Drug data, after claims data, is by far the most liquid data transferred in the United States. Most pharmacies in the United States are wired to accept prescriptions electronically. This is largely the result of the Surescripts network. Surescripts is the result of a 2008 merger of the only two large e-prescribing providers in the United States, RxHub and the original, smaller Surescripts. As a result of this merger, Surescripts has near-monopoly status on the routing of electronic prescriptions in the United States. Meaningful use requires e-prescribing. Technically, this does not mandate integration with Surescripts, but practically Surescripts is required, because it is the only way to ensure that electronic prescriptions actually reach pharmacies. Eventually, the Nationwide Health Information Network (NWHIN) will provide an alternate pathway between clinicians and pharmacies. As pharmacies realize that the use of the new network would be essentially free, compared with Surescripts, which charges substantial fees, NWHIN adoption for e-prescribing could quickly become very popular. With companies like NewCrop (described later) in a position to profit from such a second e-prescribing pathway, the competitive landscape could shift substantially in coming years.

Electronic prescribing requires the merger of at least three different healthcare ontologies and databases. The first is a drug database, keyed to RxNorm. The second and third are healthcare provider identification schemes: the National Provider Identifier (NPI) mandated by HIPAA for medical claims transactions, and the Drug Enforcement Administration (DEA) number that is given to doctors, dentists, veterinarians, and anyone else with the privilege to prescribe controlled substances. Historically there has been considerable paranoia regarding the electronic prescription of controlled substances. The DEA and HHS together maintain the controlled substances list, and knowing the shorthand for controlled substances is important for electronic prescribing:

  • Schedule 1: Drugs with no medical use that are very addictive, like heroin.

  • Schedule 2: Drugs with limited medical use that are very addictive, like morphine.

  • Schedule 3: Drugs with a little more medical use, and a little less addictive than Schedule 2, like steroids.

  • Schedule 4: Drugs with a little more medical use, and a little less addictive than Schedule 3, like phenobarbital.

  • Schedule 5: Drugs with a little more medical use, and a little less addictive than Schedule 4, like codeine in cough suppressant.

The drug schedule does not make sense, and likely never will. For instance, the street drug heroin is Schedule 1, which makes sense, but cocaine, another dangerous street drug, is Schedule 2. Marijuana is Schedule 1, but is currently quasi-legal to prescribe by state law in California and some other states. The important lesson here: all drugs are somewhere on the schedule (except alcohol and tobacco, which are regulated separately by the ATF), the schedule determines how prescriptions can be written, and you should never, ever guess what schedule a medication is. This is information that you should get from your drug database.

Until recently, prescribing “serious” low-schedule medications could not be done using electronic prescribing. This rule, which has stunted the uptake of e-prescribing by forcing doctors to have both a paper prescribing and e-prescribing process in parallel, has now been relaxed assuming certain digital signatures are in place. Practically speaking, these digital signatures and the biometrics processes that are typically involved with them will become a requirement, either through meaningful use, or through practical necessity.

The most significant rule regarding drug databases is that they should never be trusted for tasks they were not designed to do, and they should be constantly updated. New drug interactions, even involving older drugs, are discovered all the time. Using an outdated drug interaction file can expose your organization to serious liability. Further, if you purchase a drug database that has drug-drug interactions but does not include drug-food interactions, it is critical that clinical staff understand that the health information system is incapable of catching food-drug interactions.

There are several other drug-related databases that are worth mentioning: The WHO maintains an ontology of adverse drug reactions that describes what can go wrong with different medications.

DailyMed is a database of the contents of medication package inserts (the several pages of really small text that you never read on the inside of your prescriptions) that comprehensively describe a given medication.

HL7 publishes a standard for labeling products called Structured Product Labeling (SPL) that the FDA uses to publish the labels for drugs.

Electronic prescribing is a powerful tool, but it is often underutilized. The Surescripts network is capable of feeding your EHR data regarding what medications a patient has been prescribed elsewhere. It is critical that this data be properly imported into your EHR to properly check for drug interactions. This should occur at the pharmacy also, but often does not.

In many cases, however, e-prescribing will represent new burdens for doctors. E-prescribing requires very specific details regarding dose, dose form, drug form, and countless other details. These details are defined by the National Council for Prescription Drug Programs (NCPDP). With e-prescribing, for instance, a clinician will have to specify whether a drug should be a capsule or a tablet, and whether generic equivalents are acceptable. With paper prescriptions many of these details were determined by the pharmacists and patients. E-prescribing is an entirely new skill set.

One of the important players in the e-prescribing space is a low-profile company called NewCrop. NewCrop provides an integration layer to e-prescribing and automatically faxes prescriptions that cannot be delivered using the Surescripts network. NewCrop provides at least two different integration methods for EHR systems. The first is an API for deep integration. This allows an EHR to appear to be doing e-prescribing, but allows NewCrop to do the heavy lifting. The other, simpler integration allows web-based EHR systems to hand off users to a NewCrop web interface, so that NewCrop actually provides the user interface to e-prescribing. NewCrop is essentially a drug ontology merging shop, providing unified drug ontology licensing services as well as substantial services mapping plain text drug data (from a doctor’s typed note for instance), onto proper drug ontologies. Given the number of standards and protocols involved in e-prescribing, this is regarded as a very valuable service. Most small EHR vendors use NewCrop to ease the complex integration with Surescripts. When searching the Surescripts website for certified vendors, NewCrop clients show up with “Uses NewCrop” under their product name.

MirrorMed, an open source EHR feeder project for ClearHealth, contains a reference open source implementation of the NewCrop interface as an add-on module to ClearHealth.

SNOMED to the Rescue

If there were a “winner” in the clinical ontology space it would be SNOMED CT, which stands for Systematized Nomenclature of Medicine – Clinical Terms.

If there were a clinical specialty that could be regarded as responsible for “classifying” medicine, it would be pathology. Pathologists are doctors who specialize in running laboratory tests on human tissue and fluids, and often spend much of their careers looking through a microscope or using other lab equipment. Because they often provide diagnoses that are unreachable in any other way, pathologists have the nickname of “the doctor’s doctors.”

In the United States, one of the respected professional organizations for pathologists is the College of American Pathologists (CAP). Since the late 1960s, CAP had been working on a clinically accurate ontology that could be broadly applied to all aspects of medicine. In 1999 SNOMED was merged with similar efforts from the United Kingdom’s Clinical Terms project, and in 2007 the trademarks and copyright for SNOMED were transferred to a new international body called the International Health Terminology Standards Development Organization (IHTSDO). The result is SNOMED CT, which is generally regarded as the most complete clinical ontology in any language.

Moreover, SNOMED CT is liberally licensed to any IHTSDO “member country.” Because the main ontology was created in English, this list includes the United States, Canada, the United Kingdom, and Australia. Several other European countries participate in IHTSDO, but adoption is far from universal, focusing on North American and European countries. Recently, IHTSDO has decided to liberally license SNOMED CT to very poor countries free of charge. If a nation is defined as poor by the World Bank, they can use SNOMED CT as if they were a member country. International readers, especially in non-English-speaking middle-income or wealthy countries should verify the status of their nation in the IHTSDO member list. If you are not in an IHTSDO member country or in a country treated as a member country you might need to pay licensing fees to use SNOMED CT.

Because so many people have free access to SNOMED CT, it has rapidly become the de facto standard for encoding health information in a clinically valid way, and is regarded as the “right” way to populate popular health information exchange formats such as CCR/CCD/CDA (acronyms explained in Chapter 11).

SNOMED Example

Because SNOMED is so popular, an example of the coding scheme is warranted. Suppose a patient spilled boiling water on her left foot, resulting in second-degree burns. In SNOMED, that could be encoded as:

Concept ID = 62537000
Fully Specified Name = Second-degree burn of foot

This code has certain relationships with other codes, like its parents, which include:

37696000 Burn of lower leg
125604000 Injury of foot

In turn, this code has a parent code:

84677008 Burn of the lower limb

It has an “associated morphology” of

46541008 Second-degree burn injury

and a “finding site” of

60496002 Skin structure of the foot

For any given instance of a foot burn, you might also have codes that further specify when it happened, how often it happens, what type of person provided the information, and so on.

Most important, it should be noted that if the core concept for “Second-degree burn of the foot” were missing from SNOMED, it could be largely re-created by grouping other codes together. This allows SNOMED to describe things accurately that it does not directly encode.

SNOMED uses a complex method of coding. Each number (formally called a SNOMED CT identifier or SCTID) has a structure that can be used by a computer system to make determinations about which particular part of SNOMED the SCTID comes from. It also includes a check digit to protect against data entry mistakes. A computer can use an algorithm to determine if an SCTID that was typed in is valid. For the most part, typing SCTIDs should be avoided in any case. From a user’s standpoint, they should almost always be seeing the descriptions instead. Similarly, searching via SCTIDs is only something done during data analysis, so casual searches will almost always focus on the descriptions. SCTIDs are random numbers at this stage, and do not reflect the structure of SNOMED CT, although this might change someday.
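As an illustration of that validation step, here is a sketch of the check-digit algorithm. SNOMED CT uses the Verhoeff scheme, under which a well-formed SCTID reduces to a checksum of zero:

```python
# Verhoeff check-digit tables (multiplication and permutation in the
# dihedral group D5); these are the standard published tables.
D = [[0,1,2,3,4,5,6,7,8,9],[1,2,3,4,0,6,7,8,9,5],[2,3,4,0,1,7,8,9,5,6],
     [3,4,0,1,2,8,9,5,6,7],[4,0,1,2,3,9,5,6,7,8],[5,9,8,7,6,0,4,3,2,1],
     [6,5,9,8,7,1,0,4,3,2],[7,6,5,9,8,2,1,0,4,3],[8,7,6,5,9,3,2,1,0,4],
     [9,8,7,6,5,4,3,2,1,0]]
P = [[0,1,2,3,4,5,6,7,8,9],[1,5,7,6,2,8,3,0,9,4],[5,8,0,3,7,9,6,1,4,2],
     [8,9,1,6,0,4,3,5,2,7],[9,4,5,3,1,2,6,8,7,0],[4,2,8,6,5,7,3,9,0,1],
     [2,7,9,3,8,0,6,4,1,5],[7,0,4,6,9,1,3,2,5,8]]

def valid_sctid(sctid):
    """True if the SCTID's trailing Verhoeff check digit is consistent."""
    c = 0
    for i, ch in enumerate(reversed(str(sctid))):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0

print(valid_sctid(62537000))  # True  (Second-degree burn of foot)
print(valid_sctid(62537001))  # False (a single-digit typo is caught)
```

A user interface can run this check on entry and reject mistyped identifiers before they ever reach the record.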

Relationships are also rarely used extensively by frontend users (at least not overtly). Instead, relationships are good for doing analysis on patient records and for driving intuitive user interfaces for clinical users. Your end users will never directly care that a “surgery” is a “procedure,” or that 37696000 is a parent of 62537000, but you as a data engineer certainly might. These relationships allow SNOMED CT to not merely code medical knowledge, but to participate in the development of new clinical information via inferences.

Dr. Hendler regularly provides the following example in presentations:

Pneumococcus “Is A” type of Gram Positive Coccus. Gram Positive Cocci “Is A” type of bacteria. Streptococcus “Is A” different kind of Gram Positive Coccus.

Strep throat “Is A” disease. It has findingSite pharynx, it has morphology inflammation, and it has causativeAgent Streptococcus.

Pneumococcal Pneumonia “Is A” different disease. It has findingSite Lung Structure, it has morphology inflammation and it has causativeAgent Streptococcus.

For the sake of argument (this is made up), let’s say that you come to suspect that anyone with any Gram Positive Coccal infection who was treated with a certain class of antibiotic later develops kidney disease. You would need to find all the patients who had Gram Positive Coccal infections. Using lexical searching or ICD-9 you could not do this in a systematic way. You would just have to know every possible name for a condition, and every possible organ system that might be involved.

You would certainly miss some relevant conditions. SNOMED CT allows you to do a “subsumption” search. You can ask the reasoner (software that queries ontologies) to find all SNOMED CT codes in the set:

All diseases with causative agent Gram Positive Cocci.

It will not only find the two conditions listed above but many more, some of which a typical physician has probably never heard of. Then, after the reasoner returns a comprehensive list of codes that match your search criteria, you can take that list and use it to query your EHR. This provides a list of patients who have symptoms or diseases that could be caused by Gram Positive Cocci. You further restrict that list to see who has received the antibiotic in question, and now you have a data set which addresses your hypothesis regarding kidney disease.
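Mechanically, a subsumption search boils down to computing the transitive closure of the is-a hierarchy and then filtering on an attribute. The sketch below uses concept names instead of SCTIDs and an invented hierarchy fragment to keep it readable:

```python
from collections import defaultdict, deque

# Toy fragment of an is-a hierarchy and causative-agent attributes,
# in SNOMED's style but with made-up, simplified relationships.
IS_A = {  # child -> parents
    "Pneumococcus": ["Gram Positive Coccus"],
    "Streptococcus": ["Gram Positive Coccus"],
    "Gram Positive Coccus": ["Bacterium"],
}
CAUSATIVE_AGENT = {  # disease -> organism
    "Strep throat": "Streptococcus",
    "Pneumococcal pneumonia": "Pneumococcus",
    "Influenza": "Influenza virus",
}

def descendants(concept):
    """All concepts subsumed by `concept` (transitive is-a closure)."""
    kids = defaultdict(set)
    for child, parents in IS_A.items():
        for parent in parents:
            kids[parent].add(child)
    found, queue = set(), deque([concept])
    while queue:
        for child in kids[queue.popleft()] - found:
            found.add(child)
            queue.append(child)
    return found

def diseases_caused_by(organism_class):
    agents = descendants(organism_class) | {organism_class}
    return {d for d, agent in CAUSATIVE_AGENT.items() if agent in agents}

print(sorted(diseases_caused_by("Gram Positive Coccus")))
# ['Pneumococcal pneumonia', 'Strep throat']
```

A production reasoner does far more than this, but the principle is the same: the query matches every disease whose causative agent falls anywhere under the named class, including conditions the querying clinician never thought to list.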

This example demonstrates how an ontology such as SNOMED CT, which encodes a much deeper level of medical knowledge than billing ontologies, can provide clinicians with information and context that is simply impossible without a robust ontology.

SNOMED CT has more than 300,000 terms in its ontology. It has millions of relationships between those terms. In comparison, ICD-10 numbers about 150,000 codes, up from ICD-9’s roughly 17,000. Practically speaking this means that you cannot “buy the SNOMED book” in the way that you can with ICD and CPT codes. We will discuss in the next section some of the implications of dealing with an ontology of that size from a technical standpoint.

From a practical standpoint, SNOMED’s size is both one of its greatest strengths, as it is by far the most comprehensive clinical ontology available, and one of its greatest weaknesses, as mistakes in the ontology are difficult to find and fix.

SNOMED CT can be downloaded with UMLS from the NLM.

SNOMED and the Semantic Web

This section contains a technical discussion of how queries work on data encoded using SNOMED. If that subject does not interest you, we recommend you skip this section and come back to it when you need it.

Computers are very, very good at repetitive processes. Modern desktop computers can perform millions of simple calculations in the time it takes to blink. Those simple calculations can add up to more complex calculations, which gives us the broad range of things that computers can do. Still, we all have experienced our computers slowing down when performing complex tasks, especially on large data sets.

The science of how hard it is for a computer to calculate something, based on the size of the data set involved, is called computational complexity. A full treatment of computational complexity is far beyond the scope of this book, but the basic concept is worth outlining. The computing required varies greatly depending on the size of the data set in question. If a computational process is very efficient, then there is little increase in the amount of computing required as a data set gets larger.

An inefficient process, by contrast, behaves like the famous chessboard and wheat problem. If you place a single grain of wheat on the first square, double that to 2 grains on the next square, and continue with 4, 8, 16, 32, 64, and so on, you end up with 18,446,744,073,709,551,615 grains of wheat in total, or a pile of wheat the size of Mount Everest. If you view each additional square as a new data point, and the grains of wheat as the computational cycles required to process that data point, you begin to see the potential problem with computational complexity.
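The arithmetic is easy to verify: with the grains doubling across the 64 squares, the total is a geometric series summing to 2**64 - 1:

```python
# One grain on square 0, doubling on each of the 64 squares.
total = sum(2 ** square for square in range(64))
print(total)                 # 18446744073709551615
print(total == 2 ** 64 - 1)  # True
```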

SNOMED is a very large ontology. When a large clinical data set, say 10,000 patient records, is encoded using SNOMED, the computational task of querying those patient records based on SNOMED relationships can easily become a wheat-grain task.

However, SNOMED CT is conformant to OWL 2 EL. OWL stands for Web Ontology Language and is the standard of choice for the semantic web. OWL 2 EL is a subset of the OWL language that intentionally omits some semantic constructs (specifically negation and disjunction), which makes querying extremely large data sets something that computers can do much more easily.

SNOMED CT has had growing pains over the years in terms of its “semantic goodness,” but its compliance with OWL 2 EL means that many of the modern tools that apply to OWL can be leveraged to perform data analysis on SNOMED-encoded clinical data. Moreover, full, usable OWL compliance might not be too far off, given advancing processing speeds and improved OWL tools, along with improvements to the SNOMED ontology. A new OWL reasoner, called ConDOR, has been especially productive in processing SNOMED CT. We can recommend the book Programming the Semantic Web, by Toby Segaran, Colin Evans, and Jamie Taylor (O’Reilly), to help you determine what you can make of SNOMED’s OWL compliance.

To actually get SNOMED files into OWL formats, you can use a Python script from the clinical ontology modules of the python-dlp project, or Perl scripts written by Kent Spackman, included in the UMLS distribution of SNOMED CT under the “Other Resources/Developer Toolkit” directory. Both of these tools will provide an OWL output of SNOMED that can be used for the basis of other queries. You might also consider using some of the ontology management tools mentioned later in the chapter.
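As a rough illustration of what such a conversion does, the sketch below turns active is-a rows from an RF2-style relationship file into OWL functional-syntax axioms. It is a deliberate simplification (real converters also handle attribute relationships and role groups), and the sample row is fabricated:

```python
IS_A = "116680003"  # the SNOMED CT "Is a" relationship type

def rf2_to_owl(relationship_rows):
    """Translate active is-a rows into OWL SubClassOf axioms.
    RF2 relationship columns: id, effectiveTime, active, moduleId,
    sourceId, destinationId, relationshipGroup, typeId, ..."""
    axioms = []
    for row in relationship_rows:
        fields = row.rstrip("\n").split("\t")
        active, source, dest, type_id = fields[2], fields[4], fields[5], fields[7]
        if active == "1" and type_id == IS_A:
            axioms.append(f"SubClassOf(:{source} :{dest})")
    return axioms

# Fabricated sample row: 62537000 (burn of foot) is-a 37696000
sample = ["100001\t20120131\t1\t900000000000207008\t62537000\t37696000"
          "\t0\t116680003\t900000000000011006\t900000000000451002"]
print(rf2_to_owl(sample))  # ['SubClassOf(:62537000 :37696000)']
```

The resulting axioms can be loaded into any OWL 2 EL reasoner for subsumption queries.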

UMLS: The Universal Mapping Metaontology

This chapter has provided a tour of important ontologies in medicine. We have considered several drug ontologies, which are mapped together using RxNorm. We have considered CPT and ICD codes, which form the heart of a patient’s claim history. And we have considered SNOMED, which is a comprehensive and available clinical ontology.

But we know that there is considerable overlap in all of these ontologies. There is also overlap between different versions of these ontologies, as we see in the transition between ICD-9 and ICD-10.

There is a project whose sole purpose is to map the meanings between different clinical ontologies worldwide. That ontology, which is really a meta-ontology, is called UMLS. If Frodo were carrying an ontology, it would be UMLS; it is absolutely the one ring to bind them. The heart of UMLS is its metathesaurus; the other UMLS components largely exist to create this metathesaurus.

The Metathesaurus consists of semantic concepts that are mapped onto other ontology systems. There are roughly 2 million concepts in UMLS, which map onto around 200 source ontologies. Many of these ontologies are in languages other than English, and some are available in several languages. Although almost 70% of the descriptions are in English, the translation of medical terminology is an important capability embedded in UMLS.

Many of the ontologies that are “wrapped up” in UMLS contribute only a few hundred or a few thousand concepts. But massive ontologies like SNOMED, ICD, and CPT ensure that there are a great many concepts in the UMLS database. RxNorm, for instance, is merely the drug database component of the larger UMLS meta-ontology.
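You can see this structure directly in the Metathesaurus distribution files. The main concept file, MRCONSO.RRF, is pipe-delimited, with the concept identifier (CUI) in the first field and the source vocabulary abbreviation (SAB) in the twelfth. The sketch below, using illustrative rows rather than real UMLS data, counts how many distinct concepts each source vocabulary contributes:

```python
from collections import Counter

def concepts_per_source(mrconso_lines):
    """Count distinct CUIs contributed by each source vocabulary (SAB).
    Field positions follow MRCONSO.RRF: CUI is field 0, SAB is field 11."""
    seen = set()
    counts = Counter()
    for line in mrconso_lines:
        f = line.split("|")
        cui, sab = f[0], f[11]
        if (cui, sab) not in seen:
            seen.add((cui, sab))
            counts[sab] += 1
    return counts

# Illustrative rows (not real UMLS records): one concept, C0027051,
# named by two different source vocabularies.
sample = [
    "C0027051|ENG|P|L0027051|PF|S0074642|Y|A0000001||||SNOMEDCT_US|PT|22298006|Myocardial infarction||N||",
    "C0027051|ENG|S|L0027051|VO|S0074643|N|A0000002||||ICD10CM|PT|I21.9|Acute myocardial infarction||N||",
]
print(concepts_per_source(sample))
```

Run against the full MRCONSO.RRF, the same loop makes the relative sizes of the wrapped-up ontologies immediately visible.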

The licensing for UMLS can be very complicated, because it involves sublicensing the various subcomponents. It is possible to use only a subset of UMLS that includes just the ontologies you want to license. However, in some cases, licensing an expensive ontology through UMLS can have unexpected financial benefits.

Many organizations use the capacity of UMLS to map SNOMED to CPT codes to avoid paying expensive licensing fees to the AMA. A large organization will have its clinicians code procedures using SNOMED CT, which costs nothing for providers to use in the United States. Hundreds of clinical users can code in SNOMED without needing to pay any fees to the AMA. Then, using UMLS, medical billing specialists translate the procedures into CPT codes just for billing. Assuming only a few billing specialists perform this task, the licensing costs for CPT are minimal.
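The mechanics of such a crosswalk follow directly from the Metathesaurus structure: two codes in different source vocabularies that share a CUI name the same concept. The sketch below builds a SNOMED→CPT mapping from MRCONSO-style rows; the rows shown are illustrative stand-ins, not real UMLS records, and a production crosswalk would also need human review, since CUI-level matches are not always billing-exact.

```python
def build_crosswalk(mrconso_lines, from_sab="SNOMEDCT_US", to_sab="CPT"):
    """Map codes in one source vocabulary to codes in another that share
    the same UMLS concept (CUI). MRCONSO fields: CUI=0, SAB=11, CODE=13."""
    by_cui = {}
    for line in mrconso_lines:
        f = line.split("|")
        by_cui.setdefault(f[0], []).append((f[11], f[13]))
    xwalk = {}
    for entries in by_cui.values():
        src = [code for sab, code in entries if sab == from_sab]
        dst = [code for sab, code in entries if sab == to_sab]
        for s in src:
            xwalk.setdefault(s, set()).update(dst)
    return xwalk

# Illustrative rows: a SNOMED procedure code and a CPT code sharing one CUI.
sample = [
    "C0003611|ENG|P|L0003611|PF|S0074640|Y|A0000003||||SNOMEDCT_US|PT|80146002|Appendectomy||N||",
    "C0003611|ENG|S|L0003611|VO|S0074641|N|A0000004||||CPT|PT|44950|Appendectomy||N||",
]
print(build_crosswalk(sample))
```

The billing specialists in the scenario above would work from exactly this kind of candidate mapping, choosing the appropriate CPT code when a SNOMED code maps to more than one.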

It is important that any organization considering this strategy do so carefully. All medical claims must be fully justified by the contents of the clinical record, and using the translation from SNOMED to CPT to illegally upcode can be an expensive mistake. Conversely, downcoding for safety can result in lost income.

More than just a potential direct financial benefit, UMLS holds the key to true semantic interoperability. It is only when the underlying meaning in each ontology can be mapped to the meaning in another ontology, even in another language, that we can hope for worldwide semantic interoperability of healthcare records.

Extending Ontologies

As a rule, ontologies have an associated standards body, through which additions and extensions to a given ontology are managed. Often, a clinical organization that is using an ontology to drive clinical workflows will feel the need to add a code to that ontology. This should generally be avoided: adding codes whose meanings only your clinic or hospital understands makes full semantic interoperability much more difficult, because no other organization will be able to parse clinical data encoded in this way.

There are two general reasons this happens.

First, it is possible, especially in research-oriented clinical environments, to find that a new code is clinically justified. That is to say, existing codes or combinations of codes in a given ontology might fail to capture some significant clinical meaning that should be captured formally, in relation to the ontology. Most ontologies have a process by which an organization can create a temporary code and submit it to become part of the larger ontology through some kind of approval process. Extending an ontology for this reason is legitimate, but follow-through is important.

When an ontology is extended for clinical purposes, care should be taken to coordinate formally with the standards body that manages it. In many cases, your request for additional codes will be denied, but rarely will this happen without the standards body taking your clinical use case into consideration and addressing it in some other way. If the standards body ultimately denies your request for a new code and provides an alternative means of coding your clinical use case, any records coded with the old local code should be updated to the new coding scheme. This is the only way to ensure that new clinical meaning generated at your specific site does not ultimately contribute to the breakdown of semantic interoperability.

The other reason people extend medical ontologies is that part of a workflow requires a trigger different from those provided by the available codes. It is common for an automated workflow to have rules like “Whenever a patient coded with this CPT code goes to an X-ray, do ABC.” However, a well-designed EHR should have mechanisms in place to support workflow modification without adding workflow-related codes to your local copy of an ontology. It is critical that the purpose of the ontology stay focused on coding clinical data. Even billing ontologies, like CPT, should never be extended with data that is clinically meaningless outside your organization. A good rule of thumb: an additional, custom code is acceptable for clinical reasons but not for workflow reasons.
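One way to honor that rule of thumb is to keep workflow triggers in a separate rule table keyed by standard codes, so the ontology itself is never touched. The sketch below assumes a hypothetical rule table and action names; the CPT codes shown are merely illustrative placeholders for whatever standard codes drive your workflow.

```python
# Sketch: workflow triggers keyed on standard codes, kept entirely outside
# the ontology. The rule table and action names here are hypothetical.

WORKFLOW_RULES = {
    # (ontology, standard code) -> workflow action; no custom codes needed
    ("CPT", "71045"): "notify_radiology",
    ("CPT", "93000"): "route_ecg_to_cardiology",
}

def triggers_for(ontology, code):
    """Look up workflow actions for a standard code, so the workflow
    engine never needs workflow-only entries added to the ontology."""
    action = WORKFLOW_RULES.get((ontology, code))
    return [action] if action else []

print(triggers_for("CPT", "71045"))  # → ['notify_radiology']
print(triggers_for("CPT", "99999"))  # → []
```

The point of the design is separation of concerns: the rule table can change as often as the workflow does, while the local copy of the ontology remains identical to everyone else’s.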

Many practice management systems, or EHR systems that grew from practice management systems, rely on additional custom codes in the CPT database to function. This is a dangerous design flaw, and software that does this should be avoided or repaired if possible.

Other Ontologies

The fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM IV) is the current way for mental health providers to code behavioral disease. The DSM V will be released in 2013. As we mentioned in the introduction, the DSM is often a controversial place where hot-button cultural issues play out. Mental health disease still has considerable stigma associated with it in the United States, and the DSM is often the battleground where arguments regarding perception take place.

Furthermore, mental health is one of the few endeavors where the name of your formal diagnosis might actually have impacts on your real-world health. Hearing that you have “mild” depression, rather than “major” depression, for instance, might actually impact how depressed you feel. All of this makes DSM one of the most important and high-impact healthcare ontologies available.

Mental health issues are often coded in other, general-purpose ontologies like ICD and SNOMED, making UMLS especially important for mental health coding purposes.

Cyc is an artificial intelligence project that is attempting to build a very comprehensive ontology of everything, not just healthcare. OpenCyc is an open source release of Cyc with somewhat reduced capacity. Although Cyc is not comparable to a medical ontology in depth, it does possess considerable healthcare terminology and is valuable precisely because it is not limited to healthcare. Cyc is not typically used directly for clinical purposes, but it can be useful in clinical research projects.

LOINC is an ontology of lab result codings that is discussed in Chapter 11.

OpenGALEN and OpenEHR are both attempts to promote open source ontology concepts. Both projects have been maturing, though some view them as unnecessary additions or alternatives to SNOMED plus UMLS. However, the fact that they are available under open source licensing terms might make them a better alternative to SNOMED in certain jurisdictions.

One of the largest and fastest growing areas of medical ontologies is genomic ontologies. Currently genomic research is conducted outside the normal delivery of care. As typical healthcare providers begin to improve the quality of clinical data that they encode, clinical ontologies like SNOMED CT will be leveraged to bridge normal clinical processes into genomic research areas. As this happens the separation between clinical ontologies and genomic ontologies will begin to fade. For the time being, genomic ontologies are largely separate from clinical ontologies.

There are hundreds of health care, medical, or biological ontologies that we are not covering. Most of them are very focused, like “anatomy of a mouse ontology.” A good place to start looking for a particular ontology might be the Open Biological and Biomedical Ontologies Foundry, which has a list of ontologies, or the BioPortal project from the National Center for Biomedical Ontology, which is actually a searchable database of several ontologies.

Sneaky Ontologies

There are several projects that do not sound like ontologies but in fact are. Medical abbreviations are the best example of this. Medical abbreviations are shorthand acronyms or phrases with well-understood clinical meanings, commonly used when making paper or digital notes.

Medical abbreviations, especially related to medication instructions, have been well-documented as a significant and dangerous source of medical errors. Many ambiguous medical abbreviations should never be used. There are several lists of these dangerous abbreviations, and a comprehensive list can be found at the Institute for Safe Medication Practices (ISMP). Even after avoiding this list, there is a danger that a clinician will mistype one abbreviation and change it into another.

If these dangers can be avoided, medical abbreviations can save tremendous typing time in an EHR. Modern EHR systems often integrate an abbreviations database that automatically expands medical abbreviations into their longer English descriptions in real time. By letting the clinician instantly see what the abbreviation he or she is using actually translates to, this makes medical abbreviations even safer. Moreover, such systems can dynamically enforce the avoidance of error-prone abbreviations.
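The core of such a feature is small. The sketch below expands known abbreviations and refuses error-prone ones; the expansion table and banned list are tiny illustrative samples, not the ISMP list itself, and a real system would consult the full list.

```python
import re

# Illustrative tables only -- a real system would load the full ISMP
# error-prone abbreviation list and a complete expansion dictionary.
EXPANSIONS = {"prn": "as needed", "bid": "twice a day"}
BANNED = {"qd", "qod", "u"}  # e.g. "QD" is easily misread as "QID"

def expand_note(text):
    """Expand known abbreviations in place; reject error-prone ones."""
    def sub(match):
        word = match.group(0).lower()
        if word in BANNED:
            raise ValueError(f"error-prone abbreviation: {match.group(0)!r}")
        return EXPANSIONS.get(word, match.group(0))
    return re.sub(r"[A-Za-z]+", sub, text)

print(expand_note("ibuprofen 400mg PRN"))  # → ibuprofen 400mg as needed
```

Raising an error on a banned abbreviation, rather than silently guessing at its meaning, is the point: the clinician sees the problem at typing time, not the pharmacist at dispensing time.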

There are several efforts to create ontologies that are specifically designed to map from clinical terminology to natural language. Plainlanguage.gov has a section on health literacy that links to several documents that convert clinical terms into natural language. These dictionaries are the simplest forms of ontologies, mere definitions of terms, with no codes at all. However, these natural language dictionaries can be used to develop PHR interfaces that will contribute substantially to health literacy. Rather than automatically substituting natural language terms for clinical terms, the definitions can simply be added to patient-facing systems. For instance, when clinical text says “cardiology,” that can be replaced with “cardiology (medical treatment of heart problems).”
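That append-rather-than-replace approach can be sketched directly. The glossary entries below are illustrative samples of the kind of plain-language definitions those dictionaries provide:

```python
# Sketch: append plain-language glosses to clinical terms in patient-facing
# text rather than replacing them. Glossary entries are illustrative.

GLOSSARY = {
    "cardiology": "medical treatment of heart problems",
    "hypertension": "high blood pressure",
}

def annotate(text):
    """Keep each clinical term but add its plain-language gloss after it."""
    for term, gloss in GLOSSARY.items():
        if term in text.lower():
            idx = text.lower().index(term)
            original = text[idx:idx + len(term)]
            text = text.replace(original, f"{original} ({gloss})", 1)
    return text

print(annotate("Referred to cardiology for hypertension"))
```

Keeping the clinical term visible matters: the patient learns the real vocabulary while still understanding the note.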

Ontologies Using APIs

Some ontologies are available as APIs.

The National Provider Identifier (NPI) database contains most clinicians who can prescribe in the United States. It is available for download from HHS, and can be searched directly at HHS or via Doc NPI.

You can search for APIs at the general-purpose ProgrammableWeb site. Several ontologies and health data sets are listed in its medical and health subsections.

Health.data.gov provides a list of data sets that are available for download from the federal government. Although few of the data sets qualify as ontologies, almost all of them are valuable sources for clinical research purposes.

Exercising Ontologies

When actually working with ontologies, you need systems that enable you to store and work with them, integrating them into your clinical IT environment. There are several good open source options known as terminology servers, including Apelon DTS (Distributed Terminology System), which is part of the Open Health Tools (OHT) family of projects. OHT hosts several other important terminology-related projects.

The Cancer Biomedical Informatics Grid (caBIG) and Informatics for Integrating Biology and the Bedside (i2b2) are both attempts to leverage clinical and research data for clinical research. Originally, the caBIG project was exclusively focused on cancer research, but now almost all kinds of clinical research are supported. caBIG handles research data exchange (usually of de-identified data) and has features that help support clinical trials and many other clinical and research processes. i2b2 has always been focused on merging clinical and genomic data sets and clinical ontologies to develop relevant clinical practices.

Protégé is an open source resource for working with ontologies in the OWL formats.

OpenEHR is a controversial approach to applying knowledge engineering principles to the entire EHR, including things like the user interfaces. You might think of OpenEHR as an ontology for EHR software design. Many health informaticists disagree on the usefulness of OpenEHR. Some believe that HL7 RIM, given its comprehensive nature, is the highest level to which formal clinical knowledge management needs to go.