WikiBI/Concepts and Ontologies

Biomedical Concepts and Ontologies

Abstract

In this chapter, we focus on biomedical concepts and ontologies. First, we define the term ‘concept’, describe its characteristics and show the importance of context for a concept. Next, we examine the different types of concept relationships and how they shape the structure of a concept. We then explain what ontologies are and present four example ontology systems: WordNet, a general ontology system, and three biomedical ontology systems, UMLS, SNOMED CT and GO. Finally, we investigate two case studies that use biomedical concepts, concept relationships and ontologies in biomedical research.

Introduction

The biomedical field has experienced major advances in the past 30 years, and its knowledge and data keep growing dramatically. These data range from early observations describing the anatomy of the human body down to molecular-level descriptions of viruses. All of this information must be stored and made available to other practitioners, who can use it in their own experiments or diagnoses. The overwhelming quantity of data must be described and categorized in a way that makes it accessible to both humans and computers. This gives rise to the need for biomedical concepts and relationships between them, which unify the data and make them easier for a machine to categorize. The ultimate goal, of course, is not the categorization of concepts itself but the use of this organized information to help with treatment, diagnosis and the development of new techniques.

In philosophy, ontology is the study of what types of entities make up reality. For computer scientists, it means organizing concepts within domains. General ontologies are very broad and independent of any specific task or domain. Domain ontologies capture knowledge in a specific domain, such as medicine or sports; biomedicine is one such domain. Ontologies can be distinguished from general hierarchies of concepts by their internal consistency, acyclic polyhierarchies and computable semantics. Biomedical ontologies are becoming synonymous with concept collections assembled using description logics (Baader and Nutt, 2002). Earlier work in the biomedical field, such as Rector and the GALEN project, showed how medical concepts can be represented by formal ontologies. That project also demonstrated some applications in which ontologies were important (Rector, Nowlan, and the GALEN Consortium, 1993).

Overview

Concept

A ‘concept’, according to the American National Standard Dictionary of Information Technology (ANSDIT), is “an abstraction of one or more characteristics of a set of instances or occurrences and typically represented by a set of rules for determining category membership”. A concept can therefore describe a procedure, an organ, a physical characteristic and so on; the possibilities are endless. To define a concept, we need to know how it was generated and under what conditions. We also need to know how it will be stored and shared and, at the last stage, how it will be retrieved for processing. Figure 1 illustrates this process, and its stages are explained below.

Figure 1 - Concept stages

Information Capture

An important part of defining a concept is understanding how the information that the concept is meant to describe was captured. That information can be obtained in many different forms, such as notes in a patient’s medical file or audio recordings of observations. Whatever the source, a medical concept needs to be machine-processable, i.e. expressed in a standard representation that a computer can process. To achieve this, the raw data must go through processing that maps them to structures understood by machines.

Communication

Another critical consideration for concept definition is how the concept will be exchanged between different information sources. The exchange of information between practitioners is critical in medical applications, and patient medical records are a prime example. Medical records, however, can have a different information structure depending on who composed them. Some standard communication method must therefore exist so that encoded health care information can be exchanged without losing its meaning.

Information Retrieval

An important characteristic of any piece of information is how easily it can be retrieved. Indexing systems such as ICD-9, which provides numeric codes for diseases, drugs, procedures, etc., do exist. However, this type of indexing does not allow much flexibility when retrieving information.

Another indexing service, MeSH[1] (Medical Subject Headings), which is used by Medline, uses 22,568 descriptors (as of 2005) together with synonyms to retrieve medical journal articles. Effective information retrieval requires a well-organized structure based on proper concept definitions; a weak concept definition leads to poor knowledge organization.

Decision Support

With all this information generated, medical practitioners need tools that help them make decisions about a case. The concept definitions would be of little use if machines could not interpret them to assist in diagnosis. For this purpose the Arden Syntax was created in 1992. Medical Logic Modules (MLMs) can be written in the Arden Syntax; they contain logic that can make a medical decision, provide alerts, perform screening for medical research and more (Jenders[2]).

Context

Context is very important when interpreting concepts. The same concept viewed in different contexts can mean entirely different things and can therefore lead, for example, to a false diagnosis. Context is difficult for a machine to understand, which is why standards like UMLS allow a concept to be defined within a context.

Concept Characteristics

To understand the context of a concept, it is important to understand the characteristics that define the concept.

  • Certainty
In most cases, diagnosing a disease is an iterative process in which a physician starts with an initial diagnosis and, through further observations, eventually arrives at the true diagnosis. Uncertainty therefore exists throughout the diagnostic process, and even the final diagnosis might not be the correct one. This is why all concepts generated from physicians’ observations have a degree of certainty associated with them.
  • Etiologic Precision
The exact cause of a condition is often hard to establish, which makes the correct diagnosis hard to infer.
  • Granularity (specificity)
The way a diagnosis is expressed can vary significantly. For example, one can say that the diagnosis is ‘cancer’, but a more detailed diagnosis with higher specificity can also be expressed, such as “stage III ovarian cancer limited to the left ovary”.
  • Completeness
This characteristic is very similar to granularity, and the distinction between the two is hard to make. According to Chen[3], the distinction is determined by the way the information is presented: granularity is the proper term when vocabulary expressions are used, while completeness is more appropriate when information model structures are used.

Domains

Domains are very important in defining the context in which a concept should be interpreted. Abbreviations illustrate the problem: the same abbreviation can be interpreted differently depending on the domain. An example given by Chen[3] that illustrates this well is the abbreviation ‘MS’, which can be interpreted as shown below, depending on the domain we are referring to.

  • Engineering: Millisecond
  • Computer Science: Microsoft
  • Cardiology: Mitral Stenosis
  • Neurology: Multiple Sclerosis

The need for a unique identifier for each concept and its context is apparent. To resolve this issue, the Unified Medical Language System (UMLS) assigns each concept a unique concept identifier that also ties the concept to the domain in which it applies.
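The idea of pairing a term with its domain to obtain one unambiguous concept can be sketched as follows. This is only an illustration of the principle, not the UMLS data model: the identifiers (`CUI-0001` and so on) and the functions `resolve` and `candidates` are hypothetical placeholders.

```python
# Toy lookup: (abbreviation, domain) -> (hypothetical concept id, preferred name).
# The identifiers below are invented placeholders, not real UMLS identifiers.
CONCEPTS = {
    ("MS", "engineering"): ("CUI-0001", "Millisecond"),
    ("MS", "computer science"): ("CUI-0002", "Microsoft"),
    ("MS", "cardiology"): ("CUI-0003", "Mitral Stenosis"),
    ("MS", "neurology"): ("CUI-0004", "Multiple Sclerosis"),
}


def resolve(term, domain):
    """Return (concept_id, name) for a term in a domain, or None if unknown."""
    return CONCEPTS.get((term.upper(), domain.lower()))


def candidates(term):
    """Return every known interpretation of a term, across all domains."""
    return sorted(
        name for (t, _domain), (_cid, name) in CONCEPTS.items() if t == term.upper()
    )


print(resolve("MS", "Cardiology"))
print(candidates("ms"))
```

Without the domain, `candidates` can only return the full list of possible meanings; with it, `resolve` returns a single concept.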

Structure

Figure 2 - Concept structure models

Context for a concept can be represented by using the correct structure model. Two basic structure models can be used to describe a concept: the terminology/vocabulary model and the information model (Chen et al., 2005[3]). The former relies on descriptive terms to define a concept and is used by SNOMED CT, among others. The information model relies on structures that contain classes, attributes and associations; this modeling scheme is used by the HL7 Reference Information Model (RIM). Figure 2 shows what the two models look like when describing a patient with a family history of breast cancer. The vocabulary model uses sentences that are easy for a human to interpret, while the information model uses 'containers' to achieve the same result.
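The contrast between the two models can be made concrete with a small sketch. The class and field names below are illustrative assumptions and are not taken from HL7 RIM or SNOMED CT; the point is only that one model stores a descriptive term while the other stores structured fields.

```python
from dataclasses import dataclass

# Terminology/vocabulary model: the whole meaning is carried by one descriptive term.
vocabulary_entry = "family history of breast cancer"


# Information model: the same meaning is split across structured fields.
# Class and field names are hypothetical, chosen only for this illustration.
@dataclass(frozen=True)
class Observation:
    subject: str       # who the statement is about
    relationship: str  # how the statement relates to the patient
    finding: str       # the clinical finding itself


information_entry = Observation(
    subject="family member",
    relationship="family history",
    finding="breast cancer",
)


def to_vocabulary(obs):
    """Flatten the structured form into a descriptive term."""
    return f"{obs.relationship} of {obs.finding}"


print(to_vocabulary(information_entry) == vocabulary_entry)
```

Flattening the structured form into a term is straightforward here; recovering the fields from free-form terms is generally harder, which hints at why exchanging data between the two models is difficult.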

The existence of two models creates difficulties: because the models are not compatible with each other, problems arise whenever information is exchanged. Adopting one of the two as a standard and ignoring the other is not a good solution, since each offers benefits depending on the application.

Concept relationships

A concept on its own is of little benefit to any field because it does not provide enough information to reach a conclusion. For example, the concept of a symptom only tells us what that symptom is; it gives no information on how to treat the symptom or what disease it indicates. For this reason, concept relationships are important within a domain and even across domains. The relationships between concepts are what glue concepts together and eventually make up the domain. Concept relationships come in different forms.

The general categories of these are:

  • Synonyms
  • Hierarchical Relationships
  • Associative relationships

Synonyms are the simplest form of relationship. A synonym relationship implies that two terms refer to the same object. Consider the terms ‘kidney’ and ‘renal’, which both refer to the same concept; for searching purposes they can be considered interchangeable.

Hierarchical relationships imply some form of structure and usually exist between concepts in the same domain. This kind of relationship is used when one concept covers a narrower part of a broader concept. An example is the relationship between a ‘human head’ and an ‘eye’, where an ‘eye’ is part of a ‘human head’. Hierarchical relationships can be broken down further into generic-specific and partitive hierarchies. In a generic-specific relationship, the narrower concept inherits the characteristics of the broader concept. In a partitive hierarchy, however, concept B is a part of concept A but the two do not share the same characteristics, as in the example given above.

Finally, associative relationships can exist between concepts that are not in any of the above relationships, and the concepts can also be in different domains. For example, the relationship ‘painkillers’ can treat ‘headache’ is an associative one.
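The three relationship categories can be sketched with a few plain dictionaries. The ‘kidney’/‘renal’, ‘eye’/‘human head’ and ‘painkillers’/‘headache’ examples come from the text; the generic-specific chain (‘eye’ is a ‘sense organ’ is an ‘organ’) and the characteristics attached to each concept are assumptions added for illustration.

```python
# Synonyms: alternative term -> preferred term.
SYNONYMS = {"renal": "kidney"}

# Generic-specific hierarchy: narrower concept -> broader concept (assumed example).
IS_A = {"eye": "sense organ", "sense organ": "organ"}

# Partitive hierarchy: part -> whole.
PART_OF = {"eye": "human head"}

# Associative relationships: (concept, relation, concept).
ASSOCIATIVE = [("painkillers", "can_treat", "headache")]

# Characteristics attached directly to a concept (invented for illustration).
TRAITS = {"organ": {"has a function"}, "sense organ": {"detects stimuli"}}


def normalize(term):
    """Map a synonym to its preferred term."""
    return SYNONYMS.get(term, term)


def inherited_traits(concept):
    """Collect characteristics along the is_a chain only.

    part_of links are deliberately not followed: a part does not
    inherit the characteristics of the whole.
    """
    traits = set()
    current = normalize(concept)
    while current is not None:
        traits |= TRAITS.get(current, set())
        current = IS_A.get(current)
    return traits


print(normalize("renal"))
print(sorted(inherited_traits("eye")))
print(PART_OF["eye"], sorted(inherited_traits(PART_OF["eye"])))
```

The design choice worth noting is that inheritance walks only the generic-specific links, which mirrors the distinction drawn above between generic-specific and partitive hierarchies.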

Figure 3 - Concept Relationships in UMLS

Figure 3 was taken from the UMLS documentation and illustrates part of the concept relationships that exist in UMLS. The black lines indicate hierarchical relationships, e.g. Tissue is a Fully Formed Anatomical Structure. The red arrows indicate associative relationships, e.g. a Cell is a part of a Tissue (the UMLS Semantic Network groups part_of with its associative relations rather than with its is-a hierarchy).

Document relationships to extract information

A great amount of information in the biomedical field comes in the form of articles. These articles contain information about various biomedical concepts and how they are related to one another. For example, an article might discuss gene 'ABC', which is responsible for breast cancer. We can interpret this as an associative relationship between gene 'ABC' and breast cancer, i.e. ABC 'is_responsible_for' breast cancer. However, we would also like to combine such relationships from different articles and discover new relationships among those concepts. For this purpose, the Association for Computing Machinery (ACM) in 1997 came up with a list of document relationships that are useful for information retrieval (Chen et al., 2005[4]). These are explained below:

  • Topic matching: The topics of two documents are examined to see whether they are about the same thing.
  • Word-based relationships: The same words appear in both documents.
  • Attribute-based: For example, two documents have the same author.
  • Document-document hierarchical: For example, document A is the appendix of document B.
  • Document-document topological: Two documents are conceptually equivalent. For example, document A is a translation of document B.
  • Document-document influence: One document affects the other. For example, document A is based on document B.
  • Usage-based: Documents are related through their use. For example, the documents were accessed by the same user.

Using these document relationships, concept relationships can be extracted. For example, one document might say that HIV destroys the immune system, and another might say that HIV patients are more prone to sickness. Comparing the two documents, it can be concluded that HIV patients are more prone to sickness because they have a weaker immune system.
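A minimal sketch of this kind of combination is shown below. It assumes the relationships have already been extracted as (subject, connector, object, source) tuples and that "HIV patients" has been normalized to the concept "HIV"; both steps, and the function name `linked_statements`, are assumptions made for illustration. The output is a list of candidate links for a human to review, not an established conclusion.

```python
from collections import defaultdict

# Hand-written relations standing in for the output of an extraction step.
RELATIONS = [
    ("HIV", "destroys", "immune system", "document 1"),
    ("HIV", "increases susceptibility to", "sickness", "document 2"),
    ("painkillers", "can treat", "headache", "document 3"),
]


def linked_statements(relations):
    """Pair up statements from different documents that share a subject."""
    by_subject = defaultdict(list)
    for subject, connector, obj, doc in relations:
        by_subject[subject].append((connector, obj, doc))

    pairs = []
    for subject, facts in by_subject.items():
        for i, first in enumerate(facts):
            for second in facts[i + 1:]:
                if first[2] != second[2]:  # only combine across documents
                    pairs.append((subject, first, second))
    return pairs


for subject, first, second in linked_statements(RELATIONS):
    print(f"{subject}: '{first[0]} {first[1]}' ({first[2]}) "
          f"may be related to '{second[0]} {second[1]}' ({second[2]})")
```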

Biomedical Ontologies

Biomedical ontologies are concerned with the definition of biological classes and the relations among them; their purpose is to study entities that are crucial for the biomedical area. A biomedical terminology differs from a biomedical ontology in that the former only collects the names of entities in the biomedical domain. The lower levels of upper-level ontologies, as well as general categories, should be compatible with the equivalent semantic areas in the corresponding domain ontologies. For example, Disease in a general ontology should be compatible with that concept in a biomedical ontology. In addition, generic theories and meta-level categories should be shared by every type in every ontology [5].

Biomedical ontologies provide machine-readable descriptions of biomedical concepts and their relations. Because the volume of biomedical literature is increasing very fast, text-mining (TM) methods are needed to locate, retrieve and manage relevant information efficiently. To share the vast amounts of biomedical knowledge effectively, textual evidence needs to be linked to ontologies, which are the main repositories of formally represented knowledge. Ontologies are conceptual models that aim to support consistent and unambiguous knowledge sharing and that provide a framework for knowledge integration [6].

The principal link between text and an ontology is a terminology, which aims to map concepts to terms (Figure 4). Linking domain-specific terms to their descriptions in the ontologies provides a platform for semantic interpretation of textual information. Conceptual relations reflect the connections between the concepts denoted by the given terms. Conceptual relations are encoded in ontologies. Term ambiguity occurs when the same term is used to refer to multiple concepts[7].

 Figure 4 Mapping text into ontologies

Examples of Medical Ontologies

This section illustrates well-known existing ontology systems. WordNet is a general ontology system, while UMLS, SNOMED CT and GO are biomedical ontology systems. Different systems may represent the same concept differently, so compatibility among ontology systems is an important issue. In biomedical research, researchers can compare different ontology systems to see how well each fits a particular subdomain.

WordNet

WordNet ( http://wordnet.princeton.edu/ ) is an electronic lexical database developed at Princeton University [8]. Nouns, verbs, adjectives and adverbs are grouped into synsets, which are interlinked by means of conceptual-semantic and lexical relations. WordNet 2.1 contains 117097 nouns, 11488 verbs, 22141 adjectives and 4601 adverbs. Each part of speech has its own structure: for example, the adjective “renal” and the noun “kidney,” although similar in meaning, belong to two distinct structures, and a specific relationship, “pertainymy,” relates the two forms. Each synset in the noun hierarchy belongs to at least one is-a tree (hyponymy) and may additionally belong to several part-of-like trees (meronymy) [5].

The Unified Medical Language System (UMLS)

The Unified Medical Language System (UMLS) ( http://www.nlm.nih.gov/research/umls/about_umls.html ) was developed by the U.S. National Library of Medicine. Its purpose is to help computer systems understand the meaning of the language of biomedicine and health. The UMLS has three Knowledge Sources: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.

The Metathesaurus is a multi-purpose and multi-lingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them. Its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts. All concepts in the Metathesaurus are assigned to at least one semantic type from the Semantic Network. The Metathesaurus is built from a large number of source vocabularies, such as HL7, ICD, SNOMED, GO and HUGO. The full list is available at http://www.nlm.nih.gov/research/umls/metaa1.html

The Semantic Network provides a consistent categorization of all concepts represented in the UMLS Metathesaurus and a set of useful relationships between these concepts. It consists of (1) a set of broad subject categories, or Semantic Types, and (2) a set of useful and important relationships, or Semantic Relations. The current release of the Semantic Network contains 135 semantic types (http://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html) and 54 relationships (http://www.nlm.nih.gov/research/umls/META3_current_relations.html). Figure 5 shows the "Biologic Function" hierarchy, a portion of the UMLS Semantic Network in which each child is linked to its parent by an "is a" link. Figure 3 is another example from the UMLS Semantic Network.

Figure 5 "Biologic Function" Hierarchy

The SPECIALIST Lexicon is used to provide the lexical information needed for the SPECIALIST Natural Language Processing System (NLP). It includes commonly occurring English words and biomedical vocabulary.

The Systematized Nomenclature of Medicine (SNOMED)

SNOMED Clinical Terms (SNOMED CT) ( http://www.snomed.org/snomedct/index.html ) was developed by the College of American Pathologists. It provides a consistent way of capturing, sharing and aggregating health data across specialties and sites of care. The January 2007 version contains more than 308,000 active concepts with formal logic-based definitions, more than 777,000 active English-language descriptions for flexibility in expressing clinical concepts, and more than 924,000 defining relationships that enable consistent data retrieval and analysis. There are eighteen top-level concepts (Figure 6) and two types of relationships in SNOMED CT (Figure 7): Is-A relationships connect concepts within a hierarchy, while attribute relationships connect concepts in different hierarchies[9].

Figure 6 Eighteen top-level concepts in SNOMED CT
Figure 7 Two types of relationships in SNOMED CT

Gene Ontology (GO)

The Gene Ontology (GO) project (http://www.geneontology.org/) is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). The aim of the GO project is to provide a standard language for describing gene products. To this end, the project is developing ontologies and using them to annotate gene products.

Figure 8 The system diagram of GO

Cellular component, biological process and molecular function are the three organizing principles of GO. A gene product might be associated with or located in one or more cellular components. The terms in an ontology are linked by two relationships: (1) is_a and (2) part_of. Figure 8 is a diagram of the whole system [10], in which solid lines represent is_a relationships and dashed lines represent part_of relationships.

Case studies

Gene annotation

Annotating genes with controlled vocabulary codes (such as GO) is a labor-intensive task. The emergence of powerful methods for analyzing text raises the possibility that gene annotation can be facilitated using natural language processing (NLP) techniques. Raychaudhuri et al. explored whether statistical natural language processing techniques could be used to assign GO codes [11]. Three document classification methods (maximum entropy modeling, naïve Bayes classification, and nearest-neighbor classification) were compared on the problem of associating a set of GO codes with literature abstracts and thus with the genes associated with those abstracts.

They built a document classifier based on the maximum entropy principle to associate abstracts with GO codes. The text of an unclassified abstract is compared to training examples that have previously been identified as relevant to specific GO codes, and the abstract is assigned to categories based on its similarity to those examples. They then annotated each gene by combining the GO code classifications from all of its abstracts using a weighted voting scheme. The advantage of maximum entropy classification is that, in addition to assigning a classification, it provides a probability that each assignment is correct. These probabilities can be helpful in combining multiple gene annotation predictions. If they are reliable measures of prediction confidence, they can be leveraged in a voting scheme: the influence of low-confidence predictions can be mitigated when they are combined with other high-confidence predictions.
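The effect of confidence-weighted voting can be illustrated with a small sketch. The source does not give the exact weighting formula, so averaging the per-abstract probabilities is an assumption made here; the code labels and probability values are invented.

```python
from collections import defaultdict


def vote(abstract_predictions):
    """Average the per-abstract probabilities for each code.

    abstract_predictions: list of {code: probability} dicts, one per abstract.
    Averaging is an assumed weighting, not the scheme from the cited paper.
    """
    if not abstract_predictions:
        return {}
    totals = defaultdict(float)
    for probabilities in abstract_predictions:
        for code, p in probabilities.items():
            totals[code] += p
    n = len(abstract_predictions)
    return {code: total / n for code, total in totals.items()}


# Three abstracts for one gene: two weak votes for code B, one confident vote for code A.
gene_abstracts = [
    {"code A": 0.45, "code B": 0.55},
    {"code A": 0.45, "code B": 0.55},
    {"code A": 0.95, "code B": 0.05},
]

scores = vote(gene_abstracts)
best = max(scores, key=scores.get)

# Unweighted majority voting, for comparison: each abstract votes for its top code.
top_labels = [max(p, key=p.get) for p in gene_abstracts]

print(scores, best, top_labels)
```

In this toy case an unweighted majority would pick code B (two of three abstracts), whereas the weighted vote picks code A because the single confident prediction outweighs the two marginal ones.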

They conducted experiments to annotate Saccharomyces cerevisiae genes with codes from a subset of GO. They chose this organism because many of its genes have manually curated GO annotations that can be used as a gold standard. Two gene-document sets were used for testing. One set was a high quality list of PubMed citations hand-annotated by the curators of the Saccharomyces Genome Database. The other set consisted of literature associated with sequence homologs of yeast genes.

The training and test corpora of documents were constructed by searching PubMed for each code’s corresponding MeSH heading and title words. The 21 GO codes in this study all belong to the biological process ontology; the authors focused on children of cell communication and of cell growth and maintenance. Documents published before 2000 constituted the training set, documents published in 2000 constituted one test set, and documents published in 2001 constituted another test set.

Table 1 lists the category name in the first column, the corresponding GO code in the second column, and the PubMed query used to obtain abstracts in the final column. For the training dataset, the articles were obtained using the query as listed in the table. Within a PubMed query, the [MAJR] label specifies MeSH major headings, [MH] specifies MeSH headings, [TI] specifies title words, and [DP] specifies publication date ranges. The test2000 and test2001 datasets were obtained by modifying the publication date limit to restrict articles to those published in 2000 and 2001, respectively. Titles were omitted from the test data sets. The table also lists the number of articles obtained for each category for the training and test sets.
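How such queries can be assembled from the field labels is sketched below. The actual queries are those listed in Table 1; the function `build_query`, the way the heading and title words are combined with OR, the start year of the training range and the example search terms are all assumptions made for illustration.

```python
def build_query(mesh_heading, title_words, year_from, year_to):
    """Build a PubMed-style query string from a MeSH heading and title words.

    The OR combination of heading and title words is an assumed layout,
    not the one used in the cited study.
    """
    subject = f'"{mesh_heading}"[MAJR]'
    titles = " OR ".join(f"{word}[TI]" for word in title_words)
    topic = f"({subject} OR {titles})" if titles else subject
    return f"{topic} AND {year_from}:{year_to}[DP]"


# Hypothetical category; only the date range changes between the data sets.
heading, words = "signal transduction", ["signaling"]

training_query = build_query(heading, words, 1900, 1999)   # start year is arbitrary
test2000_query = build_query(heading, words, 2000, 2000)
test2001_query = build_query(heading, words, 2001, 2001)

print(training_query)
print(test2000_query)
print(test2001_query)
```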

Table 1 Detailed information for 21 Gene Ontologies and testbed[11]

Using the maximum entropy classifier, the authors assigned GO codes to each gene based on the gene’s curated abstracts from the Saccharomyces Genome Database. Predictions were made only for genes with three or more associated abstracts. To validate the predictions, they used the annotations assigned by the GO Consortium. If an annotation was more specific than one of the 21 codes in their set, they mapped it back to the relevant ancestor based on the GO hierarchy. A total of 991 genes had been annotated by the consortium with GO codes relevant to this study; of these, 835 also had the requisite number of abstracts. They calculated precision and recall at various thresholds for each of the annotations, using the GO Consortium assignments as a gold standard.
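The two evaluation steps described above, mapping a specific annotation back to an ancestor in the target set and computing precision and recall at a threshold, can be sketched as follows. The term names, the toy hierarchy, the gene scores and the convention of reporting a precision of 0.0 when nothing is predicted are all assumptions made for illustration.

```python
# Toy child -> parents map standing in for the GO hierarchy (names invented).
PARENTS = {
    "specific term": ["intermediate term"],
    "intermediate term": ["target code"],
    "target code": ["root"],
    "root": [],
}
TARGET_CODES = {"target code"}


def map_to_target(term):
    """Return the target codes that are the term itself or one of its ancestors."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TARGET_CODES:
            found.add(current)
        stack.extend(PARENTS.get(current, []))
    return found


def precision_recall(scores, gold, threshold):
    """Precision and recall for one code at a given probability threshold.

    scores: {gene: predicted probability}; gold: set of genes truly annotated.
    """
    predicted = {gene for gene, p in scores.items() if p >= threshold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


scores = {"gene1": 0.9, "gene2": 0.6, "gene3": 0.3, "gene4": 0.8}
gold = {"gene1", "gene2", "gene3"}

print(map_to_target("specific term"))
for threshold in (0.5, 0.85):
    print(threshold, precision_recall(scores, gold, threshold))
```

Raising the threshold in this toy data trades recall for precision, which is why the results are reported at several thresholds rather than at a single one.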

Maximum entropy modeling outperformed the other methods and achieved an accuracy of 72% when ascertaining the function discussed within an abstract. The maximum entropy method also provided confidence measures that correlate well with performance. The authors conclude that statistical methods may be used to assign GO codes and may be useful for the difficult task of reassignment as terminology standards evolve over time. Such a method should reduce the time and labor necessary for gene annotation.

Genescene

The Genescene project is another good example of using biomedical ontologies and text mining techniques for knowledge discovery in the biomedical literature. It develops text mining techniques to support automated extraction and inference of regulatory pathways from biomedical literature.

The problem Genescene is trying to solve is finding an automated process that can analyze medical abstracts, extract the concepts and their relationships from them, and create a map from all this information, so that a researcher is presented with all the information available for a particular gene. To do this, it uses many of the concepts and methods discussed in this chapter.

Figure 9 illustrates the Genescene process. Genescene first creates a database of relationships extracted from selected abstracts in Medline. After the abstracts are downloaded, they are processed using the AZ Noun Phraser to extract noun phrases[12], with the UMLS SPECIALIST Lexicon used as a lexical lookup for the AZ Noun Phraser. The phrases are analyzed and sorted so that each phrase is represented as a concept from which concept relationships can be extracted.

Figure 9 Overview of Genescene architecture

To establish these relationships, Genescene uses a relation parser that looks at the nouns and verbs, checks the distance (in words) between them and, using a set of rules, determines whether a valid relationship can be inferred from them. The relationships are then constructed by combining nouns as follows:

Left hand noun phrase (LP) - relationship connector - Right hand noun phrase (RP).

The relationship connector is extracted by looking at prepositions, verbs and adverbs. Since there are many prepositions and most of them might not carry any useful information about the relationship between the nouns (such as ‘to’, which the authors found to be used as an infinitive marker in most cases), Leroy & Chen chose only three prepositions: ‘by’, ‘of’ and ‘in’, which occurred most often in the abstracts analyzed. To demonstrate how the relationship extraction works, consider the sentence “FK228 treatment inhibited growth”. The resulting relationship would be

LP: FK228 treatment

Connector: inhibit

RP: growth

Another added feature of the relationship parser is that it can understand the “and/or” statements and infer multiple relationships from them. For example consider the following sentence “Immunohistochemical stains included Ber-EP4, PCNA, Ki-67, Bcl-2, p53, SM-Actin, CD31, factor XIIIa, KP-1, and CD34.” The phrase parser will extract ten relationships (one for each element) which would be structured like

LP: Immunohistochemical stains

Connector: include

RP: Ber-EP4

Nine more relationships like this would be created, each with a different RP corresponding to one of the comma-separated elements in the sentence.
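The LP, connector and RP structure, including the expansion of coordinated lists, can be sketched with a toy extractor. This is not the Genescene relation parser, which relies on rules over nouns, verbs, prepositions and word distance; the sketch assumes a tiny hand-made verb lexicon (`VERBS`) and simply splits each sentence at the first known verb.

```python
import re

# Hypothetical lexicon mapping inflected verbs to their base form.
VERBS = {"inhibited": "inhibit", "included": "include"}


def extract(sentence):
    """Return (LP, connector, RP) triples, one per coordinated item on the right."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        base = VERBS.get(word.lower())
        if base and 0 < i < len(words) - 1:
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            # Split the right-hand side on commas and on "and"/"or".
            items = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", right)
            return [(left, base, item.strip()) for item in items if item.strip()]
    return []


print(extract("FK228 treatment inhibited growth"))

stains = extract(
    "Immunohistochemical stains included Ber-EP4, PCNA, Ki-67, Bcl-2, p53, "
    "SM-Actin, CD31, factor XIIIa, KP-1, and CD34."
)
print(len(stains))
print(stains[0])
```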

Once the relationships have been extracted from the abstracts, phrase analysis is carried out to determine how many times the same phrase occurs in a document and in a collection of documents. Similar phrases are combined so that they are represented only once. The phrase analysis provides a weighting factor for each phrase/relationship based on its frequency [13].
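A frequency-based weighting of this kind can be sketched with counters. How Genescene decides that two phrases are similar is not described here, so treating phrases as the same when they differ only in letter case is an assumption, as are the example triples and the function name `relation_weights`.

```python
from collections import Counter


def relation_weights(documents):
    """Count each relation across a collection.

    documents: list of lists of (LP, connector, RP) triples, one list per abstract.
    Returns {relation: (total occurrences, number of documents containing it)}.
    """
    total = Counter()
    doc_freq = Counter()
    for triples in documents:
        # Assumed notion of "similar": identical after lowercasing.
        normalized = [tuple(part.lower() for part in triple) for triple in triples]
        total.update(normalized)
        doc_freq.update(set(normalized))
    return {relation: (total[relation], doc_freq[relation]) for relation in total}


documents = [
    [("TP53", "activate", "p21"), ("tp53", "activate", "P21")],
    [("TP53", "activate", "p21"), ("FK228 treatment", "inhibit", "growth")],
]

for relation, (count, docs) in relation_weights(documents).items():
    print(relation, count, docs)
```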

Subsequently, the phrases are tagged by cross-referencing them with three external sources: HUGO, GO and the UMLS. These sources provide Genescene’s top-down knowledge. To tag terms with HUGO identification numbers, the authors performed an exact-match comparison of each Genescene term with the approved symbols, approved gene names, previous symbols and previous gene names in the HUGO database. If a match was found, the HUGO identification number was stored. To tag terms with GO annotations, they performed an exact-match comparison with the terms, term synonyms and gene products in the GO database. To tag terms with UMLS codes, they compared terms with the UMLS Metathesaurus, using rules to match either the entire phrase or part of it (the head phrase). Terms found in the Metathesaurus had a concept identification number, and each concept had at least one semantic type in the UMLS Semantic Network. The semantic types were used to tag the Genescene terms. They did not try to limit the number of tags per term from any of the ontologies [13].
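The tagging step can be sketched as a series of dictionary lookups. The three dictionaries below are tiny stand-ins for the HUGO, GO and UMLS sources, and every identifier and semantic type in them is an invented placeholder; falling back to the last word of a phrase as its head is also an assumed simplification of the matching rules.

```python
# Invented stand-ins for the external sources (keys are lowercased terms).
HUGO = {"tp53": "HUGO-ID-0001"}
GO = {"apoptosis": "GO-ID-0001"}
UMLS = {"growth": ["semantic type A"], "breast cancer": ["semantic type B"]}


def tag(term):
    """Return the tags found for a term in each source."""
    key = term.lower()
    tags = {}

    # HUGO and GO: exact match only.
    if key in HUGO:
        tags["HUGO"] = HUGO[key]
    if key in GO:
        tags["GO"] = GO[key]

    # UMLS: try the entire phrase first, then the head word (assumed: last word).
    words = key.split()
    head = words[-1] if words else key
    for candidate in (key, head):
        if candidate in UMLS:
            tags["UMLS"] = UMLS[candidate]
            break
    return tags


for term in ("TP53", "apoptosis", "tumor growth", "breast cancer", "unknown phrase"):
    print(term, tag(term))
```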

Figure 10 Relationship representation

The final step in Genescene is to present the user with the generated concept map and allow the user to interact with it. Figure 10 shows the graphical representation of the concepts and relationships generated by Genescene [14]. The blue boxes represent the concepts, and the lines connecting them are the relationships. Each relationship has a number next to it indicating how many times that relationship occurred, and an arrow indicating its direction. The user can choose whether or not to display gene mutations. If mutations are not displayed, Genescene treats all mutations of TP53 as the same concept and aggregates the relationships before displaying them. If mutations are displayed, TP53 is split into separate concepts and the relationships for each are shown. This is just one of many features of the Genescene interface. Figure 11 shows the user interface and explains some of the options and features available to the user.

Figure 11 Genescene User Interface

The user can navigate this map and see how the gene affects different processes.

Conclusion

Understanding how a concept was defined is very important in biomedical informatics. A properly defined concept is easier for a machine to process. Equally important are the relationships established between concepts, as these help to extract more accurate information and even to find new links that were not obvious. General ontology systems are very broad and might miss some concepts in a specific domain. For a particular application in the biomedical area, researchers will benefit greatly from a large, well-defined domain ontology system.

References

  1. Medical Subject Headings (MeSH) home, http://www.nlm.nih.gov/mesh/meshhome.html
  2. Jenders, Robert A., "The Arden Syntax for Medical Logic Systems", http://cslxinfmtcs.csmc.edu/hl7/arden/
  3. H. Chen, S. F. Fuller, C. Friedman, and W. Hersh, "Medical Informatics: Knowledge Management and Data Mining in Biomedicine", Chapter 6. Springer, 2005.
  4. H. Chen, S. F. Fuller, C. Friedman, and W. Hersh, "Medical Informatics: Knowledge Management and Data Mining in Biomedicine", Chapter 7. Springer, 2005.
  5. H. Chen, S. F. Fuller, C. Friedman, and W. Hersh, "Medical Informatics: Knowledge Management and Data Mining in Biomedicine", Chapter 8. Springer, 2005.
  6. R. Stevens, S. Bechhofer, and C. Goble, "Ontology-based knowledge representation for bioinformatics", Briefings in Bioinformatics, Vol. 1, No. 4, pp. 398–414, 2000.
  7. Irena Spasic, et al., "Text mining and ontologies in biomedicine: Making sense of raw text", Briefings in Bioinformatics, Vol. 6, No. 3, pp. 239–251, September 2005.
  8. C. Fellbaum, ed., "WordNet: An Electronic Lexical Database", MIT Press, Cambridge, Massachusetts, 1999.
  9. http://www.snomed.org/snomedct/documents/SNOMED_CT_Components_000.pdf
  10. Jennifer I. Clark, Cath Brooksbank, and Jane Lomax, "It's All GO for Plant Scientists", Plant Physiology, Vol. 138, No. 3, pp. 1268–1279, July 2005.
  11. Soumya Raychaudhuri, Jeffrey T. Chang, Patrick D. Sutphin, and Russ B. Altman, "Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature", Genome Research, Vol. 12, No. 1, pp. 203–214, 2002.
  12. K. Tolle and H. Chen, "Comparing Noun Phrasing Techniques for Use with Medical Digital Library Tools", Journal of the American Society for Information Science, Special Issue on Digital Libraries, Vol. 51, No. 4, pp. 352–370, 2000.
  13. Gondy Leroy and H. Chen, "Genescene: An Ontology-Enhanced Integration of Linguistic and Co-Occurrence Based Relations in Biomedical Texts", Journal of the American Society for Information Science and Technology (JASIST), Vol. 56, No. 5, pp. 457–468, 2005.
  14. Byron Marshall, Karin Quiñones, Hua Su, Shauna Eggers, and Hsinchun Chen, "Visualizing Aggregated Biological Pathway Relations", Proceedings of the 2005 Joint ACM/IEEE Conference on Digital Libraries (JCDL 2005), June 7-11, 2005, Denver, CO.