MEDINFO 2019: Health and Wellbeing e-Networks for All 1263
L. Ohno-Machado and B. Séroussi (Eds.)
© 2019 International Medical Informatics Association (IMIA) and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/SHTI190429
SNOMEDtxt: Natural Language Generation from SNOMED Ontology
Olga Lyudovyk, Chunhua Weng
Department of Biomedical Informatics, Columbia University, New York, NY, USA
Abstract

SNOMED Clinical Terms (SNOMED CT) defines over 70,000 diseases, including many rare ones. Meanwhile, descriptions of rare conditions are missing from online educational resources. SNOMEDtxt converts ontological concept definitions and relations contained in SNOMED CT into narrative disease descriptions using Natural Language Generation techniques. Generated text is evaluated using both computational methods and clinician and lay user feedback. User evaluations indicate that lay people prefer generated text to the original SNOMED content, find it more informative, and understand it significantly better. This method promises to improve access to clinical knowledge for patients and the medical community and to assist in ontology auditing through natural language descriptions.

[Figure 1 – Counts of Diseases in [Link], [Link], [Link], [Link], [Link], [Link], [Link], SNOMED [2] (Nov. 10, 2018); counts shown: 952, 1016, 1500, 1570, 2215, 2938, 7600, 78561]
Keywords:
Systematized Nomenclature of Medicine, Natural Language Processing, Access to Information

Introduction

SNOMED CT is the world’s most comprehensive clinical terminology [1]. The March 2018 release of the US version contains 347,231 unique concepts, including 78,561 diseases, and defines 1,088,068 unique active relationships between these concepts [2]. In contrast, the largest professional medical reference source, Medscape ([Link]), contains 7,600 diseases, representing less than 10% of the diseases defined in SNOMED CT, and the largest consumer health resource, Mayo Clinic ([Link]), describes 2,215 diseases. Disease descriptions in these resources are manually curated, limiting the number of diseases that can be covered. Topics may be chosen according to popularity in search results [3]; thus, rare diseases are often excluded from these resources. Counts of disease concepts in major medical information sources are shown in Figure 1. A count for the Google Knowledge Graph is not available, but since it is curated from the sources listed in Figure 1, it is likely on the same order of magnitude.

While extensive, SNOMED CT is not easily accessible to the public and is known to be difficult to use even for clinicians without training in ontologies [4,5]. Like other structured ontologies, SNOMED CT is not designed to be used directly by lay people. The US version of SNOMED CT contains only 4,372 text definitions easily interpretable by untrained personnel, covering 2,608 diseases, corresponding to 1.3% of all SNOMED CT concepts and 3.3% of disease concepts.

We propose a method called SNOMEDtxt to automatically generate disease descriptions from SNOMED CT in order to make the valuable clinical knowledge contained in SNOMED CT available to both patients and the medical community.

An additional use case for SNOMEDtxt is to enable clinicians and domain experts without specialized technical training or experience working with structured terminologies to review and critique clinical knowledge defined in SNOMED CT. This task is critically important, as biomedical knowledge is growing exponentially, with numerous data types and tools emerging rapidly on a daily basis. For example, Campbell et al. reported that the absence of a robust granular ontology represents a barrier to capturing and analyzing data in the fields of cancer research and precision medicine [6], while Fung et al. made a similar observation in the area of rare diseases [7]. However, ontology auditing or quality ascertainment is largely performed by knowledge engineers with specialized training in ontology design and maintenance. This workforce is rare, creating a bottleneck for scalable ontology expansion and for crowdsourcing ontology auditing. SNOMEDtxt represents concepts and related information as natural text, thus expanding the group of potential reviewers to include medical professionals who are not necessarily familiar with structured ontologies. Wider review of SNOMED CT by clinicians can be expected to improve accuracy, reduce missing information, and enable faster SNOMED CT evolution as the body of clinical knowledge expands.

Natural Language Generation (NLG) is a technology that uses computational methods to generate natural language descriptions from structured knowledge or data representations. Attempts to apply NLG to generate text from SNOMED CT have been reported by Liang et al. [8] and Kanhov et al. [9]. Liang and colleagues developed OntoVerbal, a generic tool for ontology verbalization that was then applied to SNOMED CT. While Kanhov and colleagues utilized an off-the-shelf natural language generator, they developed a methodology for user evaluation of the fluidity and readability of NLG texts in the biomedical domain. OntoVerbal was developed as a Protégé 4.2 plugin and is not available in more recent Protégé versions or as a standalone application. The NLG system developed by Kanhov et al. was not made available for download or use.

OntoVerbal implements a generic verbalization approach for ontologies, with an emphasis on the ability to handle any OWL ontology and generate natural language descriptions for any entity type in that ontology [8]. This approach restricts the handling of relationships, or ontology axioms, to generic lexical choices and results in some redundant and inelegant phrases, such as “chronic disease of the genitourinary system … has a finding site in a structure of the genitourinary system.” In contrast, our method trades off generalizability for improved readability and comprehensibility through more specific verbalizations of SNOMED CT axioms and simplifying structures tailored to SNOMED CT concepts, so that the same construct is simplified by SNOMEDtxt as “… affects the genitourinary system.” Moreover, OntoVerbal takes the generic approach of ordering information from simpler sentences to more complex ones, whereas SNOMEDtxt follows the common flow of information found in disease descriptions in reference medical texts: the definition is followed by possible causes, presentation, diagnosis, clinical course, and finally additional information.

SNOMEDtxt is a novel NLG engine and interface, intended to evolve and improve over time with user feedback. The current version focuses specifically on disease concepts and can be easily extended to summarize procedures, treatments, and other information contained within SNOMED CT and relevant to the wider audience.

Methods

SNOMEDtxt follows a 4-step framework outlined in Figure 2 to generate a disease description for a given disease.

[Figure 2 – Framework for Disease Description Generation]

Concept Search And Information Retrieval

The current implementation of SNOMEDtxt is based on the 03/01/2018 release of the SNOMED CT terminology, US edition, Snapshot version, available for download from the SNOMED CT website [2]. SNOMEDtxt uses a local copy of this database. The system can randomly sample diseases from SNOMED CT and search for disease names entered by a user. The search is undertaken in two steps: first, a simple match on concept names and synonyms in the SNOMED CT database is attempted. If the search term is not found, the system then uses the SNOMED CT Analyzer API (snomedct.t3as.org) to search for the term, provided that the API site is online.

Concepts are the key component of SNOMED CT. They are organized in a polyhierarchical structure with the “Is-A” (parent-child) relationship and can be additionally defined or described through other relationships. Each relationship has a type, a source concept, and a destination concept. Once a disease Concept ID is found, relevant relationships are retrieved from the SNOMED CT database:

• Relationships where the searched disease Concept ID is the source

• Is-A relationships where the searched disease Concept ID is the destination: the source concepts represent subtypes or examples of the disease and are included in the definition

Concept names are then retrieved for the corresponding target concepts. The generated text is the product of concept names arranged in lexical patterns corresponding to the types of relationship between these concepts. Concept names undergo minimal string cleaning to remove non-informative structures such as “(Disorder)” and “(Body Structure)”.

Structure And Aggregation

In order to produce fluid and coherent text and avoid redundancy wherever possible, SNOMEDtxt aggregates and structures information in three steps: first, it groups all target nodes for the same relationship; second, it organizes relationships into broad logical groups; third, it orders relationships within each group, and the groups themselves, following a typical flow of information in a disease description in medical reference texts. This stepwise grouping of relationships is a simplified application of Rhetorical Structure Theory [10], which describes a recursive approach to organizing relationships in a text.

Table 1 – Organizing Relationships

Group | Relationship | Lexical Pattern
Definition | IS-A | “is a kind of”
Definition | Finding site | “that affects the”
Definition | Has definitional manifestation | “It manifests itself in”
Definition | Associated morphology | “The associated morphology is”
Definition | Pathological process | “Pathological process associated with … is”
Definition | Children: IS-A, searched term = destination | “An example of … is” / “Examples of … are”
Causality | Causative agent | “is caused by”
Causality | Due to | “occurs due to”
Causality | Associated with | “is associated with”
Temporality | Occurrence | “presents in” (period)
Temporality | During/Following/After | “can occur during / following / after”
Temporality | Temporally related to | “can be temporally related to”
Diagnosis | Finding method | “is discovered by”
Diagnosis | Finding informer | “<is discovered> through”
Clinical Course | Clinical course | “Clinical course is”
Clinical Course | Severity | “The severity of … is”
Clinical Course | Episodicity | “The episodicity of … is”
Other | Interprets | “interprets or evaluates”
Other | Has interpretation | “… as”
Other | Other | “Other related concepts include…”
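The retrieval-and-verbalization flow just described (group the retrieved relationships by type, map each type to a lexical pattern, and concatenate the target names) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R implementation: the pattern strings are simplified from Table 1, the input tuples are hypothetical, and the sentence-combining step described later under Text Realization is omitted.

```python
# Sketch of the SNOMEDtxt grouping-and-verbalization idea: relationships
# retrieved for a concept are grouped by type, each type is mapped to a
# lexical pattern (simplified from Table 1), and target names for the same
# type are concatenated in "A, B, and C" form. Patterns and inputs are
# illustrative only, not the authors' exact tables.

PATTERNS = {  # relationship type -> (group order, lexical pattern)
    "Is a":            (0, "{name} is a kind of {targets}."),
    "Finding site":    (1, "It affects the {targets}."),
    "Causative agent": (2, "{name} is caused by {targets}."),
    "Occurrence":      (3, "{name} presents in {targets}."),
}

def join_names(names):
    """Concatenate target names as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + f", and {names[-1]}"

def verbalize(name, relationships):
    """relationships: list of (relationship_type, target_concept_name)."""
    grouped = {}
    for rel_type, target in relationships:
        grouped.setdefault(rel_type, []).append(target)
    sentences = []
    # order sentences by the group order assigned to each relationship type
    for rel_type in sorted(grouped, key=lambda t: PATTERNS[t][0]):
        pattern = PATTERNS[rel_type][1]
        sentences.append(
            pattern.format(name=name, targets=join_names(grouped[rel_type])))
    return " ".join(sentences)

print(verbalize("Asthma", [
    ("Is a", "Respiratory disorder"),
    ("Finding site", "Airway"),
]))
# prints: Asthma is a kind of Respiratory disorder. It affects the Airway.
```

A real implementation would additionally merge the Is-A and Finding site sentences into one ("… is a kind of … that affects the …"), as the Text Realization step describes.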
Text Realization

The first task SNOMEDtxt undertakes in the Text Realization phase is constructing an informative disease name. If the search term is significantly different from the preferred term for the disease concept, as measured by the Jaro-Winkler string distance [11], the disease description will combine both in the form of “<Preferred disease concept name> (also known as <searched term>)”, e.g. “Influenza (also known as flu)”.

Additionally, SNOMEDtxt concatenates all target nodes for the same relationship type which were aggregated in the previous step, following the “A and B” / “A, B, and C” format. When concatenating examples of a given disease, SNOMEDtxt selects a maximum of three examples, based on the largest string dissimilarity with the given disease name, as a tradeoff between completeness and relevancy.

Finally, relationship types are converted into corresponding lexical patterns (see Table 1) and sentences are generated. For the sake of conciseness, relationships in the same group are combined into one sentence wherever this approach produces fluid text. For example, the Is-A and the Finding site relationships are combined into one sentence that forms the concise definition of the disease: “Asthma is a kind of Respiratory disorder that affects the Airway”. Sentences are then ordered according to the order of relationships in Table 1.

Results

User Interface of SNOMEDtxt

A simple user interface is implemented in RShiny and is available online at [Link]

[Figure 3 – Screenshot of SNOMEDtxt Interface]

An example disease description generated by SNOMEDtxt and the corresponding concatenated SNOMED CT content are illustrated below.

SNOMEDtxt Disease Description

Lupus erythematosus (also known as Lupus) is a kind of Autoimmune disease and Connective tissue disease that affects Connective tissue. Some examples of Lupus erythematosus are Systemic lupus erythematosus, Drug-induced lupus erythematosus, and Neonatal lupus erythematosus. Pathological process associated with Lupus erythematosus is AI - autoimmune. Other related concepts are Cutaneous lupus erythematosus, Lupus erythematosus profundus, and Discoid lupus erythematosus of eyelid.

SNOMED CT Content

ConceptID: 200936003. Terms: Lupus erythematosus, LE - Lupus erythematosus, Lupus, Lupus erythematosus (disorder). Relationships: Disorder of connective tissue (disorder) = Is a (attribute). Connective tissue structure (body structure) = Finding site (attribute). Autoimmune disease (disorder) = Is a (attribute). Autoimmune (qualifier value) = Pathological process (attribute). Related concepts: Systemic lupus erythematosus (disorder) - Is a (attribute). Drug-induced lupus erythematosus (disorder) - Is a (attribute). Neonatal lupus erythematosus (disorder) - Is a (attribute). Discoid lupus erythematosus (disorder) - Due to (attribute).

We evaluated disease descriptions generated by SNOMEDtxt against the concatenated SNOMED CT content using computed metrics and user evaluations. Both sets of evaluations indicate that SNOMEDtxt succeeds in making SNOMED CT content more readable and comprehensible.

Computed Metrics

We computed readability and redundancy metrics for disease definitions of the top 20 most searched diseases in 2017 [13] and of 20 diseases randomly retrieved from SNOMED CT:

1. Readability: the Flesch-Kincaid grade level (FK) and the Automated Readability Index (ARI) estimate the number of years of education needed to understand a text. We calculated both with the sylcount R package [12].

2. Redundancy: calculated as the ratio of unique word count to total word count after removing stop words.

Full summaries of health concepts retrieved from the Medline Plus web service ([Link]) were used as a reference for the first set of disease concepts. Since only 4 out of 20 randomly sampled disease concepts had a reference health topic in Medline Plus, a comparison with a reference is not provided for the second set.

Table 2 – Evaluation with Computed Metrics

 | Readability: FK | Readability: ARI | Words | Redundancy: Unique/All
Top 20 most searched diseases
SNOMEDtxt | 14.3 | 12.0 | 49.3 | 0.74
SNOMED CT | 17.9 | 15.0 | 64.1 | 0.55
Reference | 6.6 | 6.1 | 263 | 0.77
Random 20 SNOMED CT disease concepts
SNOMEDtxt | 11.7 | 9.7 | 47.3 | 0.69
SNOMED CT | 15.7 | 13.8 | 69.7 | 0.56

For both measures of readability, a lower score indicates a lower grade of education needed to understand the text and therefore better readability. These metrics indicate that SNOMEDtxt texts are more readable than the original SNOMED CT content. For the 20 most searched diseases, the average FK score for SNOMEDtxt texts (14.3) is equivalent to the second year of an undergraduate degree, while the FK score for SNOMED CT content (17.9) corresponds to the graduate school level. The ARI score of 12.0 for SNOMEDtxt is equivalent to twelfth grade, while the ARI of 15.0 for SNOMED CT content indicates that the text is appropriate for readers at the professor level. Readability scores for the MedlinePlus reference texts are significantly lower, indicating that they can be read by a much wider audience than either SNOMEDtxt or the original SNOMED CT content.

SNOMEDtxt texts also improve on the redundancy metric compared to SNOMED CT content, for both the top 20 searched diseases (0.74 vs. 0.55) and the 20 randomly sampled diseases (0.69 vs. 0.56).
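The readability and redundancy computations above can be reproduced with a short script. The sketch below is a crude Python approximation for illustration only: the paper computed FK and ARI with the sylcount R package, whereas here syllables are estimated as vowel runs and the stop-word list is a small assumed sample, so scores will not match Table 2 exactly. The FK and ARI formulas themselves are the standard published ones.

```python
import re

# Crude re-implementation of the metrics reported in Table 2. The syllable
# counter (runs of vowels) and the stop-word list are simplified
# approximations, not the sylcount package the authors used.

STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to", "it"}

def count_syllables(word):
    """Approximate syllables as runs of vowel letters (a common heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (Flesch-Kincaid grade level, Automated Readability Index)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    fk = 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
    ari = 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
    return fk, ari

def redundancy(text):
    """Ratio of unique words to all words after removing stop words."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    content = [w for w in words if w not in STOP_WORDS]
    return len(set(content)) / len(content)

sample = "Asthma is a kind of Respiratory disorder that affects the Airway."
fk, ari = readability(sample)
print(round(fk, 1), round(ari, 1), round(redundancy(sample), 2))
# prints: 10.2 7.2 1.0
```

Lower FK and ARI scores mean fewer years of schooling are needed; a redundancy ratio closer to 1.0 means fewer repeated content words.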
User Evaluation

A survey evaluating results of SNOMEDtxt was conducted among 51 lay people recruited using Amazon Mechanical Turk (MTurk) and 6 clinicians from Columbia University Medical Center. MTurk is a crowdsourcing marketplace that enables outsourcing tasks such as surveys to a distributed workforce for a small reward. Evaluations of all MTurk workers who applied and did not self-identify as clinicians were included in the results, as were evaluations of all 6 clinicians who responded to the survey. All evaluators were provided with a basic description of the project but were not aware of the study design or the research question.

We randomly selected a set of 20 disease concepts from SNOMED CT for evaluating readability, preference, accuracy, and completeness (set 1). Helpfulness was evaluated on a set of 20 disease concepts for which a medical reference text was available (set 2). Questions probing the degree of understanding were constructed for 10 diseases with sufficient information selected from 40 randomly sampled disease concepts (set 3). Comparison with OntoVerbal was restricted to 4 diseases for which an OntoVerbal description was available in [8] (set 4). For all 4 sets, we generated a SNOMEDtxt disease description and a concatenation of SNOMED CT content. The survey was conducted using the Qualtrics survey platform ([Link]) and included randomization: each evaluator was presented with 3 randomly selected diseases from set 1, 2 from set 2, 2 from set 3, and 1 from set 4.

In order to assess readability and general preference, we presented evaluators with SNOMEDtxt disease descriptions and the SNOMED CT content for 3 diseases from set 1 and asked whether one or the other was more readable or generally preferred, or whether there was no difference. Evaluators were not informed which text represented SNOMEDtxt output. Lay people found 76.5% of SNOMEDtxt disease descriptions easier to read than the SNOMED CT content, and preferred 69% of SNOMEDtxt descriptions to SNOMED CT content. Clinicians found 83% of SNOMEDtxt descriptions easier to read and preferred 44% of them to the SNOMED CT content.

We tested understanding by presenting the evaluators with either the SNOMEDtxt description or the SNOMED CT content for a concept, followed by a multiple choice question designed to test whether the evaluator understood the text; we then compared the number of correct answers given when presented with the SNOMEDtxt description or with the SNOMED content. The SNOMEDtxt format appeared to be significantly easier to understand for lay users: they gave the correct answer 72% of the time when presented with a SNOMEDtxt description and only 51% when presented with the original SNOMED CT content. There was no difference for clinicians: they gave the correct answer 100% of the time regardless of which text they were presented with.

To evaluate helpfulness, we presented evaluators with the SNOMEDtxt description, the SNOMED CT content, and a description of the same concept from either Medline Plus or the Google Knowledge Graph as a reference, and asked “How helpful was the terminology content compared to” the reference, on a scale from 1 to 10. Lay people found SNOMEDtxt descriptions more helpful: the average helpfulness score for SNOMEDtxt texts was 5.7, compared to 4.8 for SNOMED CT content. On the other hand, clinicians found SNOMEDtxt descriptions on average minimally less helpful than SNOMED CT content (3.50 versus 3.58).

Clinician evaluators were also asked to assess accuracy and completeness for disease concepts from set 1. In most cases clinicians thought the SNOMEDtxt descriptions were as accurate (72%) and as complete (78%) as the original content, while they found 28% of descriptions to be somewhat less accurate, 6% somewhat less complete, and 17% significantly less complete.

Table 3 – User Evaluation: SNOMEDtxt vs. SNOMED

Readability and Preference
 | SNOMEDtxt | SNOMED CT | No Difference
Lay Audience (n=51)
Easier to read | 76.5% | 14.4% | 9.2%
Preferred | 68.6% | 21.6% | 9.8%
Clinicians (n=6)
Easier to read | 83% | 11% | 6%
Preferred | 44% | 28% | 28%

Helpfulness and Understanding
 | SNOMEDtxt | SNOMED CT
Lay Audience (n=51)
Helpful (1-10) | 5.7 | 4.8
Correctly understood | 72.1% | 51%
Clinicians (n=6)
Helpful (1-10) | 3.50 | 3.58
Correctly understood | 100% | 100%

Accuracy and Completeness
SNOMEDtxt vs. SNOMED CT | Significantly Worse | Somewhat Worse | Same
Clinicians (n=6)
Accuracy | 0% | 28% | 72%
Completeness | 17% | 6% | 78%

A conclusive comparison between OntoVerbal and SNOMEDtxt was not feasible since only 4 disease descriptions were available for OntoVerbal. We conducted a limited comparison by presenting all evaluators with the SNOMED CT content and with a disease description from either OntoVerbal or SNOMEDtxt for the same disease (evaluators were unaware of the source of each text). All evaluators were asked which text they found easier to read and generally preferred; clinicians were additionally asked whether the text description was less accurate / complete than (denoted in Table 4 as “Worse”) or as accurate / complete as (denoted as “Same”) the SNOMED CT content. This limited comparative evaluation points to a preference for SNOMEDtxt disease descriptions, with the same or better performance on readability, accuracy, and completeness.

Table 4 – User Evaluation: Comparison with OntoVerbal

 | SNOMEDtxt | OntoVerbal | SNOMED CT | No Difference
Lay Audience (n=51)
Easier to read | 49% | 43% | 3.9% | 3.9%
Preferred | 52% | 31% | 11.8% | 3.9%
Clinicians (n=6)
Easier to read | 50% | 50% | 0% | 0%
Preferred | 50% | 17% | 17% | 17%

 | SNOMEDtxt vs. SNOMED CT | | OntoVerbal vs. SNOMED CT |
Clinicians (n=6) | Worse | Same | Worse | Same
Accuracy | 0% | 45% | 27% | 27%
Completeness | 18% | 37% | 18% | 27%

User evaluation demonstrates the potential utility of SNOMEDtxt for lay users: they find the generated disease descriptions more readable and easier to understand than the structured SNOMED
CT content. The accuracy and completeness of SNOMEDtxt’s natural language descriptions are close to the original SNOMED CT content. The use case of assisting in SNOMED CT content review would require some adjustments to the SNOMEDtxt design in order to produce more faithful representations of the SNOMED CT content.

Discussion

We introduce a method to generate disease descriptions directly from the SNOMED CT ontology for two main applications: providing access to definitions of rare diseases or disease variants not described in clinical reference resources, and enabling easier comprehension of SNOMED CT content for those reviewing, verifying, and extending the ontology.

In the design of SNOMEDtxt, we have made several choices that favor fluidity and ease of comprehension over a faithful and complete representation of information, at the risk of possible loss of information. The human evaluation of results confirms that we achieved this goal. However, these choices may not be appropriate when SNOMEDtxt output is used to verify the content of SNOMED CT. It may be desirable to provide users with configurations such as “more precise” and “easier to understand” when generating the natural language texts.

Another tradeoff made in the design of SNOMEDtxt was readability at the expense of generalizability. In order to extend SNOMEDtxt to other types of concepts or to other terminologies, the verbalizations of relationships and the handling of aggregated sentence structures would need to be adjusted.

A significant limitation to the use of SNOMEDtxt for the wider audience is the amount of content available for each disease concept in SNOMED CT. Expanding, i.e. explaining, some related nodes, for example the parent disease node or the finding site, may add meaningful and relevant information to the generated disease descriptions. A navigable user interface where a user could click on confusing terms and see them explained would be an alternative approach to this challenge. Developing APIs to access SNOMEDtxt would enable integration of textual disease descriptions into other electronic resources and reference materials, such as EHR help functions or patient portals. The search functionality in the current implementation is limited to an exact string match with either the SNOMED term name or any of the term’s synonyms and can be further improved with string search algorithms.

Results of the evaluation by lay people and clinicians presented in this paper are encouraging for the potential use of SNOMEDtxt in making SNOMED CT content more accessible and easier to review; however, a more rigorous evaluation with a larger audience and a greater number of tested concepts is recommended.

Finally, to allow the system to continuously learn and evolve, evaluation and feedback elicitation can be built into the user interface. Presenting users with different verbalization options at random and gathering user feedback would enable the system to learn verbalization patterns favored by users and evolve the NLG engine accordingly.

More broadly, natural language processing is growing in importance, with many potential applications in healthcare systems. NLG involves several important tradeoffs, which should be made with a specific application in mind. Two such tradeoffs are balancing completeness and accuracy on the one hand with fluidity and comprehensibility on the other, and generalizability versus linguistic polish and expressiveness.

Conclusion

This work presents an ontology verbalizer for SNOMED CT disease concepts: a tool that generates natural language concept descriptions balancing completeness and accuracy with ease of human comprehension. User evaluation shows that lay people prefer to read natural text instead of structured ontologies and understand textual descriptions better.

References

[1] J. Millar, The Need for a Global Language - SNOMED CT Introduction, Stud Health Technol Inform 225 (2016), 683-5.
[2] US National Library of Medicine, SNOMED CT United States Edition, [Link]/healthit/snomedct/us_edition.html (Accessed April 28, 2018).
[3] N. Miller, E.M. Lacroix, J.E. Backus, MEDLINEplus: building and maintaining the National Library of Medicine's consumer health Web service, BMLA 88(1) (2000), 11-7.
[4] S.Y. Kim et al., Comparison of Knowledge Levels Required for SNOMED CT Coding of Diagnosis and Operation Names in Clinical Records, Healthc Inform Res 18(3) (2012), 186-90.
[5] J.E. Andrews, R.L. Richesson, J. Krischer, Variation of SNOMED CT Coding of Clinical Research Concepts among Coding Experts, JAMIA 14(4) (2007), 497–506.
[6] W.S. Campbell et al., A computable pathology report for precision medicine: extending an observables ontology unifying SNOMED CT and LOINC, JAMIA 25(3) (2018), 259–266.
[7] K.W. Fung, Coverage of rare disease names in standard terminologies and implications for patients, providers, and research, AMIA Annu Symp Proc (2014), 564-72.
[8] S.F. Liang et al., OntoVerbal: A Generic Tool and Practical Application to SNOMED CT, IJASCA 4(6) (2013), 227-239.
[9] M. Kanhov, X. Feng, H. Dalianis, Natural Language Generation from SNOMED Specifications, CLEFeHealth 2012 Workshop (2012).
[10] M. Taboada, W.C. Mann, Applications of Rhetorical Structure Theory, Discourse Studies 8(4) (2006), 567–588.
[11] W.E. Winkler, String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage, American Statistical Association (1990), 354–359.
[12] D. Schmidt, ‘sylcount’ R package (2017).
[13] K. Sheridan, The 20 most Googled diseases, StatNews (June 6, 2017).

Address for correspondence

Chunhua Weng, Department of Biomedical Informatics, Columbia University, email: chunhua@[Link]