LO7: Health bioinformatics

Bioinformatics includes the integration of computers, software tools, and databases in an effort to address biological questions. Bioinformatics approaches are often used for major initiatives that generate large data sets. Two important large-scale activities that use bioinformatics are genomics and proteomics. Genomics refers to the analysis of genomes. A genome can be thought of as the complete set of DNA sequences that codes for the hereditary material that is passed on from generation to generation. These DNA sequences include all of the genes (the functional and physical unit of heredity passed from parent to offspring) and transcripts (the RNA copies that are the initial step in decoding the genetic information) included within the genome. Thus, genomics refers to the sequencing and analysis of all of these genomic entities, including genes and transcripts, in an organism. Proteomics, on the other hand, refers to the analysis of the complete set of proteins, or proteome. In addition to genomics and proteomics, there are many more areas of biology where bioinformatics is being applied (e.g., metabolomics and transcriptomics). Each of these important areas in bioinformatics aims to understand complex biological systems.

Many scientists today refer to the next wave in bioinformatics as systems biology, an approach to tackle new and complex biological questions. Systems biology involves the integration of genomics, proteomics, and bioinformatics information to create a whole system view of a biological entity.

For instance, how a signaling pathway works in a cell can be addressed through systems biology. The genes involved in the pathway, how they interact, and how modifications change the outcomes downstream, can all be modeled using systems biology. Any system where the information can be represented digitally offers a potential application for bioinformatics. Thus, bioinformatics can be applied from single cells to whole ecosystems. By understanding the complete “parts lists” in a genome, scientists are gaining a better understanding of complex biological systems. Understanding the interactions that occur between all of these parts in a genome or proteome represents the next level of complexity in the system. Through these approaches, bioinformatics has the potential to offer key insights into our understanding and modeling of how specific human diseases or healthy states manifest themselves.

Translational bioinformatics

Translational bioinformatics, a field in the study of health informatics that emerged after the first human genome mapping, focuses on the convergence of molecular bioinformatics, biostatistics, statistical genetics and clinical informatics. The field is evolving at a tremendously fast pace, and many related areas have been proposed. Among them, pharmacogenomics is a branch of genomics concerned with variation among individuals in drug response due to genetic differences. The area is important for designing precision medicine in the future. Though a relatively young field, translational bioinformatics has become an important discipline in the era of personalized and precision medicine.

Figure 1. Translational Bioinformatics.


A 2014 review article grouped recent themes in the field of translational bioinformatics (TBI) into four major categories:

  1. clinical “big data”, or the use of electronic health record (EHR) data for discovery (genomic and otherwise);
  2. genomics and pharmacogenomics in routine clinical care;
  3. omics for drug discovery and repurposing; and
  4. personal genomic testing, including a number of ethical, legal, and social issues that arise from such services.

The importance of translational bioinformatics may be best understood through what it is teaching us, things not previously knowable. For example, it is identifying flawed science, improving estimates of the relative pathogenicity of human genetic variants, yielding new insights into the underlying genetic mechanisms of disease, and identifying promising new drug indications by curating large volumes of scientific literature. While sequencing an exome for a clinical diagnosis can be a routine task, interpreting the data to make an actual diagnosis or treatment plan is much more complex. Of the many thousands of variants identified, many will have to be evaluated for their clinical utility. For a simple Mendelian disorder, this may be straightforward, as only a single variant will need to be identified and considered. But for more complex diseases (e.g., cancers, diabetes, or neurodegenerative diseases), multiple variants will need to be identified. It is only by asking the correct questions about the patient and the disease, and by employing the right computational tools, that correct answers can be achieved.
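The narrowing of thousands of variant calls to a short review list can be sketched in a few lines of Python. This is a minimal illustration and not a clinical pipeline: the record fields (`gene`, `pop_freq`, `impact`), the 1% frequency cut-off, the placeholder gene names, and the toy data are all assumptions made for the example and do not come from any specific tool.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str        # gene symbol the variant falls in (placeholder names below)
    pop_freq: float  # allele frequency in a reference population, 0..1
    impact: str      # predicted consequence: "high", "moderate", or "low"

def prioritize(variants, panel, max_freq=0.01):
    """Keep rare, potentially damaging variants in genes relevant to the phenotype."""
    severity = {"high": 0, "moderate": 1}
    keep = [
        v for v in variants
        if v.gene in panel and v.pop_freq <= max_freq and v.impact in severity
    ]
    # Most severe predicted impact first, then rarest first
    return sorted(keep, key=lambda v: (severity[v.impact], v.pop_freq))

# Toy calls, invented for illustration only
calls = [
    Variant("GENE_A", 0.0001, "high"),
    Variant("GENE_A", 0.2500, "high"),      # dropped: too common
    Variant("GENE_B", 0.0050, "moderate"),
    Variant("GENE_C", 0.0002, "high"),      # dropped: gene not on the panel
    Variant("GENE_B", 0.0010, "low"),       # dropped: low predicted impact
]

shortlist = prioritize(calls, panel={"GENE_A", "GENE_B"})
for v in shortlist:
    print(v.gene, v.pop_freq, v.impact)
```

For a Mendelian disorder the shortlist may contain a single candidate; for complex diseases several entries typically remain and each still needs expert review.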

New discoveries resulting from the Human Genome Project are now frequently applied to develop improved diagnostics, prognostics, and therapies for complex diseases, an approach known as “translational genomics”. In particular, the sequencing cost per genome has fallen markedly over the last decade, according to data from the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH), as shown in Figure 2. This gives rise to new opportunities for personalized treatment and risk stratification.


Figure 2. a) Number of research studies sequencing DNA or genomes (source: PubMed, Web of Science, Scopus, IEEE, ACM). b) Sequencing cost per human-sized genome (source: National Human Genome Research Institute, NHGRI). Total volume of genomic data per year reported by completed studies for c) eukaryotes and d) prokaryotes in 1e2 GB (source: National Center for Biotechnology Information) (Andreu-Perez, Poon, et al. 2015).

On the other hand, research in bioinformatics has broadened from solely sequencing the genome of an individual to also measuring epigenomic data (i.e., above the genome), which include processes that alter gene expression other than changes of primary DNA sequences, such as DNA methylation and histone modifications. Information technologies for acquiring and analyzing biological molecules other than the genome, for example, transcriptome (the total mRNA in a cell or organism), proteome (the set of all expressed proteins in a cell, tissue, or organism), and metabolome (the total quantitative collection of low molecular weight compounds, metabolites, present in a cell or organism that participate in metabolic reactions) are also needed for future advances in the field. To summarize, OMICS aims at collectively characterizing and quantifying groups of biological molecules that translate into the structure, function, and dynamics of an organism. The OMICS profile of each individual should eventually be linked up with phenotypes obtained from clinical observations, medical images, and physiological signals (see Figure 3).


Figure 3. Outline of the “OMICS” approach for studying disease mechanisms. OMICS aims at collectively characterizing and quantifying groups of biological molecules that translate into the structure, function, and dynamics of an organism. The OMICS profile of each individual, including the genome, transcriptome, proteome, and metabolome, should be eventually linked up with phenotypes obtained from clinical observations, medical images, and physiological signals. Different acquisition technologies are required to collect data at each biological level. Interaction within each level and across different levels as well as with the environment, including nutrition, food, drugs, traditional Chinese medicine, and gut microbiome presents grand challenges in future bioinformatics research.


Figure 4. Practical model for the design and execution of translational informatics projects, illustrating major phases and exemplary input or output resources and data sets (Payne et al. 2009).

Genomics in clinical care (Translational Genomics)

While genetics focuses on DNA coding for single functional genes, genomics is the study of the entirety of our DNA, recognizing the crucial regulatory role of non-coding DNA and the complex interactions between multiple genes and the environment. Genomics is fundamental to precision medicine which, through its four components of predictive, preventive, personalized, and participatory medicine, aims to promote wellness as well as to treat disease more precisely. A great deal of genomic discovery research is currently under way, yielding new genomic variants, biomarkers, and other basic science discoveries. Thus, many foresee rapid advances in genetic testing and genome sequencing over the next decade, with inevitable implementation into clinical practice.

General practitioners (GPs) will play an important role within a genomic medicine service, both in supporting patients through diagnostic and treatment processes and in using knowledge of genomics for disease prevention. Also, decreasing costs and increased availability of genetic testing and genome sequencing mean many physicians will consider using these services over the next few years, with some projecting that sequencing will become fully integrated into standard medical care within 10 years.

A tumour’s genomic signature may be used to make a precise diagnosis, enabling more accurate prognosis and better tailored treatment. Examples include Herceptin® (trastuzumab) in breast cancer treatment and BRAF inhibitors in malignant melanoma. Treatment can also be based on germline genomic information; PARP inhibitors are more efficacious in the treatment of ovarian cancer in individuals who carry a BRCA gene mutation.

Although comprehensive genotyping is still relatively recent, it has a high potential for genetic stratification in patient screening, for instance, for factors that can be identified by genotyping, such as high-risk DNA mutations, milk and gluten intolerance, and mucoviscidosis (cystic fibrosis). Genetics combined with phenotypic information provided by the EHR may help to provide greater insights into low-penetrance alleles. For example, it is well known that mutations of fibrillin 1 (FBN1) cause Marfan syndrome (MFS). Nevertheless, the etiology of the disease leads to marked clinical variability among MFS patients, both within the same family and across different families. Combining genetic tests of FBN1 and a series of related genes (TGFBR1, TGFBR2, TGFB2, MYH11, MYLK1, SMAD3, and ACTA2) will help to screen out patients who are more likely to develop aortic aneurysms that lead to dissections. Further studies on these high-risk patients based on morphological images of the aorta may provide insight into the rate of disease development.

Another potential area for translational genomics is to study the gene networks of different syndromes of the same person in order to better understand how these syndromes are interrelated. For example, this has been used to study different genes on chromosome 21 (HSA21) and their role in Down’s Syndrome (DS), as well as to understand the underlying reason why nearly half of DS patients exhibit an overprotection against cardiac abnormalities related to the connective tissue. One hypothesis is based on the recent evidence that there is an overall upregulation of FBN1 in DS (which is normally down regulated in MFS). The construction of genetic networks will, therefore, provide a clearer picture of how these syndromes are related. By understanding the gene networks of the related syndromes, it may be possible to provide specific gene therapy for the related diseases.

Another example took place at Stanford’s Lucile Packard Children’s Hospital, where a newborn presented with a condition known as long QT syndrome. In this specific case, the manifestation was unusually severe: the baby’s heart stopped multiple times in the hours after birth. Long QT syndrome can be caused by mutations in a number of different genes. It is necessary to know which gene harbors the mutation in order to know how to treat the condition. In this case, whole-genome sequencing (WGS) was performed, enabling identification of a previously studied mutation as well as a novel copy number variation in the TTN gene that would not have been detectable through targeted genotyping alone. Moreover, next-generation sequencing (NGS) enabled the answer to be obtained in a matter of hours to days instead of weeks.


Pharmacogenomics can be defined as the study of how genetic factors affect a person’s response to drugs. This relatively new field combines pharmacology (the science of drugs) and genomics (the study of genes and their functions) to develop effective, safe medications and doses that will be tailored to a person’s genetic makeup.

Many drugs that are currently available are “one size fits all,” but they don't work the same way for everyone. It can be difficult to predict who will benefit from a medication, who will not respond at all, and who will experience negative side effects (called adverse drug reactions). Adverse drug reactions are a significant cause of hospitalizations and deaths. Once a patient takes a drug, the drug must travel through the body to its target(s), act on its target(s), and then leave the body. The first and last of these processes are facilitated by pharmacokinetic (PK) genes, which may affect a drug in the “ADME” processes: to be absorbed into and distributed through the body, metabolized (either to an active form or broken down into an inactive form), and excreted. With the knowledge gained from the Human Genome Project, researchers are learning how inherited differences in genes affect the body’s response to medications. These genetic differences will be used to predict whether a medication will be effective for a particular person and to help prevent adverse drug reactions.

Pharmacogenomics focuses on the identification of genome variants that influence drug effects, typically via alterations in a drug’s pharmacokinetics or via modulation of a drug’s pharmacodynamics (e.g., modifying a drug’s target or perturbing biological pathways that alter sensitivity to the drug’s pharmacological effects). For diseases other than cancer and infectious diseases, the genome variations of interest are primarily in the germline DNA, either inherited from parents or de novo germline sequence changes that alter the function of gene products. In cancer, both inherited genome variations and somatically acquired genome variants can influence response to anticancer agents.

Whole genome sequencing by NGS is important to the study of complex diseases such as cancer. It has been a long-standing problem in cancer treatment that drugs often have heterogeneous treatment responses even for the same type of cancer, and some drugs only show profound sensitivity in a small number of patients. Large-scale personal genomics and pharmacogenomics datasets have now been generated to uncover unique signaling patterns of individual patients and discover drugs that target these unique patterns. These include cancer cell line databases covering either nonspecific cancer cell types or a specific cancer cell type such as breast cancer. The Cancer Genome Atlas Project of the NIH has tested the personal genomic profiles of over 10,000 individuals across over 20 types of cancer and uncovered new cancer subtypes based on those profiles. Distinct genomic aberrations among patients are believed to be responsible for the variability in drug response. Such large-scale datasets can be used to enable drug repositioning, predict drug combinations, and delineate mechanisms of action. They are becoming an important component in drug development. It is, therefore, possible to design precision medicine for individual patients based on their genomic profiles.

Pharmacogenomics has gone beyond studying individuals’ drug response based on genome characteristics (e.g., copy number variations and somatic mutations) and now incorporates additional transcriptomic and metabolic features such as gene expression, considering factors that influence the concentration of a drug reaching its targets and factors associated with the drug targets. Since the gene expression profiles of cell lines are known to vary considerably during prolonged culture under different culture conditions and techniques, the use of gene expression from cell lines for prediction of drug response in the patient is currently controversial. A recent algorithm for predicting in vivo drug response from the patient’s baseline gene expression profile achieved 60% to 80% predictive accuracy for different cases. Other research studied drug response using immunodeficient mice xenografted with human tumors, which have the advantage of potentially capturing both genetic and nongenetic factors that affect cancer growth and therapy tolerance.

The field of pharmacogenomics is still in its infancy. Its use is currently quite limited, but new approaches are under study in clinical trials. In the future, pharmacogenomics will allow the development of tailored drugs to treat a wide range of health problems, including cardiovascular disease, Alzheimer disease, cancer, HIV/AIDS, and asthma.

Omics for drug discovery and repurposing

The cost of generating new therapeutics has risen dramatically over the past 60 years, with each new drug costing about 80-fold more in 2010 than 1960 in inflation-adjusted terms. Also, much has been said about the protracted process involved in getting a drug through the FDA approval pipeline. Estimates are that the process can take on average 12 years between lead identification and FDA approval. As a result, many are investigating high-throughput and computational approaches to drug discovery and repurposing. Recent efforts have focused on the use of the omics data, especially genomics, to discover new drug targets and search for new uses for existing drugs, referred to as drug repositioning.
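To put the 80-fold figure in perspective, the short calculation below converts it into an implied average annual growth rate and doubling time. It is only a back-of-the-envelope sketch: it assumes smooth exponential growth between 1960 and 2010, which the text does not claim, and the variable names are illustrative.

```python
import math

fold_increase = 80       # from the text: cost per new drug, 2010 vs 1960, inflation-adjusted
years = 2010 - 1960      # span quoted in the text

# Assumption: constant compound growth over the whole period
annual_growth = fold_increase ** (1 / years) - 1
doubling_time = years * math.log(2) / math.log(fold_increase)

print(f"implied annual growth: {annual_growth:.1%}")
print(f"implied doubling time: {doubling_time:.1f} years")
```

Under that assumption, an 80-fold rise corresponds to the inflation-adjusted cost per drug roughly doubling every eight years.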

Pharmacogenomics can impact how the pharmaceutical industry develops drugs, as early as the drug discovery process itself (Figure 5). First, cheminformatics and pathway analysis can aid in the discovery of suitable gene targets, followed by small molecules as “leads” for potential drugs. Additionally, discovery of pharmacogenomic variants for the design of clinical trials can allow for safer, more successful passage of drugs through the pharmaceutical pipeline. As mentioned previously, cheminformatics methods can be used to identify novel drug-protein interactions. While these predicted interactions can be used to discover new small molecules for therapeutic purposes, any new drug must still go through the significant regulatory hurdles of safety and efficacy testing.


Figure 5. Drug discovery. Pharmacogenomics can be used at multiple steps along the drug discovery pipeline to minimize costs, as well as increase throughput and safety. First, association and expression methods can be used to identify potential gene targets for a given disease. Cheminformatics can then be used to narrow the number of targets to be tested biochemically, as well as to identify potential polypharmacological factors that could contribute to adverse events. After initial trials, pharmacogenomics can identify variants that may potentially affect dosing and efficacy. This information can then be used in designing a larger Phase III clinical trial, excluding “non-responders” and targeting the drug towards those more likely to respond favorably.

In addition to the Human Genome Project, several recently launched large-scale biological databases will further facilitate the study of disease mechanisms and progression, particularly at the system level as outlined in Figure 3. The Research Collaboratory for Structural Bioinformatics Protein Data Bank is a worldwide archive of structural data of biological macromolecules, providing access to the 3-D structures of biological macromolecules, as well as integration with external biological resources, such as gene and drug databases. ProteomicsDB is another example, encompassing mass spectrometry of the human proteome acquired from human tissues, cell lines, and body fluids to facilitate the identification of organ-specific proteins and translated long intergenic noncoding RNAs, with due consideration of time-dependent expression patterns of proteins.

Parallel to these developments, the Human Metabolome Database contains more than 40,000 annotated metabolite entries in the latest version, released in 2013. It provides both experimental metabolite concentration data and analyses through mass spectrometry and nuclear magnetic resonance (NMR) spectrometry. Such databases are believed to greatly facilitate the translation of information into knowledge for transforming clinical practice, particularly for metabolic-related diseases, such as diabetes and coronary artery disease. In fact, metabolomics has emerged as an important research area that includes not only endogenous metabolites of the human body but also chemical and biochemical molecules that can interact with the human body. Specifically, ongoing efforts are directed at fingerprinting metabolites from food and nutrition products, drugs, and traditional Chinese medicine, as well as molecules produced by the gut bacterial microbiota. These will eventually help us to better understand the interaction between the host, pathogen and environment.

The availability of genomic, proteomic, and metabolic databases allows a better understanding of the development of complex diseases such as cancer. These databases also allow the search for new biomarkers using different pattern mining and clustering techniques. The resulting clusters can be either partitional (hard) or hierarchical (a tree-like nested structure). Multicore CPUs, GPUs, and field-programmable gate arrays with parallel processing techniques can further accelerate these methods.
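The difference between partitional and hierarchical clustering can be illustrated with a small pure-Python sketch. It uses k-means as one example of a partitional method and single-linkage agglomeration as one example of a hierarchical method; both choices, the function names, and the four toy expression profiles are assumptions made for illustration, and real omics analyses would use optimized libraries on far larger data.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]

def kmeans(points, init_centers, iters=10):
    """Partitional (hard) clustering: every sample ends up in exactly one group."""
    centers = list(init_centers)
    for _ in range(iters):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # move the center to the mean of its members
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign(points, centers)

def single_linkage(points):
    """Hierarchical clustering: merge the two closest clusters until one tree remains."""
    clusters = [([i], i) for i in range(len(points))]  # (member indices, subtree)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a][0] for j in clusters[b][0])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0], (clusters[a][1], clusters[b][1]))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters[0][1]

# Toy expression profiles: 4 samples x 2 markers (invented values)
profiles = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9)]

labels = kmeans(profiles, init_centers=[profiles[0], profiles[2]])
tree = single_linkage(profiles)
print(labels)  # flat group label per sample
print(tree)    # nested tuples of sample indices
```

The k-means result is a flat label per sample, whereas the hierarchical result is a nested tree that can be cut at any depth to obtain coarser or finer groupings.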

In two linked papers, Dudley et al. and Sirota et al. created disease signatures from microarray data in Gene Expression Omnibus and compared these to gene expression data from Connectivity Map to identify potentially novel therapeutics for lung cancer and inflammatory bowel disease. A similar study using this method noted that tricyclic antidepressants might have efficacy against small cell lung cancer (but not non-small cell lung cancer).
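The idea of matching a disease signature against drug signatures can be sketched as follows. This is not the scoring method used in the cited papers; as a stand-in, the sketch uses a plain Pearson correlation over shared genes and ranks drugs from most anti-correlated (expression changes opposite to the disease) to most correlated. Gene names, drug names, and all fold-change values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def reversal_ranking(disease, drugs):
    """Rank drugs by how strongly their signature opposes the disease signature."""
    ranked = []
    for name, signature in drugs.items():
        genes = sorted(set(disease) & set(signature))  # compare shared genes only
        r = pearson([disease[g] for g in genes], [signature[g] for g in genes])
        ranked.append((name, r))
    return sorted(ranked, key=lambda item: item[1])  # most negative first

# Hypothetical log fold changes (disease vs healthy, treated vs untreated)
disease = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.0}
drugs = {
    "drug_X": {"G1": -1.8, "G2": -1.2, "G3": 0.9, "G4": 2.1},   # reverses the signature
    "drug_Y": {"G1": 1.9, "G2": 1.4, "G3": -1.1, "G4": -1.7},   # mimics the signature
    "drug_Z": {"G1": 0.1, "G2": -0.2, "G3": 0.3, "G4": -0.1},   # little relationship
}

ranking = reversal_ranking(disease, drugs)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A strongly negative score marks a compound whose expression effect opposes the disease state, making it a candidate for follow-up rather than a confirmed therapy.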

Drug repurposing refers to taking an existing, already on the market, FDA-approved compound and using it to treat a disease or condition other than the one for which it was originally intended. In the past, inspiration for this type of “off label use” has been largely serendipitous. For example, Viagra was initially aimed at treating heart disease, and turned out to be useful for erectile dysfunction. By using a pre-approved compound, early phase clinical trials can be avoided, which can save significant time and money.

Disease-gene association data may also predict drug targets. Sanseau et al. evaluated existing GWAS hits and found that genes related to GWAS hits are significantly more likely to be targetable by small molecules or by biologic agents than other genomic regions, and that 15.6% of GWAS genes are existing drug targets (compared to 5.7% of the general genome). In support of this hypothesis, Okada et al. performed a multi-ethnic GWAS of 103,638 cases and controls for rheumatoid arthritis (RA) and noted 101 total RA risk loci; these loci identified 18 of 27 current RA drug target genes, and identified three approved cancer medications that may be active against RA. Khatri et al. analyzed eight existing organ transplant rejection datasets and found a common module of 11 genes overexpressed in all rejected organs. Using these genes, they identified two existing non-immunosuppressant drugs that could be repurposed to regulate these genes and demonstrated enhanced effect in a mouse model. Resources such as the drug-gene interaction database (DGI), which integrates data from 13 databases, and PharmGKB may facilitate translation of genomic study results to potential therapeutics. See Table 1 below for a listing of TBI resources.
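The proportions quoted for Sanseau et al. and Okada et al. can be turned into simple enrichment figures. The snippet below only restates the arithmetic implied by the numbers in the text; the variable names are illustrative, and no additional data from the studies is used.

```python
# Figures quoted in the text
gwas_target_rate = 0.156    # share of GWAS-associated genes that are existing drug targets
genome_target_rate = 0.057  # share of genes genome-wide that are existing drug targets

# How much more often a GWAS gene is a drug target than a gene picked at large
fold_enrichment = gwas_target_rate / genome_target_rate

# The same comparison expressed as an odds ratio
odds_ratio = (gwas_target_rate / (1 - gwas_target_rate)) / (
    genome_target_rate / (1 - genome_target_rate)
)

# Share of current RA drug target genes recovered by the RA risk loci
ra_targets_recovered = 18 / 27

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"odds ratio: {odds_ratio:.2f}")
print(f"RA drug targets recovered: {ra_targets_recovered:.0%}")
```

In other words, a GWAS-implicated gene is between two and three times as likely to be an existing drug target as a gene chosen without regard to GWAS evidence.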

Finally, a growing collection of computational and experimental methods that leverage molecular and clinical data enables diverse drug repositioning strategies. Integration of translational bioinformatics resources, statistical methods, chemoinformatics tools and experimental techniques (including medicinal chemistry techniques) can enable the rapid application of drug repositioning on an increasingly broad scale. Efficient tools are now available for systematic drug-repositioning methods using large repositories of compounds with biological activities. Medicinal chemists along with other translational researchers can play a key role in various aspects of drug repositioning.

Table 1. Public resources available for Translational Bioinformatics.

Name | URL | Comments
Pharmacogenomic Biomarkers in Drug Labels | http://www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm | Lists FDA-approved drugs with pharmacogenomic information in their drug labels.
PharmGKB | http://www.pharmgkb.org | Curated resource about the impact of genetic variation on drug response for clinicians and researchers.
Clinical Pharmacogenetics Implementation Consortium (CPIC) | http://www.pharmgkb.org/page/cpic | Provides a list of the published guidelines for drug-gene interactions produced by CPIC.
Phenotype Knowledgebase | http://phekb.org | Online collaborative repository for building, validating, and sharing electronic phenotype algorithms and their performance characteristics.
NHGRI Catalog of GWAS studies | http://www.genome.gov/26525384 | Curated list of GWAS studies, their phenotypes, and key results.
Catalog of PheWAS results | http://phewascatalog.org | Searchable, downloadable catalog of EHR PheWAS results.
Drug-Gene Interaction database | http://dgidb.genome.wustl.edu | Provides a search interface into drug-gene interactions from data derived from 13 resources.
My Cancer Genome | http://www.mycancergenome.org | Provides up-to-date data regarding cancer mutations, treatments, and relevant clinical trials.
ClinVar | http://www.ncbi.nlm.nih.gov/clinvar/ | Provides up-to-date relationships among human variations and phenotypes along with supporting evidence.
SHARPn | http://phenotypeportal.org | Collection of computable phenotype algorithms generated by SHARPn.

Personalized genomic testing

Personalized medicine has become important as a means to help patients receive the best possible outcomes while reducing adverse effects and high direct medical costs if a treatment will not benefit the patient.

Genetic and genomic tests each have a place in personalized medicine. Genetic tests typically focus on a specific, known gene, while genomic tests, such as whole-genome sequencing (WGS), focus on the expression and interaction of groups of genes. Genetic tests concentrate on the presence or absence of mutations, or overexpression, of individual genes, while genomic tests provide gene signature profiles based on expression levels of specific component genes. Examples of genetic tests include BRCA-1 and -2 in breast cancer, EGFR in non-small cell lung cancer, and BRAF in melanoma. Examples of genomic tests include the Oncotype DX assays in breast, colon, and prostate cancers, and the 70-gene assay in breast cancer. Since WGS was first developed, advances in technology have made the test easier, quicker, and less expensive. In fact, it is now easy enough that it could become a routine test offered to healthy patients during primary care visits. However, it can be difficult to determine what the results of WGS mean.

What is genetic testing?

Genetic testing is the analysis of human DNA, RNA, or proteins to detect gene variants, changes in chromosomes, or proteins associated with certain diseases or conditions; non-diagnostic uses include paternity testing and forensics. The results of a genetic test can confirm or rule out a suspected genetic condition or help determine a person’s chance of developing or passing on a genetic disorder. More than 1,000 genetic tests are currently in use, and more are being developed.

Genetic testing methodology varies:

  • Molecular genetic tests study single genes or short lengths of DNA to identify variations or mutations that lead to a genetic disorder.

  • Chromosomal genetic tests analyze whole chromosomes or long lengths of DNA to detect large genetic changes such as an extra copy of a chromosome.

  • Finally, biochemical genetic tests study the amount or activity level of proteins; abnormalities in either can indicate changes to the DNA that result in a genetic disorder.

Figure 6 summarizes the various applications of genetic testing available today. Genetic testing is voluntary, and it has benefits as well as limitations and risks. Thus, the decision about whether to be tested is a personal and complex one. A geneticist or genetic counselor can help by providing information about the pros and cons of the test and discussing the social and emotional aspects of testing.

The last decade has seen an unprecedented pace of advancement in our ability to sequence the genome. As the cost of sequencing decreases, the opportunity to move from targeted sequencing to whole exome sequencing (the analysis of all of a person’s genes), and then to whole genome sequencing, which analyzes a person’s entire genetic code, becomes more accessible, particularly for researchers.


Figure 6. Available types of genetic testing.

Most medical genetic test results will directly change your medical care and those changes are based on evidence gathered through clinical trials and other medical practice. Medical genetic tests may be used to:

  • Diagnose a genetic disease.
  • Assess the chance of having a child with certain genetic conditions.
  • Predict if a person may be more likely to have side effects or an abnormal response to a certain drug.
  • Find an increased risk for a common disease.

For genomic assays to be a viable tool, they must be accurate and clinically meaningful. As Table 2 shows, genomic assays need to have analytic validity, clinical validity, and clinical utility. Analytic validity is the test’s ability to accurately and reliably measure the genotype (or analyte) of interest in the clinical laboratory and in specimens representative of the population of interest. Regarding clinical validation, a major goal is to identify and quantify potential sources of biologic variation in the analysis of a given sample. Clinical utility is a test’s ability to benefit patients by improving treatment decisions.

Table 2. Evidence Requirements for Genomic Assays:
Analytical validity: Ability to accurately and reproducibly measure analyte (or genotype). Does it detect what it is supposed to detect?
Clinical validity: Ability to accurately and reliably predict phenotype, clinical disease, or predisposition to disease. Does it detect information that is known to be associated with a specific disease?
Clinical utility: Evidence that guides patient management and affects decision making, resulting in added value and improved outcomes. How useful is the information to improve health outcomes?
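Validity questions of the kind listed in Table 2 are usually answered with standard diagnostic accuracy measures. The sketch below computes sensitivity, specificity, and predictive values from a 2x2 comparison of assay results against a reference standard. The counts are invented, the function name is illustrative, and which measure is reported under analytic versus clinical validity depends on what the reference standard is (the true genotype or the clinical phenotype).

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard 2x2 measures comparing an assay against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly excluded
        "ppv": tp / (tp + fp),          # chance a positive result is a true positive
        "npv": tn / (tn + fn),          # chance a negative result is a true negative
    }

# Hypothetical validation cohort of 1,000 samples (invented counts)
measures = accuracy_measures(tp=90, fp=45, fn=10, tn=855)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, only about two thirds of positive results are true positives even though sensitivity and specificity are both high. High accuracy alone therefore does not establish clinical utility.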

The rapid evolution of genomic sequencing technologies has decreased the cost of genetic analysis to the extent that it seems plausible that genome-scale sequencing could have widespread availability in health care across all stages of life, from preconception to adult medicine (Figure 7). Challenges to fully embracing genomics in a clinical setting remain, but some approaches are starting to overcome these barriers, such as community-driven data sharing to improve the accuracy and efficiency of applying genomics to patient care.

Early analyses comparing genomes of different individuals confirmed the remarkable similarity of sequence (99% identical). They soon gave way to expectations that the millions of nucleotide differences among individuals would enable clinicians not only to recognize each individual’s biologic uniqueness, but also to translate this knowledge into a more precise understanding of physiology, more refined diagnoses, better disease risk assessment, earlier detection and monitoring, and treatments tailored to the individual patient, i.e., personalized (or individualized or precision) medicine.

Figure 7. The use of genomics throughout an individual’s lifespan. Case studies of the use of genomics to inform patient care at different stages of life. (Rehm 2017)

Value of genomics in personalized medicine

Although DNA diagnostic testing was in use before 2000, it is the exponential increase in our capacity to perform nucleotide sequencing that has been largely responsible for the relatively recent emphasis on personalized medicine. Completion of the HapMap project allowed for the selection of genome-wide single nucleotide variants (SNVs) that tag common variants throughout the genome. This enabled genome-wide association studies (GWASs) for the discovery of loci associated with clinical phenotypes. Advances in next-generation sequencing (NGS) have reduced the cost and time required for whole exome sequencing (WES) or whole genome sequencing (WGS), and we are continually improving our capacity for handling the storage, transfer, and analysis of huge amounts of sequence data. These advances have also enabled millions of people to have their individual genomic sequence analysed, primarily within research studies or clinical care. There is widespread recognition that access to an individual’s genomic sequence and other ‘omics’ data can enable a more detailed understanding of our health and disease risks, and inform a more precise approach to patient care, a strategy now commonly called ‘precision medicine’.
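
To make the idea of a GWAS more concrete, the sketch below tests a single variant for association with a phenotype by comparing allele counts in cases and controls. It is an illustration only: the allele counts are invented, the function names are hypothetical, and a real GWAS repeats a test of this kind across hundreds of thousands of variants with corrections for multiple testing and population structure that are not shown here.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds of carrying the risk allele in cases relative to controls."""
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts at one SNV (two alleles per person):
# 500 cases and 500 controls give 1,000 alleles in each group.
case_risk, case_other = 300, 700
ctrl_risk, ctrl_other = 200, 800

odds_ratio = allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other)
chi2 = chi_square_2x2(case_risk, case_other, ctrl_risk, ctrl_other)
p = p_value_1df(chi2)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"chi-square = {chi2:.2f}")
print(f"p-value    = {p:.2e}")
```

An odds ratio above 1 means the allele is more common in cases than in controls; the p-value indicates how surprising that difference would be if the variant were unrelated to the phenotype.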

With genomic data now increasingly used to guide the individual care of patients, our health care systems are evolving, although several challenges remain. This section considers how genomics is guiding health care for the individual, providing illustrative examples of how individuals are taking advantage of personal genomic information, ranging from advanced diagnostics and tumor profiling to genomic risk assessments. These examples are then interwoven with the day-to-day challenges still facing the integration of genomics into clinical practice, as well as with strategies that are being developed to overcome these barriers and enable genomics to be a part of ever more aspects of everyday patient care.

The year 2008 saw the founding of several companies that offered direct-to-consumer (DTC) genetic testing, reporting on a variety of genes for both health and recreational purposes. DTC genetic testing through sites such as 23andMe (Mountain View, CA) has provided an avenue for patients to pursue genetic testing without a doctor’s order. Individuals received test results and personalized information on their genetic ancestry, disease risk, and drug response for selected medications.

DTC genetic testing raises a number of interesting ethical, legal, and social issues. For several years, there was an open question as to whether or not these tests should be subject to government regulation. In November 2013, the US FDA ordered 23andMe to stop advertising and offering its health-related information services. The FDA considered these tests to be “medical devices” and as such to require formal testing and FDA approval for each test. In February 2015, it was announced that the FDA had approved 23andMe’s application for a test for Bloom syndrome (http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/UCM435003), and in October 2015 it was announced that the company would once again be offering health information in the form of carrier status for 36 genes. Note that a 23andMe customer is able to download his or her raw genomic data and to interpret the results using other websites, including Promethease, Geneticgenie, openSNP, and Interpretome, for health-related associations.
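
As a rough illustration of what working with downloaded raw data can involve, the sketch below reads a small genotype file and counts copies of an allele of interest. Everything specific here is an assumption: the tab-separated layout (rsid, chromosome, position, genotype, with `#` comment lines and `--` for no-calls), the rsIDs, and the `ALLELES_OF_INTEREST` table are all made up for the example and carry no clinical meaning.

```python
import csv
import io

# Hypothetical excerpt of a raw-data export (assumed tab-separated layout).
SAMPLE_RAW = """\
# Hypothetical excerpt; real exports are assumed to look similar.
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t12345\tAG
rs0000002\t7\t67890\tCC
rs0000003\tX\t13579\t--
"""

# Made-up annotation table: variant -> allele whose copies we want to count.
ALLELES_OF_INTEREST = {"rs0000001": "G", "rs0000002": "T"}


def load_genotypes(handle):
    """Return {rsid: genotype}, skipping comments, blank lines, and no-calls."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    genotypes = {}
    for rsid, _chromosome, _position, genotype in csv.reader(data_lines, delimiter="\t"):
        if genotype != "--":
            genotypes[rsid] = genotype
    return genotypes


def count_allele_copies(genotypes, alleles):
    """Count how many copies (0, 1, or 2) of each listed allele are present."""
    return {
        rsid: genotypes[rsid].count(allele)
        for rsid, allele in alleles.items()
        if rsid in genotypes
    }


genotypes = load_genotypes(io.StringIO(SAMPLE_RAW))
copies = count_allele_copies(genotypes, ALLELES_OF_INTEREST)
print(copies)
```

Third-party interpretation services do considerably more than this, matching each genotype against curated literature, but the starting point is a lookup of this kind.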

A more positive example of where genetic testing is helping patients is a case presented at the American Neurological Association conference in 2014. A patient had a history of Alzheimer’s disease on her mother’s side of the family. She did not know if she was a carrier, nor did she want to know. But she wanted to ensure that she did not pass that mutation to her future children. Preimplantation genetic diagnosis (PGD) testing enabled her doctors to select embryos that did not have that Alzheimer’s disease gene mutation. The patient herself was never tested, nor was she informed how many (if any) of the embryos contained the mutation.

Table 3. Examples of personal genetic profiling tests for disease susceptibility.

Company | Example product | Details
23andMe | Health Edition | “Find out if you carry inheritable markers for diseases such as breast cancer, cystic fibrosis, and Tay-Sachs...Learn your genetic risk for type 2 diabetes, Parkinson's disease, and other conditions.”
deCODEme | Complete Scan | “Calculate your genetic risk for 51 conditions...”
Genetic Health | Premium Male | “These are our most comprehensive test and includes all the other tests in our range... Evaluates the risk of prostate cancer as well as the risk for thrombosis, osteoporosis, metabolic imbalances of detoxification and chronic inflammation. It also evaluates the risk profile of the most common cardiovascular diseases...”
Graceful Earth | Alzheimer’s genome test | “Check your future susceptibility BEFORE symptoms occur... Pre-emptive insight into one's genetic predisposition can empower and allow for pro-active prevention.”
Navigenics | Health Compass | “Knowing your genetic predispositions for important health conditions and medication reactions can help motivate you to take steps towards a healthier life. By gaining insight into these risks, you can plan for what's important.”

Universal newborn screening (NBS) is also an extraordinarily successful public health program, preventing morbidity and mortality through early diagnosis and management of conditions including rare inborn errors of metabolism. Conditions such as phenylketonuria are not clinically evident at birth but lead to significant irreversible harm or death if not treated promptly. NBS has saved countless lives and vastly improved the quality of children’s lives by allowing timely therapeutic interventions, and technological advances such as the use of tandem mass spectrometry (MS/MS) have played a significant role in the expansion of NBS. The capacity of genome-scale sequencing for disease gene discovery is increasingly being applied as a diagnostic test in children with suspected monogenic disorders.

The ability to analyze many or all genes in the genome simultaneously provides new opportunities for genomic medicine. The clinical utility of sequencing is recognized for certain diseases and in an increasing number of medical specialties. Genetic and genomic medicine offer the promise of improved diagnostics and treatments, and patients are asking physicians about the applicability of these technologies to their own care. However, some experts caution that the roadmap for translating genetics and genomics into routine clinical practice is unclear.

Computational health informatics

Computational health informatics (CHI) is an emerging research topic within and beyond the medical industry. It is a multidisciplinary field that draws on biomedicine, medicine, nursing, information technology, computer science, and statistics. CHI is a branch of computer science that addresses how computational methods relate to providing health care. Using Information and Communication Technologies (ICTs), health informatics collects and analyzes information from all healthcare domains to predict patients’ health status. The major goal of health informatics research is to improve the quality of care provided to patients, or Health Care Output (HCO). The healthcare industry has experienced rapid growth of medical and healthcare data in recent years. Figure 8 depicts the growth of both healthcare data and digital healthcare data. The healthcare data analytics market was projected to grow 8-10 times as fast as the overall economy until 2017.

Figure 8. Healthcare data growth. (Fang et al. 2016)

The rapid growth of new technologies has led to a significant increase in digital health data in recent years. Medical discoveries and new technologies such as novel sensors, mobile apps, capturing devices, and wearable technology have contributed additional data sources. As a result, the healthcare industry produces a huge amount of digital data from sources such as Electronic Health Records (EHR, including electronic medical records) and personal health records (PHR, one subset of EHR including medical history, laboratory results, and medications). Based on reports, digital healthcare data worldwide was estimated at 500 petabytes (1 petabyte = 10^15 bytes) in 2012 and is expected to reach 25 exabytes (1 exabyte = 10^18 bytes) in 2020, as shown in Figure 8.
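
The scale of this growth is easier to appreciate as a rate. The short calculation below takes the two estimates quoted above (500 petabytes in 2012, 25 exabytes in 2020) and derives the overall growth factor and the implied compound annual growth rate. The assumption of smooth year-on-year compounding is ours, made only to put a single number on the trend.

```python
PETABYTE = 10**15  # bytes
EXABYTE = 10**18   # bytes

start_year, end_year = 2012, 2020
start_bytes = 500 * PETABYTE
end_bytes = 25 * EXABYTE

years = end_year - start_year

# Overall growth between the two estimates.
growth_factor = end_bytes / start_bytes

# Compound annual growth rate, assuming smooth compounding (an assumption).
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor over {years} years: {growth_factor:.0f}x")
print(f"implied annual growth rate: {annual_rate:.0%}")
```

Under that assumption the figures correspond to a 50-fold increase over eight years, which works out to roughly 63% growth per year.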

The digital health data is not only enormous in amount, but also complex in its structure for traditional software and hardware. Some of the contributing factors to the failure of traditional systems in handling these datasets include:

  • The vast variety of structured and unstructured data, such as medical records, hand-written doctor notes, medical diagnostic images (MRI, CT), and radiographic films.
  • The existence of noisy, heterogeneous, complex, diverse, longitudinal, and large datasets in healthcare informatics.
  • The difficulty of capturing, storing, analyzing, and visualizing such large and complex datasets.
  • The need to increase storage capacity and processing power.
  • The need to improve the quality of care, the security and sharing of patients’ data, and to reduce healthcare costs.

Hence, solutions are needed to manage and analyze such complex, diverse, and huge datasets within reasonable time and storage limits. Big data analytics, the analysis of datasets that are large and complex, plays a vital role in managing this healthcare data and improving the quality of healthcare offered to patients. In addition, it holds promise for decreasing the cost of care, improving treatments, moving toward more personalized medicine, and helping doctors and physicians make personalized decisions.

Finally, the major benefits of big data analytics in healthcare are as follows:

  1. It makes use of the huge volume of data and provides timely and effective treatment to patients.
  2. It provides personalized care to patients.
  3. It will benefit all the components of a medical system (i.e., provider, payer, patient, and management).




The European Commission support for the production of this publication does not constitute endorsement of the contents which reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.