LO8: Bioinformatics in food production and engineering


Food and nutrition play an important role in regulating the processes of the human body. The introduction of advanced “omics” techniques into food science and practice has produced large biological data sets that are difficult to interpret. The bioinformatics approach addresses this problem and provides a solid basis for the successful development of food production and engineering.

Food acts as an important regulating factor for different processes within the body, such as metabolic and mental processes. The growing incidence of various chronic diseases is also linked with food. Considerable effort is being made to promote and improve the nutritional potential and quality of food sources. Recently, food science has grown notably by applying various advanced techniques such as the “omics” series. To cope with the vast variety of data and the difficulties in interpreting them, a database is necessary. It can store and continuously update the large amount of biological data and resources that are important for the food and nutritional sciences. Thus, the development of bioinformatics in food science will provide simple and convenient ways of improving food research and technologies.

Bioinformatics benefits food production and nutrition

Bioinformatics depends strongly on well-tuned software solutions that are available to the individual scientist over electronic networks. Modern computer systems face fewer and fewer limitations in storage space and calculation time. Thus, the main limiting factor is the lack of information on specific topics. Since industrial food processes are based on food-grade organisms such as bacteria, molds and yeasts, the growing number of complete genome sequences of these organisms is leading to a rapid increase in valuable knowledge that compensates for this lack. This knowledge can be used in many different fields, such as metabolic engineering, improving cell performance as a micro-scale process factory, and the elaboration of new preservation methods. Moreover, genomic knowledge of food-grade microorganisms will drive innovation in pre- and probiotic research by describing the broad range of bacterial properties, from growth and stress responses to multi-species microbial ecology within the human host.

Applied bioinformatics in nutrition and food research: usage and examples

To understand the mechanisms of nutrient action, investigators need to use a reductionist strategy. It reduces the problem to the level of cells, proteins, genes, etc. The knowledge gained is then transferred to the level of the human body to evaluate nutrient effects. In this way, nutrition researchers regularly generate and interpret data at the molecular level. A rigorous and predictive understanding of metabolism requires nutrients and metabolites to be studied in the context of their associated regulatory mechanisms. For example, the peroxisome proliferator-activated receptors (PPARs) are a group of molecules that directly link nutrient intake to the response of the organism. PPARs are transcription factors that sense different metabolites, such as fatty acids and their derivatives, at the cellular level. They then launch a specific metabolic program by regulating the expression of a variety of target genes. To work towards a complete mechanistic understanding of PPARs, a bioinformatics study was performed to predict PPAR gene targets on a genome-wide basis. This study gave the first library of nutrient-sensitive genes and showed for the first time how databases and software can be integrated to investigate nutritionally relevant questions.

The study addressed the following questions:

  1. Which genes are directly regulated by PPARs and, thus, by fatty acids and fatty acid derivatives?
  2. What are the biological functions of these fatty acid-responsive genes?
  3. What other transcription factors regulate these fatty acid-responsive genes?

A simplified flow-chart in Fig. 1 illustrates how databases and software were integrated to answer these questions.

The diagram in Fig. 1 shows the basic steps in predicting the regulatory effect of PPARs on gene expression. They can be summarized as follows:

  • Searching the literature in the PubMed database for manuscripts containing experimental evidence for DNA binding sites of PPARs;
  • Using these sites to build probability matrices under different probabilistic assumptions with the CONSENSUS and GMMPS programs;
  • Extracting relevant genomic information (all known human genes, DNA regions upstream of their transcription start sites, conserved elements within these upstream regions, and homologous genes in the mouse and rat genomes) using custom programs;
  • Scoring the probability matrices against DNA sequences upstream of known PPAR target genes and of randomly selected genes in the genome using custom programs and software;
  • Applying techniques that minimize the number of false-negative and false-positive results in the detection of PPAR binding sites and the identification of putative PPAR target genes on a genome-wide basis;
  • Analyzing the sets of genes (PPAR targets) with a gene ontology analysis tool, along with custom software, to determine the biological functions represented by each group.
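
The matrix-building and scoring steps can be illustrated with a minimal Python sketch. It assumes a handful of made-up binding-site sequences (not real PPAR response elements), builds a log-odds position weight matrix with a pseudocount and a uniform background, and scans a hypothetical upstream sequence for the best-scoring window. The function names, the pseudocount and all sequences are illustrative assumptions; the CONSENSUS and GMMPS programs used in the study differ in detail.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def score_window(pwm, window):
    """Sum the log-odds scores of a window (only A, C, G, T are supported)."""
    return sum(pwm[i][base] for i, base in enumerate(window))

def best_hit(pwm, sequence):
    """Return (score, start position) of the best-scoring window in a sequence."""
    width = len(pwm)
    hits = [(score_window(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical binding sites and upstream region, invented for illustration.
toy_sites = ["AGGTCA", "AGGTCA", "AGGTGA", "GGGTCA"]
pwm = build_pwm(toy_sites)
upstream = "TTTTAGGTCATTTT"
score, position = best_hit(pwm, upstream)
print(position, round(score, 2))
```

In a genome-wide setting the same scan would be run over the upstream regions of all genes, and a score threshold would be chosen by comparing known target genes with randomly selected genes, as in the list above.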

Figure 1. Integration of databases and software to predict genes regulated by peroxisome proliferator-activated receptors (according to Lemay et al., Am J Clin Nutr 2007;86:1261–9)


Bioinformatics in reconstruction of metabolic pathways

Microbial metabolism has been the basis of a major part of food processing for centuries. Fermentation of food takes advantage of the ability of desirable microbes to convert substrates (usually carbohydrates) into tailor-made organic compounds that contribute to the flavor, structure, texture, stability and safety of the food product.

Because of its fundamental importance to a wide variety of foods (breads, cheeses, wines, sausages, etc.), over a century of research has focused on understanding microbial metabolism. The potential to transform this knowledge into even greater value in foods has been dramatically expanded by the availability of tools to understand and control microbial metabolism using modern genomic and bioinformatics approaches. In fact, the tremendous flow of information on microbial metabolism is being converted into usable knowledge only because of the arrival of massive computing power and of bioinformatics tools that can be applied to the large data sets generated by nutrition-related research.

This knowledge will not only drive a new generation of foods with added value but will also dramatically change the ability of foods to influence individual quality of life.

Application of gene expression arrays

The ability of nutrients to directly control the expression of specific genes is at the core of a new generation of nutritional science. It gives researchers the opportunity to use genomic information to develop technologies that can measure which genes are being transcribed in any cell at any time (i.e. gene expression arrays). In this way, scientists are uncovering the intimate relationships between organisms and their environment.

Food and nutrition form a multidisciplinary field centered on studies of the integrative metabolism of animals and humans. Currently, the apparently strong relationship between diet and health is finding its mechanistic basis through an understanding of the interaction of nutrients with metabolic pathways. Since most nutrients affect a wide range of biochemical pathways, food exerts multiple effects: pleiotropic dysfunctions in the relative absence of a defined nutrient, i.e. deficiencies, and pleiotropic benefits when the nutrient returns to appropriate, optimal levels.

Classical biochemical approaches describe very well the effects of a single nutrient on a single target; however, the multiplicity of metabolic effects on the entire organism is difficult to explain in this way. Modern genomics uses the reverse approach: it measures everything. Genomics-based investigations reveal the pleiotropic behavior of exogenous nutrients by describing the full spectrum of transcriptional responses to any variable, including nutrients. These global experimental designs are possible because bioinformatics tools can adequately manage and analyze the vast volume of accumulated data.

Genetic variability

After the sequencing of the human genome, the mapping of its polymorphic regions that control individual phenotypic differences within the population is under way. The variations established by this approach were at first thought of only as the key to the discovery of genetic diseases. However, it is now known that they are also keys to individual variation in diet and health. Sequence variation in a particular gene (even in a single nucleotide, the so-called single nucleotide polymorphism, SNP) can influence the quantitative need for, and the physiological response to, various nutrients. Examples of polymorphisms that influence nutrition and disease include phenylketonuria, in which the inability to metabolize phenylalanine renders this nutrient toxic, and lactose intolerance, which is due to polymorphisms both in the structure of the lactase gene, which produce a dysfunctional enzyme, and in regulatory regions of the genome, which prevent a perfectly functional lactase enzyme from being produced in adults.

With genomics will come the ability to predict health. The potential of bioinformatics to deliver knowledge about the integrative action of multiple genes to the individual consumer will help in predicting that consumer’s health, leading to individualized dietary choices. This will be possible in the near future thanks to bioinformatics tools capable of managing the volume of data involved in quantitatively assessing an individual’s metabolism and in intervening in that metabolism using foods to improve health.

Genomic and bioinformatics tools will improve human nutrition trials. In such trials it is not easy to find statistically significant positive effects of various nutrients and foods, because the magnitude of the benefit is quite small relative to the overall variability in a sample of humans chosen at random from the population, and because humans do not respond homogeneously to even the most straightforward nutritional variables.

To overcome this obstacle, clinical and epidemiological trials are now being analyzed using SNP data as independent input variables. Most clinical trials build catalogues of SNPs in genes whose variation in function has been shown to be important for the manifestation of, for example, cancer, autoimmunity and heart disease. Such an approach has been successful not only in identifying the causes of statistical variation among individuals but also in identifying the potential biochemical mechanisms responsible for the variation in response.
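
A minimal Python sketch of the idea of using a SNP genotype as an independent variable is shown here. All records are invented for illustration, and the genotype labels and the measured marker are hypothetical. The sketch only compares the pooled mean response with the mean response per genotype group; a real trial analysis would use a proper statistical model and significance tests.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (SNP genotype, change in a blood marker after the diet).
records = [
    ("AA", -0.1), ("AA", 0.0), ("AA", 0.1),
    ("AG", -0.4), ("AG", -0.6),
    ("GG", -1.0), ("GG", -1.2), ("GG", -0.8),
]

def mean_response_by_genotype(rows):
    """Group responses by genotype and return the mean response of each group."""
    groups = defaultdict(list)
    for genotype, response in rows:
        groups[genotype].append(response)
    return {genotype: mean(values) for genotype, values in groups.items()}

# Pooled over all participants, the effect looks modest and variable...
pooled = mean(response for _, response in records)
# ...while stratifying by genotype shows which group carries the response.
by_genotype = mean_response_by_genotype(records)

print("pooled:", round(pooled, 2))
for genotype, value in sorted(by_genotype.items()):
    print(genotype, round(value, 2))
```

In this toy data set the pooled mean hides the fact that one genotype group responds strongly and another hardly at all, which is the kind of heterogeneity the paragraph above describes.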

Genetic polymorphism and nutrient requirements

Polymorphisms in the various genes encoding enzymes, transporter proteins and regulatory proteins affect the absolute quantities of essential nutrients (including vitamins, minerals, etc.) that are needed to satisfy the requirements of the cell. Consequently, the variation in the nutrient status of a population is a complex quantity. It results from variations in food intake plus inherent variations among individuals within the population in their genetically defined abilities to absorb, metabolize and utilize these nutrients. The figures for the recommended daily allowance of each nutrient are based on experimentally determined data on the needs of a statistically representative segment of the population. However, the range of responses to both micro- and macronutrients in the population as a whole is much larger. In particular, individual food choices, genetics and nutrition are linked in a complex way that was highlighted quite recently with the help of genomic tools. For example, polymorphism in a recently identified sweet receptor protein has been proposed to be the basis for the varying intakes of calorie-rich foods, i.e. the famous sweet tooth.

Based on the information that genomics reveals about food preference and the corresponding roles of genetics and environment, food science is now able to make nutritionally superior foods that are more attractive (organoleptically) to the subset of the population for whom they are most appropriate. However, the technologies for experimentally describing the effects of diet on individuals are at present used on a broad basis only in clinical trials. They are not yet included in routine consumer assessment. Therefore, consumers cannot benefit from nutritional knowledge about themselves, because they simply do not have it. This lack of knowledge is the most important factor holding back a widespread improvement in the nutritional health of the consumer population.

Genetic variation and the response to variations in overall diet

The basic metabolism of macronutrients, especially carbohydrates and fats, is strongly affected by genetic differences in humans. For instance, polymorphisms in the apoprotein genes (apoE, apoAIV) or in lipoprotein catalysts (lipoprotein lipase) have been shown to directly affect the clearance of dietary lipids. That is why polymorphisms in lipid metabolic genes cause individuals to respond differently to dietary fat. The apoE protein clears liver-derived lipoproteins (VLDL and LDL) from the blood. This function of the protein is influenced by polymorphism in the gene encoding it. In addition, health outcomes beyond heart disease, including Alzheimer’s disease, have been shown to be correlated with apoE phenotypes. Apparently, diet plays a differential role in the development of these diseases according to genotype, through its influence on the quantitative flux of hepatic lipoprotein metabolism.

Many consumers consider genomic testing of the population to be useless or inappropriate, because they do not see any direct benefit for themselves. Nevertheless, acquiring knowledge about individual variation in diet-responsive genes is of great value, since this knowledge can be used for successful intervention. There is evidence that genotype predicts differences in the postprandial lipid metabolism of dietary fat. Translating this discovery into practical recommendations on how those affected should alter their intake of dietary fat is of great practical value. Thus, information on how an individual responds to foods provides that individual with the means to change their diet to improve their health. In practice, each new discovery of genetic polymorphisms linked to health increases the complexity of the science. However, thanks to modern bioinformatics tools, which are integrative by nature, each new discovery is added to the rapidly expanding, coherent database of diet and health of individual consumers.

Bioinformatics approaches refine food production

Biomass and metabolites yields

Optimization of biomass yield is a topic of continuous attention in improving the food production process. Genome-scale metabolic modelling is a technique applied to rationally improve fermentation yield. In this technique, the genome sequence of the organism is used as a catalogue of the metabolic potential of a given strain. Metabolic models have been built in this way for many microorganisms, including several food-grade microbes. A limiting factor in the correctness of a metabolic model can be the quality of the genome sequence. For instance, a gene can be missed because of poor sequencing coverage. However, the model can be completed by identifying those metabolic reactions that are missing from it but are likely to be present because they are part of a metabolic reaction cascade or pathway. Full genome-scale metabolic models allow the in silico simulation of the growth of the organism under the (metabolic) restrictions imposed by substrate availability in the medium. These simulations can be used to optimize the medium composition to better fit the requirements of the organism. Moreover, the models can suggest alternative or cheaper substrates for fermentation and improve the production of essential compounds, taking into account possible changes in the flavour or texture activity of the strain. These models have also been applied to complex (multistrain) fermentation processes, providing insight into the interactions between the different species or strains involved.
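
The idea of simulating growth under the restrictions of the medium can be sketched in a few lines of Python. The sketch below is a deliberately simplified, qualitative reachability check (sometimes called network expansion), not the quantitative constraint-based simulation used with real genome-scale models. The reaction names, metabolites and biomass precursors are hypothetical and chosen only for illustration.

```python
def producible_metabolites(reactions, medium):
    """Return every metabolite reachable from the medium via the given reactions."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available

def can_grow(reactions, medium, biomass_precursors):
    """Growth is possible (qualitatively) if all biomass precursors are producible."""
    return set(biomass_precursors) <= producible_metabolites(reactions, medium)

# Hypothetical toy model: reaction name -> (substrates, products).
toy_model = {
    "R1": (["glucose"], ["pyruvate"]),
    "R2": (["pyruvate"], ["lactate"]),
    "R3": (["pyruvate", "ammonium"], ["alanine"]),
    "R4": (["lactose"], ["glucose", "galactose"]),
}
precursors = ["pyruvate", "alanine"]

grows_on_glucose = can_grow(toy_model, ["glucose", "ammonium"], precursors)
grows_on_lactose = can_grow(toy_model, ["lactose", "ammonium"], precursors)
grows_without_nitrogen = can_grow(toy_model, ["glucose"], precursors)
print(grows_on_glucose, grows_on_lactose, grows_without_nitrogen)
```

The same check hints at how gaps are found: if a precursor is unreachable on a medium on which the organism is known to grow, a reaction is probably missing from the model. It also shows how alternative substrates can be screened by changing the medium list.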

A second factor that improves the overall yield is the robustness of the strains. This factor can be influenced to a large extent by changing the fermentation conditions under which starter cultures are prepared. For example, in L. lactis a number of genes potentially causally related to survival were identified by correlating gene expression levels with the survival of the species. The importance of these genes for the phenotype of the strains was further confirmed by gene disruption. This showed that not only the presence of a gene but also its expression is important for a given phenotype. In other words, preconditioning of L. lactis strains, followed by gene-trait matching (GTM) and transcriptome-trait matching (TTM), allows their survival under heat and oxidative stress to be improved.
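
Correlating expression levels with survival can be illustrated with a small Python sketch. The gene names, expression values and survival fractions below are invented, and a plain Pearson correlation is used as a stand-in for the statistical methods of the original studies, which are not specified here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: survival fraction after heat stress for five culture conditions.
survival = [0.10, 0.20, 0.40, 0.60, 0.80]
# Hypothetical expression levels of three genes under the same five conditions.
expression = {
    "geneA": [1.0, 2.1, 3.9, 6.2, 8.0],  # rises with survival
    "geneB": [5.0, 5.1, 4.9, 5.0, 5.1],  # roughly flat
    "geneC": [9.0, 7.5, 5.0, 3.1, 1.0],  # falls with survival
}

correlations = {gene: pearson(levels, survival) for gene, levels in expression.items()}
# Rank genes by the strength of the association, regardless of its sign.
ranked = sorted(correlations, key=lambda gene: abs(correlations[gene]), reverse=True)
for gene in ranked:
    print(gene, round(correlations[gene], 2))
```

Genes at the top of such a ranking are only candidates. As the text notes, their role has to be confirmed experimentally, for example by gene disruption.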

Texture and flavour performance

The fermentation process also influences important characteristics such as the texture and flavour of food products. Since these traits are microorganism-specific, they can be altered through the fermentation. For instance, the addition of adjunct strains to a cheese fermentation can change the flavour of the product, and the addition of exopolysaccharide-producing organisms can improve the texture of yoghurt. Similarly, the flavour profile of wine can be modified by changing either the fermentation parameters or the starter cultures. All these improvements are reached by testing a variety of experimental regimens, and bioinformatics and data analytics may be used to optimize the design of these regimens.

The gene content of particular microorganisms under specific fermentation conditions may be used to deduce their performance. Of course, such predictions based on a metabolic model must be further verified, as was the case with flavour formation in L. lactis MG1363. Similarly, the genome sequence of Lactobacillus delbrueckii subsp. bulgaricus revealed how this species is adapted to the fermentation of milk and the production of yoghurt. The genomes of Oenococcus oeni and yeast have also been analyzed, and their relation to wine fermentation has been elucidated.

Despite these advantages of metabolic models, predicting more complex phenotypes such as stress tolerance from gene content alone is less straightforward. To predict these phenotypes, information on the transcript levels of the genes may have to be taken into account.

The effects on taste and texture are mainly caused by the metabolites that are produced or transformed during fermentation. Final sensory characteristics are therefore better predicted from metabolite patterns than by associating gene content with effects on taste and texture. Quantitative descriptive analysis by a trained sensory panel is the gold standard test for the sensory characteristics of a fermented product. However, these tests are laborious and require substantial amounts of the product. In addition, the results depend on the experience of the panel. With metabolomics profiling techniques it is now possible to measure hundreds of metabolites simultaneously in a small food sample. This has led to the development of new statistical methods that associate instrumental data (e.g. chromatographic and/or mass spectrometric data) with sensory data.
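
As a minimal illustration of linking instrumental data to sensory data, the Python sketch below fits a least-squares line between the peak area of a single, hypothetical aroma compound and a panel score, and then predicts the score for a new sample. The numbers are invented. With hundreds of metabolites, multivariate methods are needed in practice, so this one-variable fit is only a sketch of the principle.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: chromatographic peak area of one aroma compound (arbitrary units)
# and the panel score for a "fruity" attribute (0-10 scale) in five products.
peak_area = [1.0, 2.0, 3.0, 4.0, 5.0]
panel_score = [2.1, 3.9, 6.1, 8.0, 9.9]

slope, intercept = fit_line(peak_area, panel_score)
# Predict the panel score of a new product from its instrumental measurement alone.
predicted = slope * 3.5 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Once such a relationship has been validated against panel data, small instrumental measurements could stand in for part of the panel work on new samples.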

Setting fermentations by mixed cultures

In the preparation of various fermented foods, complex fermentations take place in which a strong succession of microbes (bacteria, yeasts and fungi) can occur. Examples are the production of cheese, the malolactic fermentation of wine, and soy and seafood fermentations. Similar to the approaches that associate the transcription of genes with specific phenotypes, described in 2.3.2., the presence and absence of (combinations of) microorganisms (or their functionality) can be associated with the characteristics of a fermentation product.

To characterize a fermentation, the first essential step is to determine which microorganisms are present at its different stages and to correlate these sets of microorganisms with measurements of metabolites (metabolomics). The functional potential encoded in their genomes determines the properties of the microbial consortia. Metagenomics studies of such consortia reveal the DNA of unculturable organisms in addition to that of culturable ones. Thus, the functionalities of the microorganisms can be predicted from the sequences found in a consortium. However, there are still technical limitations in identifying and separating out the DNA of dead microbes, which can lead to misleading results.

The mRNA-derived sequences of a complex fermentation can be profiled using a metatranscriptomics approach. An advantage of metatranscriptomics over metagenomics is that measuring gene expression shows which genes are actually expressed in a mixed culture. One metatranscriptomics technique uses microarrays carrying the genomes of several species to determine global gene expression across species. Practical applications of this approach have been reported for the bacterial communities involved in such fermentations. A further advantage is that the metagenomics and metatranscriptomics profiles can be traced to their likely sources (the genome sequences of isolates from the fermentation product). The value of metagenomics and metatranscriptomics techniques for characterizing and potentially optimizing fermentations is thus apparent.

It is well known that bacteriophages play an important role in industrial fermentations through genetic transduction, by which biodiversity can be maintained. However, it is also known that phage sweeps disrupt fermentation processes very efficiently. Currently, predicting the specificity of bacteriophages and the interactions between microorganisms in mixed-culture fermentations is a time-consuming task. Bioinformatics techniques can be used to analyse the interactions between microbes and bacteriophages and can thus contribute to knowledge-based improvements of fermentation stability. This could be achieved by performing experiments with in situ designed microbial consortia, which are currently under development.

Bioinformatics in crop production and food processing

The progress in applying genetically modified crops (GMC) as a common approach in the food industry depends on genetic research on plants that contributes to their successful production. The main objective of GMC production is to improve the quality of the raw materials of the food supply, to ensure their effective processing, and ultimately to deliver affordable and safe food. The identification of plant biosynthetic genes that are important for health is supported by genome sequencing projects. This genome research directly promotes efficiency and efficacy in plant breeding.

A typical example is cocoa (Theobroma cacao), which is used as a raw material for chocolate-containing food products. Selecting seeds of higher quality and good flavour has been difficult in the past. For proper seed harvesting, the trees have to mature for at least 3 to 5 years. DNA fingerprinting to screen plant markers and detect genotypic links between breeds, the availability of EST (Expressed Sequence Tag) sequences, and genetic comparisons with other characterized plants all depend on bioinformatics. They will further improve the selection of desired traits at an early stage of plant development on the basis of genotype and phenotype.

As concerns food processing, the most direct application of bioinformatics is in optimizing the quantitative parameters of traditional unit operations. In general, the main aim of processing food commodities is to improve storage stability and safety. Processing procedures usually apply a considerable excess of energy to ensure a large margin for error. Three main factors make this large margin necessary: the structural complexity of biological materials, the natural variability of living organisms, and the response of the input materials to processing parameters. With the help of bioinformatics, our knowledge of biological organisms, from bacteria and viruses to plants and animals, is growing steadily, which facilitates the optimization of food processes and reduces all cost-relevant inputs, mainly energy.

The big challenge in modern food processing is to merge efficiently biological knowledge of living organisms with the bio-material knowledge necessary to convert them to foods.

Traditionally, during processing the biomaterials of living organisms are restructured into smaller and simpler forms to give stable, relatively uniform foods. This process consumes a great deal of energy, and in most cases the inherent biological properties of the living systems are lost along the way. Bioinformatics offers a detailed description of the inherent complexity of biological macromolecules within living cells, of their structural properties and of many of their functions, all of which form the fundamentals of functional genomics and proteomics. Although this is still only theory, in the near future it will be possible to use the inherent structural properties of natural food commodities to self-assemble new foods that retain great biological and nutritional value and are processed with minimum energy. The biological structure–function relationships discovered through the bioinformatics of living systems will be mapped onto the structure–function relationships of the next generation of foods. Moreover, the vast knowledge currently being produced by the “omics” sciences (genomics, proteomics, metabolomics) will improve our knowledge of ingredient characteristics and behaviours.

The natural properties of the biomolecules that constitute living organisms determine the basic biomaterial properties of foods. When foodstuffs are processed in the traditional way, little advantage is taken of the unique properties of specific molecules. On the contrary, in classical processing methods all biomolecules of a particular class (e.g. carbohydrates) are exposed to physical, thermal and mechanical energy to restructure them into more stable and/or more bioavailable food systems. During this process all the unique differences inherent to the individual biomolecules are eliminated, as are the complex structure–function relationships of living organisms.

Food processing is not always necessary for the quality of foods. In fact, it is often the other way around: highly specific biological properties of the original living organism are key to the processing strategy and contribute significantly to the organoleptic properties of the final food product. For instance, treating bovine milk with the enzyme rennet induces the natural aggregation of milk caseins, which leads to gelation during cheese manufacture. The texture and the organoleptic properties of the final product are due to the unique self-assembly properties of milk casein micelles, which are colloidally stabilized in milk by kappa-caseins but destabilized when their solubilizing glycomacropeptide is enzymatically cleaved off. Another example is the leavening of bread: wheat seeds are ground to disassemble their biological structures through mechanical energy, and the biological processes of yeast fermentation then simultaneously achieve the enzymatic elimination of phytic acid during dough incubation and the biochemical production of CO2 as leavening within a mechanically reworked protein gel structure. Thus, cheeses and breads demonstrate the positive synergistic effect of combining retained biological processes of catalysis, self-assembly and restructuring. Functional genomics, proteomics and metabolomics are now providing the knowledge needed to rethink food processing around biomolecular activities. With such tools in hand, crop production can be organized to yield products that are not simply enriched in a single valuable component but redesigned with the renewed purpose of increasing the many values of foods in providing quality of life.

Bioinformatics in food quality & safety

Food science is a multidisciplinary research and applied area that unites engineering, biological and physical sciences to explore the types of foods, the causes of their deterioration, the mechanisms of food processing and the improvement of food quality. Bioinformatics plays an important role in most of these processes, provided that the data about them are accessible in machine-readable formats. Given the important role of microorganisms in food, the use of bioinformatics tools for predicting and assessing their desired and undesired effects is of special interest. In this respect, investigations in genomics and proteomics are performed to meet the requirements of food production and food processing, to refine the quality and nutritive value of food sources, and for many other purposes.

Bioinformatics approaches can also be applied to produce crops of good quality that combine high yield with disease resistance. Different databases containing data on foods, their constituents, nutritive value, chemistry and biology exist and can be used in food research and manufacture. Combining bioinformatics with laboratory verification of selected findings involves the following methods: genomics-based functional predictions, genome-scale metabolic models, and the design and engineering of complex food properties.

The research focus in the food industry is driven by consumers’ need for high-quality, convenient, tasty, safe and affordable food.

Nutrition and food quality

Modern food science and technology have provided incomparable value to consumers in the form of an almost innumerable range of individual choices of delicious, safe and nutritious foods. This great variety of choices has been supported by scientific knowledge at all levels of the food chain, from genetic improvements in agricultural production to the engineering of food processes and the analysis of consumer sensation. With its power to create detailed molecular knowledge of biological organisms, bioinformatics is assembling the tools to reinvent the food supply. In this way bioinformatics will create great value for humans, raising the quality of their lives through the quality of the foods they eat. In particular, bioinformatics is:

  • Defining which foods are safe at the molecular scale;
  • Developing foods that are safer for the consumer;
  • Helping to understand the fundamentals of food flavours, textures and taste sensation and the relevant neurophysiological processes;
  • Improving the process of food making and optimizing the flavor and texture impact of foods.

Specific food characteristics affecting its quality

The following important elements characterizing food are used as indicators to develop its description through bioinformatics tools.

Food taste

Molecular and genetic details are known for the taste receptors, including those for sour, bitter, umami, sweet and salty tastes. These taste receptors can be used to discover the next generation of taste modifiers for foods. New developments in computational algorithms and software, together with the available known structures of these receptors, have made molecular modelling and simulation possible. Such simulations will make it possible to develop more intensely tasting compounds as food additives. They also help in understanding the basis of taste persistence, antagonism and complementation. Bioinformatics sequence similarity algorithms have been used to determine homology between sweet taste receptors and brain glutamate receptors, as well as to identify sour taste sensors in mammals. Flavor systems are becoming more complex, more attractive and more individualized to consumers.
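
To give a feel for what a sequence similarity algorithm does, the Python sketch below computes a global alignment score by dynamic programming (the Needleman-Wunsch scheme) and ranks a few candidate sequences against a query. The simple match, mismatch and gap scores and the short peptide-like strings are assumptions made for illustration. Real homology searches between receptor families rely on substitution matrices, statistical significance estimates and dedicated tools.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

# Hypothetical query fragment and candidate sequences, invented for illustration.
query = "MKTLVA"
candidates = {"seqX": "MKTLVA", "seqY": "MKSLVA", "seqZ": "GGGGGG"}

ranked = sorted(candidates,
                key=lambda name: global_alignment_score(query, candidates[name]),
                reverse=True)
for name in ranked:
    print(name, global_alignment_score(query, candidates[name]))
```

Candidates that score well above what unrelated sequences achieve are the ones worth examining further as possible homologues.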

Food flavour

Flavour formation in dairy products depends strongly on lactic acid and on the bacteria that produce it. Investigation of the genome sequences of lactic acid bacteria has revealed their flavour-forming potential. The flavour profile of many food products does not depend on single compounds but arises from the presence and interaction of many different molecules.

Bioinformatics plays an important role in combining different flavour compounds for new product development, based on knowledge of the taste and needs of the consumer. It contributes considerably to delivering food quality, taste and flavour while maintaining safety. In line with molecular evolution, bioinformatics also has a pivotal role in studying the evolution of taste receptors.

Various studies have focused primarily on the taste receptors, and a link between glucose regulation and bitter taste receptors has been established. Recently, an electronic database was established that includes the chemical properties of various compounds in relation to their taste and flavour. Moreover, the study of the genetic sequences of lactic acid bacteria has played an important role in uncovering their specific flavour-forming potential, which gives flavour to many fermented foodstuffs.

In addition to the taste receptors, the odour receptors (about 100 times more numerous than the taste receptors) are being identified as well, and the full olfactory complement of genes has been published. This bioinformatics approach to the study of both taste and odour receptors allows the design of sophisticated flavour systems that optimize flavour perception in highly nutritious foods that are currently organoleptically undesirable despite their great health value.

Food-borne pathogens

Appreciation for bioinformatics is growing in the area of food quality and safety. Food-borne pathogens are a major problem for the food industry, and genome sequencing projects now focus on innovative tools that help to determine the source of food-borne diseases. The identification of specific molecular markers can thus help in detecting spoilage and pathogenic bacteria and in predicting their resistance to thermal preservation stress.

A very important output of bioinformatics is the design of tools for detecting and identifying bacterial food pathogens. One such tool has been developed by the FDA (Food and Drug Administration) for the molecular characterization of bacterial food-borne pathogens using microarrays.

Because of this potential, many genome sequencing projects target food-borne pathogens. With the development of genome sequencing technologies, bioinformatics has proposed innovative ways of determining the source of food-borne diseases. For instance, the approach recently developed by the FDA (Food and Drug Administration) helps in the detection of bacterial food pathogens, and such computer-based tools also focus on predicting microbial growth on a given food source. To keep improving food quality, bioinformatics tools are needed that detect various properties of food automatically.

Detection of food allergens

Bioinformatics offers an efficient approach to evaluating the allergenic potential of food proteins and has an important role in the safety assessment of genetically modified crops, where protection against food allergy is crucial. These tools predict the functionality and allergenicity of food products by studying the protein sequences of their ingredients.

In practice, comparative genomics has been used to characterize many pathogens associated with food and with the sources linked to its production. These organisms have been the object of many sequencing and comparative genomics research projects, and the results show that such studies can contribute significantly to the prevention of crop-related disease and food poisoning. Crops are a major part of the food industry and for this reason must be of good quality (i.e. high yielding and disease resistant). Genes identified in commercially important crops through bioinformatics can be used in the development of transgenic crops, and new genes can increase the quality and quantity of food products. The same techniques can be useful in developing agrochemicals, using knowledge of signal transduction pathways to select specific targets and to find compounds applicable as pesticides, herbicides or insecticides.

Despite their very distinct origins, allergens share considerable sequence similarity in their structure, which causes equivalent IgE responses. This observation has prompted the WHO to include sequence similarity searches among the criteria for evaluating the allergenicity of genetically modified food. Recently, various bioinformatics techniques have also been applied to the development of allergen diagnostics, for example predicting peanut allergy with the help of machine learning.

At present, different databases dedicated to the food allergens exist, like AllerMatch, Informall, FARRP Allergen database and SDAP.
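
A sequence similarity search of this kind can be illustrated with a small sketch. The FAO/WHO (2001) consultation proposed, among other criteria, flagging a protein that shares a stretch of six contiguous identical amino acids with a known allergen. The Python example below implements only that word-match check. The function names and the toy sequences are invented for illustration and are not taken from any of the databases above. The companion criterion (more than 35 % identity over an 80-residue window) needs a proper alignment tool and is not shown.

```python
def shared_words(query, allergen, k=6):
    """Return the sorted k-residue words that occur in both protein sequences."""
    query, allergen = query.upper(), allergen.upper()
    allergen_words = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sorted(query_words & allergen_words)


def flag_candidates(query, allergens, k=6):
    """Map allergen name -> shared words, keeping only allergens with a hit."""
    hits = {}
    for name, sequence in allergens.items():
        words = shared_words(query, sequence, k)
        if words:
            hits[name] = words
    return hits


# Toy sequences invented for illustration; they are NOT real allergen entries.
toy_allergens = {
    "toy_allergen_1": "MKTLLVAAALAGSEQRCQSQLERANLRPCEQHLMQKIQRD",
    "toy_allergen_2": "MGSWWTHHPPYYFFDDNNKK",
}
query_protein = "MAAPVSEQRCQSQLGGTW"

hits = flag_candidates(query_protein, toy_allergens)
for name, words in hits.items():
    print(name, words)
```

A hit from such a check is only a prompt for further assessment; short exact matches can occur by chance, which is why they are normally combined with alignment-based criteria and laboratory testing.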

Bioinformatics in food quality and safety

There is a growing appreciation for bioinformatics in the area of food quality and safety. Many food products undergo some form of processing before they reach the consumer, ranging from fermentation to packaging. In many of these processes, microorganisms play important roles, either in transforming the food into the desired end product or in spoiling or contaminating the food.

Bioinformatics plays an increasing role in predicting and assessing the desired and undesired effects of microorganisms on food. With respect to the desired properties, bioinformatics methods can be used to improve the microbial production of fermented food products. Examples include genomics-based functional predictions, the creation of genome-scale metabolic models, and the prediction of complex food properties (e.g. taste and texture) and of the properties of complex fermentations.

To deduce the function of a specific gene, the presence and absence of the gene across a set of organisms is correlated with the presence and absence of a certain phenotypic trait in the same organisms (so-called gene–trait matching; GTM). For instance, a set of proteins was predicted to be involved in the degradation of plant (oligo-)saccharides by linking the isolation source of bacteria to gene presence/absence.
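
The idea behind gene–trait matching can be sketched in a few lines of Python. The example below is a minimal illustration and not the method of any particular GTM tool. It assumes that association is scored with a one-sided Fisher exact test on a 2x2 table of gene presence against trait presence, and the strain names, gene names and data are all invented.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: gene present, trait present    b: gene present, trait absent
    c: gene absent,  trait present    d: gene absent,  trait absent
    """
    gene_total, trait_total, n = a + b, a + c, a + b + c + d
    upper = min(gene_total, trait_total)
    favourable = sum(
        comb(gene_total, k) * comb(n - gene_total, trait_total - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, trait_total)


def gene_trait_matching(gene_presence, trait):
    """Rank genes by how strongly their presence matches the trait."""
    results = []
    for gene, strains_with_gene in gene_presence.items():
        a = b = c = d = 0
        for strain, has_trait in trait.items():
            has_gene = strain in strains_with_gene
            if has_gene and has_trait:
                a += 1
            elif has_gene:
                b += 1
            elif has_trait:
                c += 1
            else:
                d += 1
        results.append((gene, fisher_right_tail(a, b, c, d)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical data: eight strains, four of which show the trait.
trait = {f"s{i}": i <= 4 for i in range(1, 9)}
gene_presence = {
    "geneA": {"s1", "s2", "s3", "s4"},  # present exactly in the trait-positive strains
    "geneB": {"s1", "s2", "s5", "s6"},  # unrelated to the trait
}

ranking = gene_trait_matching(gene_presence, trait)
for gene, p_value in ranking:
    print(f"{gene}: p = {p_value:.4f}")
```

With real data sets, thousands of genes are tested at once, so the p-values would need a multiple-testing correction, and related strains are not independent observations. Both points are left out of this sketch.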

In the light of food safety, comparative analysis of the genome sequences of a species where some strains have a positive impact (e.g. flavour enhancement) while others are detrimental (e.g. spoilage) can be used to identify genetic elements potentially underlying these differences.

Tools that can be used to link -omics data to phenotypes are PhenoLink and DuctApe. Techniques like multiple displacement amplification can be used to amplify DNA from a single cell, and a range of genome assembly tools can be used to assemble the reads obtained from single-cell sequencing.

Finally, mobile elements such as transposons, plasmids or phages can transfer functionality from one bacterial strain to another. An example is the transfer of the galactose utilization operon between Lactococcus lactis strains. Identifying potential transposon insertion sites is crucial and can be facilitated by bioinformatics tools such as Transposon Insertion Finder.

Risk assessment

The identification of potential health or safety risks of microbial strains present in food is an important step in the risk assessment of food product consumption. Bioinformatics contributes by selectively screening microbial genome sequences for genes with specific functionalities, a highly sensitive and computationally efficient way of identifying potential health hazards.

The potential of a specific bacterium for antibiotic resistance or virulence can be investigated by comparing its genome sequence to a reference database containing known resistance genes and virulence factors. Similar approaches have been described for the identification of persistence of bacteria in food products, anaerobic spore-forming organisms in food and potential pathogens using metagenomics data. This (meta)genomics-based methodology can be applied to a wide range of functionalities, e.g. production of antimicrobial peptides.
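
Screening a genome against a reference database can be sketched as follows. Production pipelines normally use alignment tools such as BLAST for this step. The Python example below replaces alignment with a much simpler stand-in, the fraction of a reference gene's k-mers that are found in the genome, and this substitution is an assumption made only to keep the sketch self-contained. All sequences, names and thresholds are invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_genome(genome, reference_genes, k=8, min_fraction=0.9):
    """Report reference genes whose k-mers are largely present in the genome.

    Both strands of the genome are considered. The result maps each gene
    name to the fraction of its k-mers that were found.
    """
    genome_kmers = kmers(genome, k) | kmers(reverse_complement(genome), k)
    hits = {}
    for name, gene in reference_genes.items():
        gene_kmers = kmers(gene, k)
        if not gene_kmers:
            continue  # gene shorter than k, cannot be scored
        fraction = len(gene_kmers & genome_kmers) / len(gene_kmers)
        if fraction >= min_fraction:
            hits[name] = fraction
    return hits


# Toy sequences invented for illustration; not real resistance or virulence genes.
reference_genes = {
    "toy_resistance_gene": "ATGGCTAAGCTTGGATCCGAATTCTTGACC",
    "toy_virulence_gene": "ATGCCCGGGTTTAAACCCGGGTTTAAATAG",
}
genome = "TTTTCCCC" + reference_genes["toy_resistance_gene"] + "GGGGAAAA"

hits = screen_genome(genome, reference_genes)
print(hits)
```

Exact k-mer matching misses diverged gene variants that an alignment would still detect, so a threshold such as `min_fraction` would have to be tuned against known positives before any real use.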

Tracing and detection of food microorganisms

Food production and food consumption both take place in complex environments. There, besides the microorganisms present in the natural environment, many other sources of biomolecules (proteins, fats and carbohydrates) are present. This complexity is causing difficulties in detection and tracing of specific microorganisms, either potential food pathogens or beneficial probiotic strains added to the food product to enhance its functionality.

In addition to classical DNA-based detection techniques such as (q)PCR, new methods based on genomic data have been developed that allow fast and precise tracking or detection of specific species, or even strains, among the natural microflora. For instance, a locus identified as discriminatory between different L. plantarum strains was specifically amplified and sequenced, and the data showed that this is a useful approach for quantifying the relative presence of different strains during passage through the gastrointestinal tract (GIT). The same approach can be followed to design specific primers that distinguish between pathogenic and non-pathogenic populations of a given species, or that detect a strain of interest in food products, allowing this specific product to be branded.
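
Quantifying strains from sequenced amplicons of a discriminatory locus can be illustrated with a short Python sketch. It assumes that each strain carries a unique marker sequence within the locus and that all reads are in the same orientation as the markers. The marker sequences, strain names and reads are invented.

```python
from collections import Counter


def strain_fractions(reads, markers):
    """Estimate the relative abundance of strains from amplicon reads.

    A read is assigned to a strain only if it contains exactly one of the
    strain-specific markers; ambiguous and unassigned reads are ignored.
    """
    counts = Counter()
    for read in reads:
        matches = [strain for strain, marker in markers.items() if marker in read]
        if len(matches) == 1:
            counts[matches[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {strain: counts[strain] / total for strain in markers}


# Hypothetical strain-specific variants of the same locus (one base differs).
markers = {"strain_A": "ACGTTGCA", "strain_B": "ACGATGCA"}
reads = [
    "GGACGTTGCATT",
    "TTACGTTGCAGG",
    "CCACGTTGCACC",
    "GGACGATGCATT",
    "GGGGGGGGGGGG",  # matches no marker and is ignored
]

fractions = strain_fractions(reads, markers)
print(fractions)
```

Exact matching is used here for simplicity. Real reads contain sequencing errors, so a practical implementation would tolerate mismatches or map the reads to reference alleles.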

Metagenomic approaches for the dedicated tracing of a single strain also show potential for the detection of harmful bacteria. Their main advantage is that they do not require a culturing stage, which removes the concern of biased results caused by failing to detect low-abundance microbes that might be overgrown in culture-dependent detection methods.

The role of toxicogenomics in guaranteeing food quality

Food safety is increasingly a major concern for consumers, and the food industry has developed a coherent research programme to ensure it, using well-established classical methodologies as well as new state-of-the-art research tools. The goals are to ensure that undesired microbes can be inactivated or inhibited with the minimum necessary treatment of foods, to increase understanding of the ecology of food-borne microbial populations, to find out how these populations respond to environmental factors such as stress and, last but not least, to evaluate foods and food compounds toxicologically.

Toxicogenomics, a branch of genomics, is an emerging field that contributes to the evaluation of the toxicological effects of specific compounds. It uses DNA arrays (tox-chips) to test the toxicological effects of a particular compound. The DNA array technique is based on DNA microchip methodology: human or animal genetic material printed on micro-devices is probed to profile gene expression in cells exposed to test compounds. This technique avoids the study of animal pathology to define illness. Its advantages are the speed and ease of use typical of DNA expression analysis, and reduced animal testing. Its application currently faces the challenge of the massive amounts of data produced by the DNA arrays, which require sophisticated analysis and interpretation. Nevertheless, the integration of tox-chip data into the knowledge base of research institutions is a matter of the near future.
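
A first step in interpreting expression profiles from such arrays is often to compare exposed and control cells gene by gene. The Python sketch below computes a log2 fold change per gene and flags genes that change at least two-fold. The gene names, intensity values, pseudocount and threshold are all assumptions chosen for illustration, and no statistical testing or normalization is included.

```python
from math import log2
from statistics import mean


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) of mean intensities for each shared gene.

    The pseudocount avoids division by zero for genes with no signal.
    """
    return {
        gene: log2(
            (mean(treated[gene]) + pseudocount) / (mean(control[gene]) + pseudocount)
        )
        for gene in control
        if gene in treated
    }


def responsive_genes(fold_changes, threshold=1.0):
    """List genes whose absolute log2 fold change reaches the threshold."""
    return sorted(g for g, value in fold_changes.items() if abs(value) >= threshold)


# Hypothetical array intensities (two replicates per condition).
control = {"geneX": [99, 99], "geneY": [50, 50], "geneZ": [399, 399]}
treated = {"geneX": [399, 399], "geneY": [50, 50], "geneZ": [99, 99]}

fold_changes = log2_fold_changes(control, treated)
flagged = responsive_genes(fold_changes)
print(fold_changes)
print(flagged)
```

In practice the flagged genes would then be compared with expression signatures of known toxicants, which is where the data volume and interpretation challenges mentioned above arise.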


Bioinformatics is increasingly applied in food production, engineering and safety. Some future trends in its potential implementation are as follows:

  • Sequence-based prediction of microbial functionality. An inventory is needed of the functionalities that can reliably be determined for bacteria from sequence data. New publicly available data sets combining genotype, phenotype and transcriptome, such as those available for L. lactis and L. plantarum, could help to develop new sequence-based functional prediction strategies. Examples are more narrowly specified protein domains for screening, e.g., carbohydrate-active enzymes, and relating promoters or regulatory binding sites to phenotype.
  • Establishment of culture collections for desired traits on the basis of knowledge-based in silico screening. This would require databases that integrate data from genomics, systems biology, phenotypes, ingredient information, properties of batches of foods, on-line measurement of parameters during the food-making process and ‘biomarkers’ for functionality in specific taxa (based on, e.g., GTM). Specific emphasis should be put on promoting the FAIR (findable, accessible, interoperable, re-usable; http://datafairport.org/) principles in storing data. Future software and databases can be consolidated in a virtual machine that can subsequently be run in the cloud. First steps in this direction are being made in the EU-funded project GenoBox (www.genobox.eu), which aims to create a database that consolidates genotype and phenotype data and allows microbial genomes to be screened for functionality and safety risk factors.
  • Creation of a database to assess the risks of the presence of certain microbes or functionalities in a given food product. The idea is to determine minor levels of microbial components in many food products across the world by sequencing the food supply chain. The project has already been established by a consortium of IBM and MARS (http://www.research.ibm.com/client-programs/foodsafety/). The ambition is to record sufficient biodiversity in this database and to use it further for branding products on the basis of the unique microbiota patterns present in fermented products or in foods that contain a microbiome.
  • Directing fermentation performance by studying the interactions between microbes and their environment. This approach uses systems biology beyond genome-scale metabolic models and kinetic models to describe interactions between microbes and their matrix. Such studies require a substantial knowledge base on both the properties of the microorganisms and the physical properties of the matrix in which the organisms operate. Consolidating the expanding amount of data on food fermentation and safety in databases, and combining it with appropriate experimental design, algorithms, expertise and follow-up experiments, should improve the prediction of fermentation performance and safety.


  • Abee T. Van Schaik W. & Siezen R. J. (2004). Impact of genomics on microbial food safety. Trends in Biotechnology 22, 653-660.
  • Alkema W. Boekhorst J. Wels M. & S. A. F. T. Van Hijum. (2015). Microbial bioinformatics for food safety and production. Briefings in Bioinformatics.
  • Brul S. Schuren F. Montijn R. Keijser B. J. F. Van Der Spek H. & Oomes S. J. C. M. (2006). The impact of functional genomics on microbiological food quality and safety. International Journal of Food Microbiology 112 195-199.
  • Carrasco-Castilla J. Hernandez-Alvarez A. J. Jimenez-Martınez C. Gutierrez-Lopez G. F. & Davila-Ortiz G. (2012). Use of proteomics and peptidomics methods in food bioactive peptide science and engineering. Food Engineering Reviews 4 224-243.
  • Chibuike C. Udenigwe Bioinformatics approaches prospects and challenges of food bioactive peptide research Trends in Food Science & Technology Volume 36 Issue 2 April 2014 Pages 137-143 ISSN 0924-2244.
  • Desiere F., German B., Watzke H., Pfeifer A., Saguy S. (2001). Bioinformatics and data knowledge: the new frontiers for nutrition and foods. Trends in Food Science & Technology 12 (7): 215-229; ISSN 0924-2244, http://dx.doi.org/10.1016/S0924-2244(01)00089-9.
  • FAO/WHO. (2001). Evaluation of allergenicity of genetically modified foods. Report of a joint FAO/WHO expert consultation on allergenicity of foods derived from biotechnology. Rome: Food and Agriculture Organization of the United Nations (FAO).
  • Lemay D. G. Zivkovic A. M. & German J. B. (2007). Building the bridges to bioinformatics in nutrition research. The American Journal of Clinical Nutrition 86 1261-1269.
  • Liu M. Nauta A. Francke C. & Siezen R. J. (2008). Comparative genomics of enzymes in flavor-forming pathways from amino acids in lactic acid bacteria. Applied and Environmental Microbiology 74 4590-4600.
  • Mari A. Scala E. Palazzo P. Ridolfi S. Zennaro D. & Carabella G. (2006). Bioinformatics applied to allergy: allergen databases from collecting sequence information to data integration. The allergome platform as a model. Cellular Immunology 244 97-100.
  • Mochida K. & Shinozaki K. (2010). Genomics and bioinformatics resources for crop improvement. Plant and Cell Physiology 51 497-523.
  • R.D Pridmore D Crouzillat C Walker S Foley R Zink M.-C Zwahlen H Brüssow V Pétiard B Mollet Genomics molecular genetics and the food industry Journal of Biotechnology Volume 78 Issue 3 31 March 2000 Pages 251-258 ISSN 0168-1656.
  • Waidha K. M., Jabalia N., Singh D., Jha A. and Kaur R., Bioinformatics Approaches in Food Industry: An Overview. Conference Paper November 2015, DOI: 10.13140/RG.2.2.27961.77926
  • Wingender E, Dietze P, Karas H, Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acid Res 1996;24:238 – 41.
  • The Universal Protein Resource (UniProt). Nucleic Acid Res 2007;35: D193–7.
  • http://www.spss.com/ Clementine
  • http://www.ifst.org/fst.htm
  • http://snp.cshl.org
  • http://datafairport.org
  • http://www.research.ibm.com/client-programs/foodsafety/



The European Commission support for the production of this publication does not constitute endorsement of the contents which reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.