LO9: The role of bioinformatics in agriculture

Genomics, metabolomics and interactomics for sustainable agricultural development

Agriculture is not only a major occupation in some nations but also a way of life, a culture and a custom. Cereals and related grass crops such as rice, wheat, barley, corn, sorghum, millet and sugar cane have long been important foods for human populations on different continents. For thousands of years, people have used breeding and selection to develop domesticated varieties of these crops with the desired characteristics. Significant progress has been made in taste, nutritional value and productivity, especially during the “Green Revolution” of the 1960s and 1970s.

However, the Green Revolution is also known for its shortcomings, and we can no longer rely on a few “high-yield” varieties. We therefore need more advanced biotechnology methods in agronomy to supply nutritious food to a continuously growing world population under three important constraints: less arable land, depletion of energy resources and unpredictable climate change. In other words, we need to accelerate the pace of research so that we can provide enough food for future generations.

The last ten years are considered a new era of bioinformatics and computational biology, which has accelerated the pace of scientific discovery in the life sciences. The involvement of computer science in plant biology has changed the way plant research was done in previous decades. Rapid, ground-breaking progress in sequencing technology over the last few years has made it so cost-effective that it is now common for any experimental lab to use sequencing methods to study a genome of interest.

Applying modern biotechnology in agriculture is expected to pay large dividends in the bioenergy sector, agro-based industries, the utilization of agricultural by-products, plant improvement and better management of the environment. Recent genome and transcriptome sequencing gives the opportunity to reveal the genetic architecture of many plant species, the differences among thousands of individuals within and between populations, and the genes and mutations that are essential for improving particular complex traits of interest (Fig. 1).

Fig. 1. Structural Genomics


We therefore need to use the genomics resources that are now available for many model and non-model plant species as a result of rapid technological progress in omics and bioinformatics. This progress has led to a new translational area of plant science known as ‘Plant Genomics’. Within the scope of plant genomics, we are able to carry out the following activities:

  1. Sequencing and de novo assembly of non-model plant species;
  2. Compiling a detailed list of genes with their functional annotation and ontology;
  3. Discovering large numbers of SNP (single nucleotide polymorphism) and InDel (insertion-deletion polymorphism) markers to help in fine mapping and the selection of superior breeds;
  4. Identifying “candidate genes/mutations/alleles” associated with desired traits after delineating the underlying QTLs (quantitative trait loci) from the markers generated in 3) using QTL mapping methods, e.g. GWAS (genome-wide association study);
  5. Creating a “MarkerChip Panel” for genotyping and selection.
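
As a rough illustration of activity 3, the sketch below reports SNPs and InDels column by column from two sequences that have already been aligned. The function name, the gap symbol '-' and the coordinate convention are assumptions made for illustration; real variant discovery works from sequencing reads and quality scores rather than from one clean pairwise alignment.

```python
def find_variants(ref_aln, alt_aln, gap="-"):
    """List differences between two aligned sequences of equal length.

    Returns (ref_position, kind, ref_allele, alt_allele) tuples, where
    ref_position is the 1-based coordinate on the ungapped reference.
    For an insertion it is the last reference base before the inserted base.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have the same length")

    variants = []
    ref_pos = 0
    for ref_base, alt_base in zip(ref_aln, alt_aln):
        if ref_base != gap:
            ref_pos += 1  # advance only on real reference bases
        if ref_base == alt_base:
            continue
        if ref_base == gap:
            variants.append((ref_pos, "insertion", gap, alt_base))
        elif alt_base == gap:
            variants.append((ref_pos, "deletion", ref_base, gap))
        else:
            variants.append((ref_pos, "SNP", ref_base, alt_base))
    return variants
```

Each reported position could then serve as a candidate marker for fine mapping, subject to validation in a real population.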

Metabolomics is another fast-emerging field in the world of omics. It is normally used to profile all the metabolites present in a sample using LC-MS, GC-MS and NMR instruments. In humans, for example, it has been used to identify the metabolites that directly or indirectly indicate the dietary habits of individuals: urine samples were collected and analyzed on one of the MS instruments, and the resulting data were processed computationally (Fig. 2).

Fig. 2. Metabolomics Technology


The interactome is the complete set of protein–protein interactions in a cell, and it helps us to understand the molecular networks governing cellular systems. For example, the interaction map of Arabidopsis revealed thousands of highly reliable interactions between proteins (Arabidopsis Interactome Mapping Consortium 2011).
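
An interactome is naturally represented as an undirected graph in which proteins are nodes and interactions are edges. The sketch below, which assumes a simple list of interacting pairs with hypothetical protein names, builds such a graph and lists highly connected proteins ("hubs"). The degree threshold is an arbitrary illustrative choice.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def hubs(network, min_degree):
    """Return proteins with at least min_degree distinct interaction partners."""
    return sorted(p for p, partners in network.items() if len(partners) >= min_degree)


# Hypothetical interaction pairs, not taken from any real data set.
example_pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
example_network = build_network(example_pairs)
```

Calling hubs(example_network, 3) singles out the one protein with three partners in this toy network.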

Impact of genome sequencing in agriculture

The term genome can refer to the whole genetic material of an organism, including the full set of nuclear DNA (i.e., the nuclear genome) and the genetic information stored within organelles that have their own DNA: the ‘mitochondrial genome’ or the ‘chloroplast genome’.

Some organisms have multiple copies of each chromosome: they are diploid, triploid, tetraploid, etc. In sexually reproducing organisms (typically eukaryotes), the gamete has half the number of chromosomes of the somatic cell, and the genome is a complete set of chromosomes in a gamete.

Moreover, the genome can contain non-chromosomal genetic elements such as viruses, plasmids or transposable elements. Most biological entities more complex than a virus carry extra genetic material besides that in their chromosomes. In this broad sense, ‘genome’ describes all of the genes and non-coding DNA that have the potential to be present.

However, in eukaryotes such as plants, protozoa or animals, ‘genome’ is typically used only for the information in chromosomal DNA. The genetic information contained in the DNA of organelles, i.e., chloroplasts and/or mitochondria, is then not considered part of the genome. Instead, mitochondria are said to carry their own genome, often called the ‘mitochondrial genome’ (Fig. 3), and the DNA found in the chloroplast may be called the ‘plastome’ (Fig. 4).

Fig. 3. Mitochondrial genome

Fig. 4. Plastome

A better understanding of genome evolution comes from comparative analysis of microbial genomes, which compares metabolism and gene organization at the level of metabolic reactions and their operons, using pathways, reactions, structures, compounds and gene orthologs. In this regard, sequencing whole genomes from various species makes it possible to determine their organization and provides the starting point for understanding their function, which in turn benefits agricultural practice.

The contribution of genomics to agriculture includes the identification and manipulation of genes related to particular phenotypic traits, as well as genomic breeding through marker-assisted selection of variants. “Agricultural genomics” (or agri-genomics) aims to find innovative solutions through the study of crop or livestock genomes, providing information for the protection and sustainable productivity of the food industry, and also for other areas such as energy production or the design of pharmaceuticals.

Because most bacterial species are still unknown, most methods for profiling microbial communities and characterizing their basic functional features now rely on extracting total DNA and applying NGS (Next-Generation Sequencing) to the entire sample. The objective is to sequence and characterize DNA fragments from all the species present, i.e., the metagenome (Fig. 5).

Fig. 5. Metagenome analysis


Metagenomics has also proved suitable in agriculture for describing the complex patterns of interaction among microorganisms in soil, in the plant rhizosphere and in particular tissues or organs. Moreover, metagenomics has proved useful for tracing shifts in the taxonomic composition and functional redundancy of microbial communities in the rhizosphere and in soil that are linked to environmental changes caused by fertilization and agricultural management.

Applications of agricultural bioinformatics

The collection and storage of plant genetic resources can be used to develop stronger, disease- and insect-resistant crops and to improve the quality of livestock, making them healthier, more resistant to disease and more productive.

Comparative genetics of model and non-model plant species can reveal how their genes are organized with respect to each other; this organization is then used to transfer information from model crop systems to other food crops. Examples of fully sequenced plant genomes are Arabidopsis thaliana (thale cress) and Oryza sativa (rice).

Plant-based biomass is also an energy resource: it can be converted into biofuels such as ethanol and used as fuel for vehicles and planes. Biomass crops such as maize (corn) and switchgrass, and lignocellulosic materials such as bagasse and straw, are widely used for biofuel production. Accordingly, the use of genomics and bioinformatics in combination with breeding is likely to improve the ability to breed crop species for use as biofuel feedstock, and therefore to keep increasing the use of renewable energy in modern society.

In addition, genes from Bacillus thuringiensis that control a number of serious pests have been successfully transferred to cotton, maize and potatoes. This new ability of the plants to resist insect attack may decrease the amount of insecticide used and thereby increase the nutritional quality of the crops (Fig. 6).

Fig. 6. Bacillus thuringiensis gene


Scientists have recently succeeded in transferring genes into rice to increase its levels of Vitamin A, iron and other micronutrients. This success could have a profound impact in reducing the incidence of blindness and anemia caused by deficiencies in Vitamin A and iron, respectively (Fig. 7.1, 7.2).

Fig. 7. Transfer of genes into rice to increase the levels of Vitamin A

Another example is the progress achieved in developing cereal varieties with a greater tolerance of soil alkalinity and of free aluminium and iron toxicity. These varieties will allow agriculture to succeed in areas with poorer soils, thereby adding much more land to the global production base.

The purpose of plant genomics is to understand the genetic and molecular basis of all biological processes in plants that are relevant to the species. This understanding is fundamental because it will allow efficient exploitation of plants as biological resources in the development of new cultivars with improved quality and reduced economic and environmental costs. The traits of primary interest are pathogen and abiotic stress resistance, plant quality characteristics, and the reproductive characteristics that determine yield.

Agriculturally important biological databases

At the beginning of the “genomic revolution”, the fundamental task of bioinformatics was to establish and maintain databases to store biological information like nucleotide and amino acid sequences.

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query and retrieve components of the information stored within the system. For example, a record in a nucleotide sequence database normally contains a contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, frequently, literature citations related to the sequence.
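
The record structure described above can be sketched as a small data class with an in-memory store offering the three operations mentioned (update, query, retrieve). The field names, the accession key and the class names are assumptions for illustration and do not reproduce the schema of any real database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SequenceRecord:
    accession: str       # hypothetical unique identifier for the record
    contact_name: str    # person who submitted the sequence
    molecule_type: str   # e.g. "DNA" or "mRNA"
    organism: str        # scientific name of the source organism
    sequence: str        # the nucleotide sequence itself
    citations: List[str] = field(default_factory=list)


class SequenceDatabase:
    """A toy in-memory store keyed by accession."""

    def __init__(self) -> None:
        self._records: Dict[str, SequenceRecord] = {}

    def update(self, record: SequenceRecord) -> None:
        """Insert a new record or overwrite an existing one."""
        self._records[record.accession] = record

    def retrieve(self, accession: str) -> Optional[SequenceRecord]:
        """Fetch one record by accession, or None if it is absent."""
        return self._records.get(accession)

    def query(self, organism: str) -> List[SequenceRecord]:
        """Return all records isolated from the given organism."""
        return [r for r in self._records.values() if r.organism == organism]
```

A production database would add persistence, indexing and validation; the sketch only shows how the listed fields and operations fit together.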

Developing a database involves not only designing it and storing information but also building a user-friendly GUI (graphical user interface), so that investigators can both access existing data and submit new or revised data, e.g., NCBI, Ensembl.

There are many helpful databases from which we can obtain information about specific plant species.

For example, the PlantTribes 2.0 database is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. It uses the graph-based clustering algorithm MCL to classify all of these species’ protein-coding genes into putative gene families, also called tribes, using three clustering stringencies (low, medium and high). For all tribes, it generates protein and DNA alignments and maximum-likelihood phylogenetic trees (Fig. 8).

Fig. 8. PlantTribes 2.0 database


There is also a parallel database of microarray experimental results linked to the genes, which allows researchers to identify groups of related genes and their expression patterns.

SuperTribes, built via a second iteration of MCL clustering, connect distant but potentially related gene clusters. All information and analyses are available through a flexible interface that allows users to explore the classification, to place query sequences within the classification, and to download results for further study.
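
To make the idea of MCL clustering more concrete, here is a minimal pure-Python sketch of the Markov Cluster procedure (alternating expansion and inflation of a column-stochastic matrix). It is a teaching sketch, not the code used by PlantTribes: the input is assumed to be a small symmetric similarity matrix, and the default inflation value and convergence thresholds are arbitrary. Treating the inflation parameter as the counterpart of the low, medium and high stringencies is likewise an assumption.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster the nodes of a similarity graph with a basic MCL iteration.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(similarity)
    if n == 0:
        return []
    # Add self-loops, then turn similarities into transition probabilities.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows fade
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that still carries flow defines one cluster of columns.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)
```

Higher inflation values generally produce finer clusters. On real protein similarity graphs, dedicated sparse-matrix implementations are needed, since this dense version scales poorly.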

In its latest version, the developers have added another, fine-scale classification for identifying orthologous genes, based on the OrthoMCL algorithm.

Another database, FlagDB, is a large integrative collection of structural and functional annotations and ESTs from six different plant species. It also provides information about novel gene predictions, mutant tags, gene families, protein motifs, transcriptome data, repeat sequences, primers and tags for genomic approaches, subcellular targeting, secondary structures, tertiary models, curated annotations and mutant phenotypes (Fig. 9).

Fig. 9. Data available in FlagDB database (FLAGdb++ v6.2)


Another important example is the plant genome database PlantGDB, a catalogue of genomic sequences of all plant species, created for the purpose of comparative genomics. This database also assembles EST sequences into contigs that are likely to represent unique genes (Fig. 10).

Fig. 10. The Plant genome database: PlantGDB


Other agriculturally important databases, along with their descriptions and URLs, are listed by the Health Science Library System.

Plant genomics

The role of model organisms

Over the last century, research on a small number of life forms has played an essential role in our understanding of various biological cycles and processes. This is because many aspects of biology are similar in most or even all living organisms. However, it is often much easier to study a specific aspect or process in one organism than in others. Such organisms are commonly referred to as model organisms, because their characteristics make them suitable for laboratory study.

In the 1980s, more and more people came to think that major investments in studies of many different plants, such as corn, oilseed rape or soybean, would dilute efforts to fully understand the basic properties of all plants. Moreover, scientists began to realize that the goal of fully understanding plant physiology and development is so ambitious that the best approach is to choose one model plant species on which many scientists can focus.

The best-known model organisms have strong advantages for experimental research, such as rapid development with a short life cycle, small adult size, ready availability and tractability. The extensive study of their characteristics makes these model organisms even more valuable. A huge amount of data can be obtained from them, giving important information for the analysis of normal human or crop development, gene regulation, genetic disorders and diseases, and evolutionary processes.

For example, Medicago (alfalfa) is a true diploid that plays a significant role in fixing soil nitrogen and forms a major part of forage diets. Other grasses and legumes are also being used for extensive EST sequencing and for the construction of genetic maps. Fortunately, the complete sequencing of all the genes of one representative plant species gives a great deal of knowledge about all higher plants. Using model species will further expand this knowledge, especially in revealing the roles of proteins and discovering their functions. For example, the comparison of the genome sequences of rice and Arabidopsis revealed plenty of useful information for plant genomics because of their extensive but complex patterns of synteny.

Arabidopsis thaliana has become a well-known model plant for most researchers. Although it is a non-commercial plant, it is favored because it reproduces, develops and responds to stress and disease in the same way as many crop plants. Arabidopsis thaliana has a small genome that lacks much of the repeated, less informative DNA that hinders genome analysis. Its advantages are: extensive genetic and physical maps of all 5 chromosomes (MapViewer); a fast life cycle (around 6 weeks from seed germination to mature seed); prolific seed production and simple cultivation in limited space; a huge number of mutant lines and genomic resources (Stock Centers); and a multinational research community of academic, government and industry laboratories.

The whole genome of Arabidopsis was duplicated once during its evolution, and this event was followed by gene loss and extensive local gene duplications. The genome has 25,498 genes encoding proteins from 11,000 families (Fig. 11).

Fig. 11. Analysis of Arabidopsis thaliana


As with other model organisms, far more information is available for the Arabidopsis genome than the complete genome sequence alone. The website of The Arabidopsis Information Resource, TAIR, allows researchers to integrate the genome sequence with a large EST database and with the genetic and physical maps, offers links to functional and molecular genetic information and the literature for specific genes, and maintains an ever-expanding list of mutant stocks.

Other plants used as model organisms for research are tomato, rice, maize and wheat, because of their significant characteristics.

The available research and genetic data for different model plants are published on dedicated websites. Generally, these are maintained by particular research groups that integrate research efforts from all over the world. Valuable websites include UK CropNet, the U.S. Agricultural Research Service and organism-specific resources such as MaizeDB. These sites aim to link seed stocks and physical genetic resources to virtual information such as linkage mapping data. That is why various search engines and complex relational databases are under development.

Managing and distributing plant genome data

As with numerous areas of science and technology, genome science has profited significantly from progress in computing capabilities and bioinformatics. The growth of the Internet has been as vital for genome researchers as the improvement in computational speed.

In conjunction with the development of modern database technology, the World Wide Web has become the natural medium for managing and disseminating genomic resources. This has led to the creation of shared public resources for searching and analyzing the contents of genomic databases. Websites such as NCBI and EMBL give quick access to colossal amounts of information and analysis tools, free of charge, from anywhere on the globe. In addition, networking has been important for managing laboratory data with little or no human intervention.

LIMS, or laboratory information management systems, let users at different workstations or geographic locations browse, edit, analyze and comment on the data. The core of a genomic data resource is a database system, and most databases can be classified as either relational databases (RDB) or object-oriented databases (OODB) (Fig. 12).
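
As a hedged illustration of the relational approach, the sketch below uses Python's built-in sqlite3 module to store samples and sequencing runs in two linked tables and to answer a simple question with a join. The table names, column names and example rows are hypothetical and far simpler than a real LIMS schema.

```python
import sqlite3


def build_demo_lims():
    """Create an in-memory relational database with two linked tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            species   TEXT NOT NULL,
            tissue    TEXT NOT NULL
        );
        CREATE TABLE sequencing_run (
            run_id    INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
            platform  TEXT NOT NULL,
            comment   TEXT
        );
        """
    )
    # Hypothetical example rows.
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(1, "Oryza sativa", "leaf"), (2, "Zea mays", "root")],
    )
    conn.executemany(
        "INSERT INTO sequencing_run VALUES (?, ?, ?, ?)",
        [(10, 1, "platform-A", None), (11, 1, "platform-B", "repeat run"),
         (12, 2, "platform-A", None)],
    )
    conn.commit()
    return conn


def runs_for_species(conn, species):
    """Return the run identifiers recorded for all samples of one species."""
    rows = conn.execute(
        "SELECT r.run_id FROM sequencing_run AS r "
        "JOIN sample AS s ON s.sample_id = r.sample_id "
        "WHERE s.species = ? ORDER BY r.run_id",
        (species,),
    )
    return [row[0] for row in rows]
```

In an object-oriented database, the same information would instead be stored as sample objects that hold references to their run objects.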

Fig. 12. Laboratory information management systems (LIMS)


There are three primary sequence databases: GenBank (NCBI), the Nucleotide Sequence Database (EMBL) and the DNA Databank of Japan (DDBJ), which are repositories for raw plant sequence information. Similarly, SWISS-PROT and TrEMBL are the major primary databases for the storage of plant protein sequences. There are also secondary databases such as PROSITE, PRINTS and BLOCKS; the sequences they contain are not raw data but are derived from the data in the primary databases.

The early bioinformatics databases focused primarily on data capture. By the early part of this decade, the emphasis had moved from information capture to information aggregation and integration. Model Organism Databases (MODs), integrated repositories of all the electronic data resources relating to a specific experimental plant or animal species, became the first choice of the bioinformatics world. By integrating numerous types of biological information across several species, such resources enable analysts to make discoveries that would not be possible by analyzing a single species alone. These systems use comparative analysis across numerous organisms to find patterns in genomes that might otherwise be missed.

The maize genome, for example, is around the same size as the human genome and won’t be fully sequenced for another few years, whereas the rice genome, which is one tenth the size of the human genome, has already been sequenced. Because the two grains are closely related in evolutionary terms, maps have been successfully created that relate maize’s genetic map to the rice genome sequence. This lets analysts follow a genetically mapped trait in maize, such as tolerance to high salt levels in the soil, and move into the corresponding region of the rice genome, thereby identifying candidate genes for salt tolerance.

Currently, different bioinformatics approaches are applied when studying plant genome data. Some of the most popular are:

Sequence alignment methods and applications for comparing genome sequences: Progress in technologies for the large-scale identification and quantification of biological molecules, combined with progress in computing technologies and the internet, has made it easier to deliver large volumes of biological data to analysts. The increased throughput was gained through automation, miniaturization and integration of technologies. Applying this approach to assays of other biological molecules, including mRNA, proteins and metabolites, has resulted in a large increase in the generation of biological information.

Very often the essence of bioinformatics strategies for sequence alignment is the comparison of cDNA/EST and genomic sequences and their annotation. In addition to whole-genome sequencing, plant sequence information has been collected from three main sources: sample sequencing of bacterial artificial chromosomes (BACs), genome survey sequencing (GSS) and sequencing of expressed sequence tags (ESTs).

Sequence alignment: This is the arrangement of two or more amino acid or nucleotide sequences, from one organism or several, in such a way as to line up the regions of the sequences that share common properties. Well-known algorithms for pairwise alignment are the Smith-Waterman algorithm for local alignment and the Needleman-Wunsch algorithm for global alignment.
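
The global alignment idea can be sketched with a compact Needleman-Wunsch implementation. The scoring values (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions; practical tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align two residues
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

The Smith-Waterman local variant follows the same recurrence but never lets a cell drop below zero and starts the traceback from the highest-scoring cell instead of the corner.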

Multiple sequence alignment: A multiple alignment shows the relationships among three or more sequences. When the sequences involved are diverse, the conserved residues are often key residues for the maintenance of structural stability or biological function. Multiple alignments can therefore reveal many clues about protein structure and function. The most commonly used alignment software is the ClustalW package.

Sequence similarity searching algorithms: Probably the most widely used of these are FASTA and BLAST. Both tools provide very fast searches of sequence databases (Fig. 13, 14).


Fig. 13. FASTA

Fig. 14. BLAST

Genome comparison tools: MegaBlast is an NCBI BLAST-based algorithm for similarity searches with large sequences. For example, MegaBlast is used to compare raw genomic sequences against a database of contaminant sequences.

Expressed sequence tags (ESTs): ESTs are partial gene sequences that have been produced, or are being produced, in many laboratories using different species and cultivars as well as diverse tissues and developmental stages. ESTs are now widely used throughout the genomics and molecular biology community for gene discovery, mapping, polymorphism analysis, expression studies and gene prediction.

Molecular plant breeding

As the resolution of genetic maps in the important crops increases, and as the molecular basis of particular traits or physiological responses becomes better understood, it will become increasingly possible to associate candidate genes found in model species with the corresponding loci in crop plants. Appropriate relational databases will make it possible to move freely between genomes with regard to gene sequence, putative function or genetic map position.

Once tools of this kind have been implemented, the distinction between breeding and molecular genetics will disappear. Breeders will use computer models to formulate predictive hypotheses for creating phenotypes of interest from complex allele combinations, and then construct those combinations by scoring large populations for very large numbers of genetic markers (Fig. 15).

Fig. 15. Reverse genetics in perennial ryegrass


The tremendous body of breeding knowledge collected over the last decades will become directly linked to basic plant biology and will increase the ability to clarify gene function in model organisms. For example, traits that are poorly defined at the biochemical level but well established as a visible phenotype can be associated with candidate genes through high-resolution mapping.

Orthologous genes in a model species, such as Arabidopsis or rice, may not have a known connection with a quantitative trait like that seen in the crop, but may have been implicated in a specific pathway or signaling chain by genetic or biochemical tests. This kind of cross-genome referencing will lead to a convergence of economically relevant breeding information with basic molecular genetic data.

The phenotypes of commercial interest that are expected to be dramatically improved by this progress include both the improvement of factors that frequently limit agronomic performance (input traits) and changes in the amount and type of materials that crops produce (output traits). Examples include:

  • abiotic stress tolerance (cold, drought, flooding, salt);
  • biotic stress tolerance (fungal, bacterial, viral);
  • nutrient use efficiency;
  • control of plant architecture and development (size, shape, number and position of organs, timing of development, senescence);
  • metabolite partitioning (redirecting carbon flow through existing pathways, or shunting it into new pathways).

Rational plant improvement

The implications of genomics for food, feed and fibre production can be seen on many fronts. At the most basic level, progress in genomics will considerably speed up the acquisition of knowledge, and that in turn will directly affect many aspects of the processes associated with plant improvement. Knowledge of the function of all plant genes, together with the further development of tools for modifying and examining genomes, will lead to a new genetic engineering paradigm in which rational changes can be designed and modelled from first principles.

The goal of plant genomics is to understand the genetic and molecular basis of all biological processes in plants that are relevant to the species. This understanding is essential to allow efficient use of plants as biological resources in the development of new cultivars with improved quality and reduced economic and environmental costs (Fig. 16).

Fig. 16. Plant improvement


This knowledge is also fundamental for the development of new plant diagnostic tools. The traits considered of primary importance are pathogen and abiotic stress resistance, plant quality traits, and the reproductive traits that determine yield. A genome program can now be envisioned as an extremely important tool for plant improvement.

Such an approach to identifying key genes and understanding their function will result in a “quantum leap” in plant improvement. Additionally, the ability to examine gene expression will let us understand how plants respond to and interact with the physical environment and management practices.

This information, together with suitable technology, may provide predictive measures of plant health and quality and become an essential part of future plant breeding decision management systems.

Current genome programs produce a large amount of information that will require processing, storage and distribution to the multinational research community. The data include not only sequence information but also information on mutations, markers, maps, functional discoveries, etc. Key objectives for plant bioinformatics include: to encourage the submission of all sequence data to the public domain through repositories, to provide rational annotation of genes, proteins and phenotypes, and to establish relationships both within the plant data and between plants and other organisms.

Genotype building experiments

In the last few years, an increasing amount of DNA polymorphism and sequence data has been collected for different plant varieties and cultivars. Most of these data have been used to identify cultivars and to compare them in terms of genetic distance and similarity. Such distances are measured from polymorphism in parts of the chromosome with unknown function.
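
One simple way to turn marker polymorphism into a distance is the proportion of markers at which two cultivars differ. The sketch below assumes each cultivar is described by a list of allele codes in the same marker order, with None for missing data; the cultivar names and genotypes are hypothetical. Other distance measures are common in practice, so this simple mismatch proportion should be read only as an illustration.

```python
from itertools import combinations


def marker_distance(profile_a, profile_b):
    """Proportion of jointly scored markers at which two profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same markers")
    # Skip markers with missing data (None) in either cultivar.
    scored = [(x, y) for x, y in zip(profile_a, profile_b)
              if x is not None and y is not None]
    if not scored:
        raise ValueError("no markers scored in both profiles")
    return sum(x != y for x, y in scored) / len(scored)


def distance_table(profiles):
    """Pairwise distances for a dict that maps cultivar name to marker profile."""
    return {
        (a, b): marker_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical marker genotypes (1 = allele present, 0 = absent, None = missing).
example_profiles = {
    "cv1": [1, 0, 1, 1],
    "cv2": [1, 1, 1, 0],
    "cv3": [1, 0, 1, None],
}
```

Such a table can feed a clustering or tree-building step to group cultivars by similarity.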

This kind of polymorphism is widely used in genomic studies across species. The polymorphism data are analyzed for a potential link with a quantitative trait of interest in the particular phenotypes. When such a link is discovered, the polymorphism is called an indirect marker. Indirect markers are closely linked to, and occasionally overlap with, a locus that controls this quantitative trait, the QTL.

QTLs are defined as genes or regions of chromosomes that affect a particular trait. QTLs themselves are very difficult to identify. In both cases, whether the marker is merely linked to the QTL or overlaps it, the markers can be used for further selection. This selection process is called marker-assisted selection (MAS).

QTL (quantitative trait locus) analysis and mapping

QTLs and mapping: The main problem is to determine which populations are appropriate for QTL analyses: unstructured populations or F2 crosses and, in plants, large-scale populations that can be screened for potential QTLs. Because selection is based mostly on markers, a high mapping density is extremely important. An interval of about 5 centimorgans (cM) between marker and QTL was thought to be enough for effective selection. However, simulation studies indicate that selection accuracy dropped to 81% and 74% at distances of 2 cM and 4 cM, respectively, compared with 1 cM (Fig. 17).
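
To see why marker density matters, a map distance can be converted into a recombination fraction. The sketch below uses Haldane's mapping function, r = 0.5 (1 - exp(-2d)) with d in Morgans, which assumes no crossover interference. The choice of Haldane's function is an assumption of this example, and the sketch does not attempt to reproduce the simulation results quoted above.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction between two loci under Haldane's mapping function."""
    if distance_cm < 0:
        raise ValueError("map distance cannot be negative")
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def linkage_retained(distance_cm, generations):
    """Probability that marker and QTL alleles stay together on one lineage.

    Assumes one meiosis per generation and independence between generations.
    """
    r = haldane_recombination_fraction(distance_cm)
    return (1.0 - r) ** generations


# Marker-QTL distances mentioned in the text, in centimorgans.
for d in (1, 2, 4, 5):
    print(d, round(haldane_recombination_fraction(d), 4), round(linkage_retained(d, 5), 4))
```

The closer the marker is to the QTL, the more slowly the marker-trait association decays over generations of selection.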

Fig. 17. QTL mapping of the qGW-5 locus


Use of QTL information: It is assumed that some but not all loci have been identified, so selection should be based on a combination of phenotypic and molecular data. During selection, the linkage between markers and traits can weaken, so this linkage should be monitored over the generations. During selection, QTLs also indicate the simultaneous presence of the desired genes in a line. In crossbreeding programs, QTLs could predict the performance of untested crosses, including their non-additive effects, from data on the parent lines and a restricted number of crosses.

Future prospects: As molecular data accumulate, genotype-building programs will be developed to identify lines homozygous for desirable markers, and introgression programs will combine the intended traits of two lines in one. For now, practical agriculture is still at the stage of accumulating molecular information.

Analytical approaches: One statistical tool for QTL analysis is meta-analysis, which synthesizes robust QTL data from several studies and refines the estimated QTL position. One program of this type is BioMercator, developed in France. PlaNet, the European plant genome database network, is another environment offering complex research opportunities.
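The idea of refining a QTL position by meta-analysis can be illustrated with an inverse-variance weighted average. The Python sketch below assumes that each study reports a position and a 95% confidence interval on a common map, and that the studies are independent. It is a generic fixed-effect calculation, not the algorithm implemented in BioMercator, and the input values are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def pooled_qtl_position(estimates):
    """Pool (position_cm, ci_low_cm, ci_high_cm) tuples by inverse variance.

    Returns (pooled_position, (ci_low, ci_high)).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = []
    positions = []
    for position, ci_low, ci_high in estimates:
        if ci_high <= ci_low:
            raise ValueError("confidence interval must have positive width")
        std_error = (ci_high - ci_low) / (2 * Z_95)
        weights.append(1.0 / std_error ** 2)
        positions.append(position)
    total_weight = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, positions)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, (pooled - Z_95 * pooled_se, pooled + Z_95 * pooled_se)


# Hypothetical QTL positions (cM) for the same trait from three studies.
studies = [
    (42.0, 35.0, 49.0),
    (47.0, 44.0, 50.0),
    (45.0, 38.0, 52.0),
]

position, (low, high) = pooled_qtl_position(studies)
print(f"pooled position: {position:.1f} cM, 95% CI: {low:.1f}-{high:.1f} cM")
```

Studies with narrow confidence intervals dominate the pooled estimate, and the pooled interval is narrower than any single input interval, which is the sense in which meta-analysis improves the QTL position.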

Further progress and detailed discussion of QTLs involve the statistical aspects of MAS: setting significance thresholds for marker effects, overestimation or bias in estimated QTL effects, and optimization of multi-generation selection programs that use MAS together with phenotypic data. A particular feature of plants is that QTL discovery should be carried out on specific plant parts (leaves, roots, fruits, etc.), as has been shown for grapes.

Experimental results do not always confirm the efficiency of MAS for genotype building. The major reason is insufficient accuracy in the initial assessment of a QTL, its position and its effect. Some QTLs may also be lost during the genotype-building process. For complex productivity traits, epistasis may change the magnitude of a QTL effect between the parental and progeny generations. It is therefore recommended that selection be based on allelic combinations rather than on separate QTLs. This is consistent with the numerous GxE (genotype-by-environment) interactions and with selecting within the environment of interest in the case of disease or drought resistance. The efficiency of MAS will therefore depend on the complexity of the genetic architecture of the species and trait, on how the trait develops in the environment, and on the interaction between the two.

For complex traits, QTLs should be assessed in different environments. Phenotypic evaluation over successive generations is also absolutely necessary. For example, drought resistance appears to be a more complex trait than disease resistance.

From an economic point of view, using markers incurs the costs of DNA collection, genotyping, analysis, QTL discovery and so on. This high cost is justified in genotype building for traits that are expensive to evaluate, such as disease resistance, or for traits with low heritability.





The European Commission support for the production of this publication does not constitute an endorsement of the contents, which reflect the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.