Lost in the space of bioinformatic tools: A constantly updated survival guide for genetic epidemiology. The GenEpi Toolbox

  • Stefan Coassin
    Affiliations
    Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Schöpfstr. 41, A-6020 Innsbruck, Austria
  • Anita Brandstätter
    Affiliations
    Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Schöpfstr. 41, A-6020 Innsbruck, Austria
  • Florian Kronenberg
    Correspondence
    Corresponding author. Tel.: +43 512 9003 70560; fax: +43 512 9003 73560/73561.
    Affiliations
    Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Schöpfstr. 41, A-6020 Innsbruck, Austria

      Abstract

      Genome-wide association studies (GWASs) have led to impressive advances in the elucidation of genetic factors underlying complex phenotypes and diseases. However, the ability of GWAS to identify new susceptibility loci in a hypothesis-free approach requires tools to quickly retrieve comprehensive information about a genomic region and to analyze the potential effects of coding and non-coding SNPs in a candidate gene region. Furthermore, once a candidate region is chosen for resequencing and fine-mapping studies, several rare mutations are likely to be identified, and strong bioinformatic support is required to properly evaluate and prioritize them for further analysis. Because a mutation can affect a variety of regulatory layers, a comprehensive in-silico evaluation of candidate SNPs can be a demanding and very time-consuming task. Although many bioinformatic tools that significantly simplify this task have been made available in recent years, their utility is often still unknown to researchers not intensively involved in bioinformatics.
      We present a comprehensive guide to 64 tools and databases for bioinformatically analyzing gene regions of interest and predicting SNP effects. In addition, we discuss tools to perform data mining of large genetic regions, predict the presence of regulatory elements, make in-silico evaluations of SNP effects and address issues ranging from interactome analysis to graphically annotated protein sequences. Finally, we exemplify the use of these tools by applying them to hits of a recently performed GWAS.
      Taken together, the discussed tools are summarized and constantly updated in the web-based “GenEpi Toolbox” (http://genepi_toolbox.i-med.ac.at). Used in combination, they can help to get a glimpse of the potential functional relevance of both large genetic regions and single nucleotide mutations, which might help to prioritize the next steps.

      Abbreviations:

      CNV (Copy number variation), eQTL (Expression quantitative trait locus), ESE (Exonic splicing enhancer), ESPERR (Evolutionary and sequence pattern extraction through reduced representations), ESS (Exonic splicing silencer), GWAS (Genome-wide association study), ISE (Intronic splicing enhancer), ISS (Intronic splicing silencer), LD (Linkage disequilibrium), rSNP (Regulatory SNP), TFBS (Transcription factor binding site(s))

