Bioinformatics Resources
Click on the links below to learn more
Nucleotide Sequence Databases (the principal ones)
Protein Sequence Databases
SWISS-PROT & TrEMBL - Protein sequence database and computer annotated supplement
UniProt - UniProt (Universal Protein Resource) is the world's most comprehensive catalog of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
PIR - Protein Information Resource
MIPS - Munich Information centre for Protein Sequences
HUPO - HUman Proteome Organization
Database Searching by Sequence Similarity
BLAT - Jim Kent's BLAT is superb for its speed and for the integrated view it gives of the results
Sequence Alignment
USC Sequence Alignment Server - align 2 sequences with all possible varieties of dynamic programming
T-COFFEE - multiple sequence alignment
ClustalW @ EBI - multiple sequence alignment
MSA 2.1 - optimal multiple sequence alignment using the Carrillo-Lipman method
BOXSHADE - pretty printing and shading of multiple alignments
Splign - a utility for computing cDNA-to-genomic, or spliced, sequence alignments. At the heart of the program is a global alignment algorithm that specifically accounts for introns and splice signals.
Spidey - an mRNA-to-genomic alignment program
Wise2 - align a protein or profile HMM against genomic sequence to predict a gene structure, and related tools
PipMaker - computes alignments of similar regions in two (long) DNA sequences
VISTA - align + detect conserved regions in long genomic sequences
myGodzilla - align a sequence to its ortholog in the human genome
Human Genome Databases
Ensembl - automatically annotated human genome. The DataMining (Mart View) is cool and very useful!
GDB - Genome Database
Mammalian Gene Collection - full-length (open reading frame) sequences for human and mouse
STACK - Sequence Tag Alignment and Consensus Knowledgebase
GeneCards - human genes, proteins and diseases
Databases of other Organisms
Genome-wide Analysis
MBGD - comparative analysis of completely sequenced microbial genomes
COGs - phylogenetic classification of orthologous proteins from complete genomes
STRING - detect whether a given query gene occurs repeatedly with certain other genes in potential operons
Pedant - automatic whole genome annotation
GeneCensus - various whole genome comparisons
Protein Domains: Databases and Search Tools
InterPro - integration of Pfam, PRINTS, PROSITE, SWISS-PROT + TrEMBL
PROSITE - database of protein families and domains
Pfam - alignments and hidden Markov models covering many common protein domains
SMART - analysis of domains in proteins
ProDom - protein domain database
PRINTS Database - groups of conserved motifs used to characterise protein families
Blocks - multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins
Protein Domain Profile Analysis @ BMERC - search a library of profiles with a protein sequence
TIGRFAMs - yet more protein families based on Hidden Markov Models
Motif and Pattern Search in Sequences
Gibbs Motif Sampler - identification of conserved motifs in DNA or protein sequences
AlignACE Homepage - gene regulatory motif finding
MEME - motif discovery and search in protein and DNA sequences
SAM - tools for creating and using Hidden Markov Models
Pratt - discover patterns in unaligned protein sequences
Motivated Proteins - a web facility for exploring small hydrogen-bonded motifs
Protein 3D Structure
PDB - protein 3D structure database
RasMol / Protein Explorer - molecule 3D structure viewers
SCOP - Structural Classification Of Proteins
FSSP - fold classification based on structure-structure alignment of proteins
SWISS-MODEL - homology modeling server
K2 - protein structure alignment
DALI - 3D structure alignment server
DSSP - defines secondary structure and solvent exposure from 3D coordinates
HSSP Database - Homology-derived Secondary Structure of Proteins
PredictProtein & PHD - predict secondary structure, solvent accessibility, transmembrane helices, and other features
Jpred2 - protein secondary structure prediction
PSIpred (& MEMSAT & GenTHREADER) - protein secondary structure prediction (& transmembrane helix prediction & tertiary structure prediction by threading)
Phylogeny & Taxonomy
Species 2000 - index of the world's known species
TreeBASE - a database of phylogenetic knowledge
PHYLIP - package of programs for inferring phylogenies
TreeView - user-friendly tree display for Mac and Windows
Gene Prediction
Genscan - eukaryotes
Genie - eukaryotes
GLIMMER - prokaryotes
tRNAscan-SE 1.1 - search for tRNA genes in genomic sequence
GFF (General Feature Format) Specification - a standard format for genomic sequence annotation
Gene Expression Databases (including RNA-seq and single cell)
HuGE - database of human gene expression using arrays
ExpressDB - yeast and E. coli RNA expression data
SAGE @ NCBI - Serial Analysis of Gene Expression
Gene Expression Omnibus (NCBI GEO)
BioJupies - generates RNA-Seq data analysis notebooks
Single Cell Portal (Broad Institute)
Gene Regulation
TRANSFAC - database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors
EPD - eukaryotic promoter database
DBTSS - DataBase of Transcriptional Start Sites (human)
SCPD - Saccharomyces cerevisiae promoter database
DCPD - Drosophila Core Promoter Database
RegulonDB - a database on transcriptional regulation in E. coli
DPInteract - protein binding sites on E. coli DNA
PromoterInspector - prediction of promoter regions in mammalian genomic sequences
MatInspector - search for transcription factor binding sites
Cister - cis-element cluster finder
TarBase - search a comprehensive set of experimentally supported microRNA targets in at least 8 organisms
microRNA resource - a gateway to all types of information about microRNAs, including articles, products, news, events, and other websites
Metabolic, Gene Regulatory & Signal Transduction Network Databases
KEGG - Kyoto Encyclopedia of Genes and Genomes
DAVID - Database for Annotation, Visualization and Integrated Discovery - a useful server for annotating microarray and other genetic data
STKE - Signal Transduction Knowledge Environment
BIND - Biomolecular Interaction Network Database
PathGuide - a very useful collection of resources dealing primarily with pathways
SPAD - Signaling Pathway Database
CSNDB - Cell Signalling Networks Database
DIP - Database of Interacting Proteins
PFBP - Protein Function and Biochemical Networks
Systems Biology
A list of institutes specializing in systems biology or related research
Gene List Annotation Tools (Functional Enrichment)
DAVID - Database for Annotation, Visualization and Integrated Discovery - a useful server for annotating microarray and other genetic data
MSigDB - Molecular Signatures Database
ToppGene Suite - gene list functional enrichment and candidate gene prioritization (My Personal favorite)
Enrichr - gene list functional enrichment, with an extensive compilation of data resources (My Personal favorite!)
Metascape - gene annotation and analysis resource with excellent output options (My Personal favorite!)
Panther - Protein ANalysis THrough Evolutionary Relationships
Other Databases (Annotations, Ontologies, Consortia, etc.)
Entrez Gene - Gene provides a unified query environment for genes defined by sequence and/or in NCBI's Map Viewer. You can query on names, symbols, accessions, publications, GO terms, chromosome numbers, E.C. numbers, and many other attributes associated with genes and the products they encode. Replaces LocusLink.
Gene Ontology Consortium - a controlled vocabulary of eukaryotic gene roles
Open Biological Ontologies - an umbrella web address for well-structured controlled vocabularies for shared use across different biological domains
ACUTS - compilation of Ancient Conserved UnTranslated Sequences
ENZYME - enzyme nomenclature database
BRENDA - enzyme database
TC-DB - comprehensive classification of membrane transport proteins
HGBASE - database of sequence variations in the human genome
MethDB - DNA methylation database
SpliceDB - canonical and non-canonical splice site sequences in mammalian genes
SpliceOme - database of intron-exon boundaries
InBase - intein database
The Kabat Database of Sequences of Proteins of Immunological Interest
REBASE - restriction enzyme database
Chemfinder.com - molecule database
Mouse SNPs Database - 670,000+ SNP records, 8.0+ million allele calls. Allele tables are provided by investigators or retrieved from public sources. All SNPs are mapped to NCBI Mouse Genome build 33 (C57BL/6J assembly). Most are linked to NCBI dbSNP build 123.
MetaBase - a user-contributed database of databases, listing all the biological databases currently available on the internet
Miscellaneous Tools
NCBI Genome Workbench - an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix this data with your own private data.
Morpheus - Analyze gene expression on the cloud (My Personal favorite)
Repeatmasker - mask repetitive elements in DNA sequences
Vienna RNA Package - RNA secondary structure prediction
mfold (1) - RNA secondary structure prediction
mfold (2) - RNA secondary structure prediction
EST parser - find alternative polyadenylation sites in mRNAs, using ESTs
UTR-extender - extends missing ends of an mRNA using EST and genome sequence data
CpG Islands - predict CpG islands
NetStart - prediction of translation start sites in vertebrate and A.thaliana sequences
ATGpr - prediction of translation start sites in cDNA sequences
SignalP - secretory signal peptide prediction
PSORT - prediction of protein sorting signals and transmembrane helices
CBS Prediction Servers - prediction of protein subcellular localization and various sites in protein and nucleotide sequences
Melting - calculate melting temperature for nucleic acid duplexes
bend.it - calculate curvature and bendability of a DNA sequence
webcutter - detect restriction enzyme cutting sites in DNA sequences
Primer3 - pick primers from a DNA sequence
Probability Distribution Calculators - normal, chi square, t, F, etc.
Computational Resources
SourceForge - SourceForge.net is the world's largest Open Source software development website, with the largest repository of Open Source code and applications available on the Internet. SourceForge.net provides free services to Open Source developers.
W3C - World Wide Web Consortium, definitive reference for HTML and other web standards
Web Developer's Virtual Library - encyclopedia of web design tutorials, articles and discussions
CPAN - Perl modules
bioperl - bioinformatics-related Perl modules
Bioinformatics on-line course materials and tutorials (not an exhaustive collection)
Intro to bioinformatics and computational biology:
Introduction to Bioinformatics (Technion - Israel Institute of Technology)
A taste of bioinformatics (University College London)
Introduction to Computational Molecular Biology (Washington University in St. Louis)
Introduction to Bioinformatics (UCSD Extension)
Computational Biology (University of Washington)
Introduction to Computational Biology (Carnegie Mellon University)
Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis (University of Washington)
Algorithms:
Algorithms in Computational Biology (Technion - Israel Institute of Technology)
Algorithms for Molecular Biology (School of Mathematical Sciences at Tel Aviv University)
Miscellaneous:
Coursera (Great resource!) - free online courses from top universities
Software Carpentry (Great resource!) - a non-profit volunteer organization whose members teach researchers basic software skills
Data Carpentry - teaches basic concepts, skills, and tools for working more effectively with data
Elementary Sequence Analysis (McMaster University)
Dynamic Programming Tutorial (By Eric C. Rouchka)
Beginner's Guide to Molecular Biology (Rothamsted Research)
A Primer on Molecular Genetics (Iowa State University)
Online Lectures on Bioinformatics (Max Planck Institute for Molecular Genetics)
EMBnet.org Courses (EMBnet.org)
DNA and Protein Sequence Analysis (Boston University)
Computational Molecular Biology (Stanford University)
Current Topics in Genome Analysis (handouts) (NIH)
Bioinformatics and Genomic Analysis (University of Arizona)
Introduction to Structural Bioinformatics (UCSD Extension)
Perl Programming Course for Bioinformatics and Internet (Feinberg Graduate School of the Weizmann Institute of Science, Rehovot, Israel)
Object-Oriented and Database Programming for Bioinformatics and Internet (Feinberg Graduate School of the Weizmann Institute of Science, Rehovot, Israel)
Computer Skills For Biologists (UCSD Extension)
An Intro to R (UMN)
Web Sites for Background Information & News
NCBI Education - probably the best starting point for anyone contemplating a switch to bioinformatics
NCBI Bookshelf - Includes a number of popular books in electronic format including Genomes by Brown and Human Molecular Genetics by Strachan.
National Human Genome Research Institute - NIH - Educational Resources
International Union of Biochemistry and Molecular Biology nomenclature
Other Collections of Bioinformatics Resources
LabWorm: An aggregator of scientific online tools. Lets you stay updated on the newest and most relevant tools for your research. It is also a crowd voting platform for the scientific community, leveraging the community’s hands-on experience and judgment to vote on the various tools.
Biostars: An online question & answer resource for the bioinformatics community (My Personal favorite)
Bioinformatics software and tools: Contains several useful links to bioinformatics databases and tools