BIOINFORMATICS TOOLS
AB-BLAST: “…software
package for gene and protein identification, using sensitive, selective and
rapid similarity searches of protein and nucleotide sequence databases.”
Antibody: “Abnova is
the world's largest antibody manufacturer. We have the capacity of generating
300 mouse monoclonal antibodies and 200 rabbit polyclonal antibodies per month.
Rather than the traditional method of antibody production, Abnova is taking a
genomic/proteomic approach for the antibody development. Our goal is to have at
least one antibody to every human expressed gene in human genome.”
Argo genome browser: “The
Argo genome Browser is the Broad Institute’s production tool for visualizing
and manually annotating whole genomes.”
ArrayExpress: “ArrayExpress
is a database of functional genomics experiments that can be queried and the
data downloaded. It includes gene expression data from microarray and high throughput
sequencing studies. Data is collected to MIAME and MINSEQE standards.”
BCDE: “Biological
Concept Diagram Editor (BCDE) is a conceptual relationship diagramming tool
specifically designed for biomedical researchers. Compared to existing
diagramming/drawing tools, BCDE has several advantages. It allows for efficient
knowledge/data capture, fast diagram creation, easy data retrieval and flexible
exporting.”
BFAST: “BFAST
facilitates the fast and accurate mapping of short reads to reference
sequences…Specifically, BFAST was designed to facilitate whole-genome
resequencing, where mapping billions of short reads with variants is of utmost
importance.”
Biocarta: “Observe how
genes interact in dynamic graphical models.”
Bioconductor: "Bioconductor
provides tools for the analysis and comprehension of high-throughput genomic
data."
Bioconductor - Biostrings:
“Memory efficient string containers, string matching algorithms, and other
utilities, for fast manipulation of large biological sequences or sets of
sequences.”
BioCyc: “BioCyc is a collection of 3530 Pathway/Genome
databases (PGDBs)...BioCyc provides tools for navigating, visualizing, and
analyzing the underlying databases, and for analyzing omics data.”
BioGRID: “BioGRID
is an online interaction repository with data compiled through comprehensive
curation efforts.” It includes “…raw protein and genetic interactions from
major model organism species.”
BioLayout: “BioLayout
Express3D has been specifically designed for visualization, clustering,
exploration and analysis of very large network graphs in two- and
three-dimensional space derived primarily, but not exclusively, from biological
data.”
Bioperl: Bioperl is “…a community effort
to produce Perl code which is useful in biology.”
Biopython: “Biopython
is a set of freely available tools for biological computation written in Python
by an international team of developers. It is a distributed collaborative
effort to develop Python libraries and applications which address the needs of
current and future work in bioinformatics.”
BioRuby: "BioRuby
comes with a comprehensive set of free development tools and libraries for
bioinformatics and molecular biology, for the Ruby programming language.
BioRuby has components for sequence analysis, pathway analysis, protein modelling
and phylogenetic analysis."
BioSearch-2D:
“BioSearch-2D renders the contents of large biomedical document collections
into a single, dynamic map. With this tool, users can generate a summary map of
genes vs. ontology topics that match those provided by expert human reviewed
articles. In addition, BioSearch-2D provides users a context specific,
functional annotation system for high-throughput gene signatures and ad-hoc
gene lists.”
Bowtie: "Bowtie
is an ultrafast, memory-efficient short read aligner geared toward quickly
aligning large sets of short DNA sequences (reads) to large genomes. It aligns
35-base-pair reads to the human genome at a rate of 25 million reads per hour
on a typical workstation. Bowtie indexes the genome with a Burrows-Wheeler index
to keep its memory footprint small: for the human genome, the index is
typically about 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or
colorspace alignment)."
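As a rough illustration of the Burrows-Wheeler transform behind the index mentioned in the Bowtie description, the Python sketch below builds the transform naively by sorting all rotations of the input. It is not Bowtie's implementation; production aligners build a compressed index from suffix arrays rather than materializing rotations. The function name `bwt` and the `$` end-of-text sentinel are assumptions chosen for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform.

    Appends a sentinel that sorts before every other character,
    sorts all rotations of the result, and returns the last column.
    Quadratic in memory, so suitable only for short demo strings.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Every rotation of s, in lexicographic order.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

The transform tends to group identical characters into runs, which is one reason indexes built on it can stay compact.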
BWA: “BWA is a
software package for mapping low-divergent sequences against a large reference
genome, such as the human genome. It consists of three algorithms:
BWA-backtrack, BWA-SW and BWA-MEM.”
CDART: “The
Conserved Domain Architecture Retrieval Tool (CDART) performs similarity
searches of the Entrez Protein database based on domain architecture, defined
as the sequential order of conserved domains in protein queries. CDART finds
protein similarities across significant evolutionary distances using sensitive
domain profiles rather than direct sequence similarity.”
CHADO: “Chado is
a relational database schema that underlies many GMOD installations. It is
capable of representing many of the general classes of data frequently
encountered in modern biology such as sequence, sequence comparisons,
phenotypes, genotypes, ontologies, publications, and phylogeny.”
ChEBI: “Chemical
Entities of Biological Interest (ChEBI) is a freely available dictionary of
molecular entities focused on ‘small’ chemical compounds.”
ChiliBot: “Chilibot
searches PubMed literature database (abstracts) about specific relationships
between proteins, genes, or keywords. The results are returned as a graph.”
CISGenome: “An
integrated tool for tiling array, ChIP-Seq, genome, and cis-regulatory element
analysis.”
Clustal Omega: "Clustal Omega is a new multiple sequence
alignment program that uses seeded guide trees and HMM profile-profile
techniques to generate alignments."
Cn3D: “Cn3D
("see in 3D") is a helper application for your web browser that
allows you to view 3-dimensional structures from NCBI's Entrez Structure database.
Cn3D is provided for Windows and Macintosh, and can be compiled on Unix. Cn3D
simultaneously displays structure, sequence, and alignment, and now has
powerful annotation and alignment editing features.”
Compute pI/Mw: “Compute
pI/Mw is a tool which allows the computation of the theoretical pI (isoelectric
point) and Mw (molecular weight) for a list of UniProt Knowledgebase
(Swiss-Prot or TrEMBL) entries or for user entered sequences.”
COSMO: “cosmo
searches a set of unaligned DNA sequences for a shared motif that may, for
example, represent a common transcription factor binding site. The algorithm is
similar to MEME, but also allows the user to specify a set of constraints that
the position weight matrix of the unknown motif must satisfy.”
Cross_match: “cross_match
is a general purpose utility for comparing any two DNA sequence sets using a
‘banded’ version of swat…Swat is a program for searching one or more DNA or
protein query sequences, or a query profile, against a sequence database, using
an efficient implementation of the Smith-Waterman or Needleman-Wunsch
algorithms with linear (affine) gap penalties.”
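For readers unfamiliar with the Smith-Waterman algorithm named in the cross_match description, the following Python sketch computes a local alignment score using a simple per-position (linear) gap penalty. It is only a teaching example under assumed scoring values (`match=2`, `mismatch=-1`, `gap=-2`); it does not reproduce swat's scoring matrices, banding, or gap model, and the function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman).

    Uses two rolling rows of the dynamic-programming matrix, so memory
    is proportional to len(b). Scores are clamped at 0, which is what
    makes the alignment local rather than global.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                         # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,             # gap in b
                curr[j - 1] + gap,         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6 (shared "ACG")
```

Dropping the clamp at 0 and scoring the full matrix corner instead of the maximum cell would give the global (Needleman-Wunsch) variant.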
CustomCDF:
“Oligonucleotide probes on GeneChips are reorganized based on the latest genome
and transcriptome information. In addition to custom CDF files for different
target definitions, GeneChips and data analysis platforms, we also provide 1) a
probe mapping file that matches individual probes in the custom CDF file and
the corresponding Affymetrix CDF file; 2) a grouping file that can be used to
find all targets (exons, transcripts) represented by the same probe set. The
grouping file also contains the probe set spanning range on genome or
transcripts to facilitate RT-PCR primer design.”
Cytoscape: "Cytoscape
is an open source software platform for visualizing complex networks and
integrating these with any type of attribute data. A lot of Apps are available
for various kinds of problem domains, including bioinformatics, social network
analysis, and semantic web."
Dapple: “Dapple is a program for
quantitating spots on a two-color DNA microarray image. Given a pair of images
from a comparative hybridization, Dapple finds the individual spots on the
image, evaluates their qualities, and quantifies their total fluorescent
intensities.”
DAVID: “DAVID now
provides a comprehensive set of functional annotation tools for investigators
to understand biological meaning behind large list of genes.”
DIP: “The DIP
database catalogs experimentally determined interactions between proteins. It
combines information from a variety of sources to create a single, consistent
set of protein-protein interactions. The data stored within the DIP database
were curated, both, manually by expert curators and also automatically using
computational approaches that utilize the knowledge about the
protein-protein interaction networks extracted from the most reliable, core
subset of the DIP data.”
EMBOSS: Includes
“sequence alignment,” “rapid database searching with sequence patterns,”
“protein motif identification, including domain analysis,” “nucleotide sequence
pattern analysis,” “codon usage analysis for small genomes,” “rapid
identification of sequence patterns in large scale sequence sets,” and
“presentation tools for publication.”
Ensembl genome
browser: “The Ensembl project produces genome databases for vertebrates and
other eukaryotic species, and makes this information freely available online.”
ERANGE: “Python
package for doing RNA-seq and ChIP-seq.”
Exonerate: “Exonerate
is a generic tool for pairwise sequence comparison. It allows you to align
sequences using many alignment models, using either exhaustive dynamic
programming, or a variety of heuristics.”
FASTA and SSEARCH: “This
tool provides sequence similarity searching against protein databases using the
FASTA suite of programs. FASTA provides a heuristic search with a protein
query. FASTX and FASTY translate a DNA query. Optimal searches are available
with SSEARCH (local), GGSEARCH (global) and GLSEARCH (global query, local
database).”
FastQC: "FastQC
aims to provide a simple way to do some quality control checks on raw sequence
data coming from high throughput sequencing pipelines. It provides a modular
set of analyses which you can use to give a quick impression of whether your
data has any problems of which you should be aware before doing any further
analysis."
FindPeaks: “…a
Peak Finder/ Analysis application for ChIP-Seq or RNA-Seq experiments.”
FootPrinter: “A
program for phylogenetic footprinting. Phylogenetic footprinting is a method
that identifies putative regulatory elements in DNA sequences. It identifies
regions of DNA that are unusually well conserved across a set of orthologous
sequences.”
Galaxy: "Galaxy is an open, web-based platform for
data intensive biomedical research. Whether on the free public server or your
own instance, you can perform, reproduce, and share complete analyses."
geWorkbench: "geWorkbench
(previously caWorkbench) is a Java-based open-source platform for integrated
genomics. Using a component architecture geWorkbench allows individually
developed plug-ins to be configured into complex bioinformatics
applications."
Gene2MeSH: “Find
Medical Subject Headings (MeSH terms) enriched for a particular gene or genes
enriched for a particular MeSH term.”
GeneCards: "GeneCards
is a searchable, integrated database of human genes that provides
comprehensive, updated, and user-friendly information on all known and
predicted human genes."
GeneGo: “High
quality biological systems content in context and analytical tools to
accelerate scientific research with extensive ontologies.”
GenePattern: “GenePattern
is a powerful genomic analysis platform that provides access to hundreds of
tools for gene expression analysis, proteomics, SNP analysis, flow cytometry,
RNA-seq analysis, and common data processing tasks. A web-based interface
provides easy access to these tools and allows the creation of multi-step
analysis pipelines that enable reproducible in silico research.”
GenMAPP: “GenMAPP
is a free computer application designed to visualize gene expression and other
genomic data on maps representing biological pathways and groupings of genes.”
Genomatix: “Genomatix
offers solutions and services for the entire course of analysis from first
level mapping to integrating multiple results with copious high-quality data
background.”
GEO: “A public
functional genomics data repository supporting MIAME-compliant data
submissions. Array- and sequence-based data are accepted. Tools are provided to
help users query and download experiments and curated gene expression
profiles.”
GO: “The Gene
Ontology project is a major bioinformatics initiative with the aim of
standardizing the representation of gene and gene product attributes across
species and databases. The project provides a controlled vocabulary of terms
for describing gene product characteristics and gene product annotation data
from GO Consortium members, as well as tools to access and process this data.”
GraphML: “GraphML
is a comprehensive and easy-to-use file format for graphs. It consists of a
language core to describe the structural properties of a graph and a flexible
extension mechanism to add application-specific data.”
Graphviz, Neato: “Graphviz
is open source graph visualization software. Graph visualization is a way of
representing structural information as diagrams of abstract graphs and
networks. It has important applications in networking, bioinformatics, software
engineering, database and web design, machine learning, and in visual
interfaces for other technical domains.”
HPeak: “…a hidden Markov model-based approach that can
accurately pinpoint regions to where significantly more sequence reads map.”
HPRD: “The Human
Protein Reference Database represents a centralized platform to visually depict
and integrate information pertaining to domain architecture, post-translational
modifications, interaction networks and disease association for each protein in
the human proteome. All the information in HPRD has been manually extracted
from the literature by expert biologists who read, interpret and analyze the
published data.”
HUGO: “Human
Genome Organisation (HUGO) is the international organisation of scientists
involved in human genetics.”
HUPO: "The
Human Proteome Organisation (HUPO) is an international scientific organization
representing and promoting proteomics through international cooperation and
collaborations by fostering the development of new technologies, techniques and
training."
HyperTree: “Hypertree
is an open source project very similar to the HyperGraph project. As the name
implies, Hypertree is restricted to hyperbolic trees.”
iHOP: “A network
of concurring genes and proteins extends through the scientific literature
touching on phenotypes, pathologies and gene function. iHOP provides this
network as a natural way of accessing millions of PubMed abstracts. By using genes
and proteins as hyperlinks between sentences and abstracts, the information in
PubMed can be converted into one navigable resource, bringing all advantages of
the internet to scientific literature research.”
Illumina: “At
Illumina, our goal is to apply innovative sequencing and array technologies to
the analysis of genetic variation and function, making studies possible that
were not even imaginable just a few years ago. These studies will help make the
realization of personalized medicine possible...Illumina’s innovative
sequencing and array-based solutions for genomic analysis serve as tools for
disease research, drug discovery, and the development of molecular tests in
clinical labs.”
Ingenuity Pathway
Analysis: “Model, analyze, and understand the complex biological and
chemical systems at the core of life science research with IPA...IPA helps you
understand complex 'omics data at multiple levels by integrating data from a
variety of experimental platforms and providing insight into the molecular and chemical
interactions, cellular phenotypes, and disease processes of your system.”
IntAct: “IntAct
provides a freely available, open source database system and analysis tools for
molecular interaction data. All interactions are derived from literature
curation or direct user submissions and are freely available.”
KEGG Pathway: “KEGG PATHWAY is a collection of
manually drawn pathway maps…representing our knowledge on the molecular
interaction and reaction networks…”
LASTZ: “LASTZ is a
program for aligning DNA sequences, a pairwise aligner. Originally designed to
handle sequences the size of human chromosomes and from different species, it
is also useful for sequences produced by NGS sequencing technologies such as
Roche 454.”
LRpath: "LRpath performs gene set enrichment
testing, an approach used to test for predefined biologically-relevant gene
sets that contain more significant genes from an experimental dataset than
expected by chance."
MACS: “MACS
empirically models the length of the sequenced ChIP fragments, which tends to
be shorter than sonication or library construction size estimates, and uses it
to improve the spatial resolution of predicted binding sites. MACS also uses a
dynamic Poisson distribution to effectively capture local biases in the genome
sequence, allowing for more sensitive and robust prediction.”
MAPPFinder: “MAPPFinder
is an accessory program that works with GenMAPP and the annotations from the
Gene Ontology (GO) Consortium to identify global biological trends in gene
expression data.”
MAQ: “Maq stands
for Mapping and Assembly with Quality. It builds assembly by mapping short
reads to reference sequences.”
Mathematica: “Mathematica
is renowned as the world's ultimate application for computations. But it's much
more—it's the only development platform fully integrating computation into
complete workflows, moving you seamlessly from initial ideas all the way to
deployed individual or enterprise solutions.”
Matlab: "MATLAB® is a high-level
language and interactive environment for numerical computation, visualization,
and programming. Using MATLAB, you can analyze data, develop algorithms, and
create models and applications."
McPromoter: “McPromoter
is a program aiming at the exact localization of eukaryotic RNA polymerase II
transcription start sites.”
MEME: “Motif-based sequence analysis tools”
Metab2MeSH: “Find Medical Subject Headings (MeSH terms)
enriched for a particular compound or compounds enriched for a particular MeSH
term.”
MIAME: “MIAME
describes the Minimum Information About a Microarray Experiment that is needed
to enable the interpretation of the results of the experiment unambiguously and
potentially to reproduce the experiment.”
MiMI: “Find
protein interactions, pathways, compounds, literature, and genes merged from
multiple sources.”
MiMI Plugin for
Cytoscape: “Visualize gene interaction networks based on merged interaction
data from MiMI.”
MINT: “MINT
focuses on experimentally verified protein-protein interactions mined from the
scientific literature by expert curators.”
MOSAIK: “MOSAIK is
a stable, sensitive and open-source program for mapping second and
third-generation sequencing reads to a reference genome. Uniquely among current
mapping tools, MOSAIK can align reads generated by all the major sequencing
technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion
Torrent and Pacific BioSciences SMRT.”
MUSCLE: “MUSCLE is
one of the best-performing multiple alignment programs according to published
benchmark tests, with accuracy and speed that are consistently better than CLUSTALW.
MUSCLE can align hundreds of sequences in seconds.”
NCBI BLAST: “The
Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences. The program compares nucleotide or protein sequences to
sequence databases and calculates the statistical significance of matches.
BLAST can be used to infer functional and evolutionary relationships between
sequences as well as help identify members of gene families.”
NetAffx: "The
NetAffx™ Analysis Center enables researchers to correlate their GeneChip® array
results with array design and annotation information."
NovoAlign: “Our
primary product is an aligner for single-ended and paired-end reads from the
Illumina Genome Analyser. Novoalign finds global optimum alignments using full
Needleman-Wunsch algorithm with affine gap penalties.”
Omnigraph: “The
Omnigraph graph processor is an indispensable tool for pupils of all ages. The
software enables you to draw lines like y=x+1 either by typing the equation or
by just clicking the screen. Graphs and shapes can be transformed and
manipulated on screen.”
Oncomine: “Oncomine™
Research Edition is a powerful web application that integrates and unifies
high-throughput cancer profiling data across a large volume of cancer types,
subtypes, and experiments so that target expression can be assessed online, in
seconds. A premium upgrade adds functionality and allows users to search the
Oncomine™ database for enrichment of user-defined gene signatures for
additional insights into biology, regulation, pathways, drug responses, and
patient populations.”
PANTHER Pathway: "PANTHER
Pathway consists of over 176, primarily signaling, pathways, each with
subfamilies and protein sequences mapped to individual pathway
components...Pathway diagrams are interactive and include tools for visualizing
gene expression data in the context of the diagrams."
PASS: “PASS has
been developed with an innovative strategy to perform fast gapped and ungapped
alignment onto a reference sequence…designed to handle huge amounts of short
reads generated by ILLUMINA, SOLiD, and Roche-454 technology…The optimization
of the internal data structure and a filter based on precomputed short-word
alignments allow the program to skip false positives in the extension phase,
thus reducing the execution time without loss of sensitivity.”
Peptide Atlas: “Peptide Atlas is a multi-organism, publicly
accessible compendium of peptides identified in a large set of tandem mass
spectrometry proteomics experiments.”
Phred, Phrap, Consed: “The phred software reads DNA sequencing trace
files, calls bases, and assigns a quality value to each called base.” “phrap is
a program for assembling shotgun DNA sequence data.” “Consed/Autofinish is a
tool for viewing, editing, and finishing sequence assemblies created with phrap.”
Primer3: "Pick
primers from a DNA sequence."
Protein Atlas: “The
Human Protein Atlas portal is a publicly available database with millions of
high-resolution images showing the spatial distribution of proteins in 44
different normal human tissues and 20 different cancer types, as well as 46
different human cell lines.”
ProteomeXchange: “The
ProteomeXchange consortium has been set up to provide a coordinated submission
of MS proteomics data to the main existing proteomics repositories, and to
encourage optimal data dissemination.”
PSI-Search: “PSI-Search
combines the Smith-Waterman search algorithm with the PSI-BLAST profile
construction strategy to find distantly related protein sequences.”
PyroBayes: "PyroBayes
is a novel base caller for pyrosequences from the 454 Life Sciences sequencing
machines. It was designed to assign more accurate base quality estimates to the
454 pyrosequences."
R: “R is a free
software environment for statistical computing and graphics.”
Reactome: “Reactome
is a free, open-source, curated and peer reviewed pathway database. Our goal is
to provide intuitive bioinformatics tools for the visualization, interpretation
and analysis of pathway knowledge to support basic research, genome analysis,
modeling, systems biology and education.”
Reverse complement: “Reverse
Complement converts a DNA sequence into its reverse, complement, or
reverse-complement counterpart.”
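The three conversions named in the Reverse Complement description can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's own code: the function names are hypothetical, only the unambiguous bases A, C, G and T (upper or lower case) are complemented, and any other character, such as N, is passed through unchanged.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the base-by-base complement, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complement of the sequence, reversed."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    dna = "ATGC"
    print(reverse(dna))             # CGTA
    print(complement(dna))          # TACG
    print(reverse_complement(dna))  # GCAT
```

Applying `reverse_complement` twice returns the original sequence, which makes a convenient sanity check.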
RMAP: “RMAP is
aimed to map accurately reads from the next-generation sequencing technology.
RMAP can map reads with or without error probability information (quality
scores) and supports paired-end reads or bisulfite-treated reads mapping.”
SAM: “Supervised
learning software for genomic expression data mining.”
SBML: “The Systems
Biology Markup Language (SBML) is a representation format, based on XML, for
communicating and storing computational models of biological processes…SBML can
represent many different classes of biological phenomena, including metabolic
networks, cell-signaling pathways, regulatory networks, infectious diseases,
and many others.”
SciMiner: “SciMiner
is a dictionary- and rule-based biomedical literature mining and functional
enrichment analysis tool developed by the Bioinformatics Program, University of
Michigan, Ann Arbor.”
SCIRun: “SCIRun is a Problem Solving
Environment (PSE), for modeling, simulation and visualization of scientific
problems.”
SeqMap: “SeqMap is
a tool for mapping large amount of oligonucleotide to the genome. It is
designed for finding all the places in a genome where an oligonucleotide could
potentially come from.”
SHRiMP: “SHRiMP is
a software package for aligning genomic reads against a target genome. It was
primarily developed with the multitudinous short reads of next generation
sequencing machines in mind, as well as Applied Biosystem’s colourspace genomic
representation.”
SignalP: “…predicts
the presence and location of signal peptide cleavage sites in amino acid
sequences from different organisms: Gram-positive prokaryotes, Gram-negative
prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage
sites and a signal peptide/non-signal peptide prediction based on a combination
of several artificial neural networks.”
Sim4: “sim4 is a
similarity-based tool for aligning an expressed DNA sequence (EST, cDNA, mRNA)
with a genomic sequence for the gene. It also detects end matches when the two
input sequences overlap at one end (i.e., the start of one sequence overlaps
the end of the other).”
SliderII: "High quality SNP calling
using Illumina data at minimal coverage."
SNPs3D: “SNPs3D is
a website which assigns molecular functional effects of non-synonymous SNPs
based on structure and sequence analysis.”
SOAP: “…consists
of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence
builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner
(SOAPsv), and a de novo short reads assembler (SOAPdenovo).”
T-Coffee: “T-Coffee
is a multiple sequence alignment package. You can use T-Coffee to align
sequences or to combine the output of your favorite alignment methods (Clustal,
Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee).”
TouchGraph: “TouchGraph
Navigator lets you create interactive network visualizations of your data.”
tranSMART: "tranSMART
is a knowledge management platform that enables scientists to develop and
refine research hypotheses by investigating correlations between genetic and
phenotypic data, and assessing their analytical results in the context of
published literature and other work."
TreeView: “TreeView
is a simple program for displaying phylogenies on Apple Macintosh and Windows
PCs.” "TreeView provides a simple way to view the contents of a NEXUS,
PHYLIP, Hennig86, Clustal, or other format tree file."
UCSC genome browser: “This site contains the
reference sequence and working draft assemblies for a large collection of
genomes. It also provides portals to the ENCODE and Neandertal projects.”
UCSD Signaling
Gateway Molecule Pages: "The UCSD Signaling Gateway Molecule Pages
provide essential information on thousands of proteins involved in
cellular signaling. Each Molecule Page contains regularly updated information
derived from public data sources as well as sequence analysis, references and
links to other databases."
UniProt: “The
mission of UniProt is to provide the scientific community with a comprehensive,
high-quality and freely accessible resource of protein sequence and functional
information.”
VAST Search: “VAST,
short for Vector Alignment Search Tool, is a computer algorithm developed at
NCBI and used to identify similar protein 3-dimensional structures by purely
geometric criteria, and to identify distant homologs that cannot be recognized
by sequence comparison.”
VISTA genome browser: “VISTA
is a comprehensive suite of programs and databases for comparative analysis of
genomic sequences. There are two ways of using VISTA - you can submit your own
sequences and alignments for analysis (VISTA servers) or examine pre-computed
whole-genome alignments of different species.”
vmatch: “…a
versatile software tool for efficiently solving large scale sequence matching
tasks.” It has a “persistent index,” “alphabet independency,” “versatility,”
and a “flexible input format.”
VTK (TK): “The
Visualization Toolkit (VTK) is an open-source, freely available software system
for 3D computer graphics, image processing and visualization.”
ZOOM: “Using Zoom, zillions of short reads are mapped
back to reference genomes, including post-analysis at unparalleled speed, at
full sensitivity.”