
Thursday, August 20, 2015

Essential bioinformatics tools a bioinformatician should know

BIOINFORMATICS TOOLS

AB-BLAST: “…software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases.”
Antibody: “Abnova is the world's largest antibody manufacturer. We have the capacity of generating 300 mouse monoclonal antibodies and 200 rabbit polyclonal antibodies per month. Rather than the traditional method of antibody production, Abnova is taking a genomic/proteomic approach for the antibody development. Our goal is to have at least one antibody to every human expressed gene in human genome.”
Argo genome browser: “The Argo genome Browser is the Broad Institute’s production tool for visualizing and manually annotating whole genomes.”
ArrayExpress: “ArrayExpress is a database of functional genomics experiments that can be queried and the data downloaded. It includes gene expression data from microarray and high throughput sequencing studies. Data is collected to MIAME and MINSEQE standards.”
BCDE: “Biological Concept Diagram Editor (BCDE) is a conceptual relationship diagramming tool specifically designed for biomedical researchers. Compared to existing diagramming/drawing tools, BCDE has several advantages. It allows for efficient knowledge/data capture, fast diagram creation, easy data retrieval and flexible exporting.”
BFAST: “BFAST facilitates the fast and accurate mapping of short reads to reference sequences…Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance.”
BioCarta: “Observe how genes interact in dynamic graphical models.”
Bioconductor: “Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.”
Bioconductor Biostrings: “Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.”
BioCyc: “BioCyc is a collection of 3530 Pathway/Genome databases (PGDBs)...BioCyc provides tools for navigating, visualizing, and analyzing the underlying databases, and for analyzing omics data.”
BioGRID: “BioGRID is an online interaction repository with data compiled through comprehensive curation efforts.” It includes “…raw protein and genetic interactions from major model organism species.”
BioLayout Express3D: “BioLayout Express3D has been specifically designed for visualization, clustering, exploration and analysis of very large network graphs in two- and three-dimensional space derived primarily, but not exclusively, from biological data.”
BioPerl: BioPerl is “…a community effort to produce Perl code which is useful in biology.”
Biopython: “Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.”
BioRuby: “BioRuby comes with a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, for the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis.”
BioSearch-2D: “BioSearch-2D renders the contents of large biomedical document collections into a single, dynamic map. With this tool, users can generate a summary map of genes vs. ontology topics that match those provided by expert human reviewed articles. In addition, BioSearch-2D provides users a context specific, functional annotation system for high-throughput gene signatures and ad-hoc gene lists.”
Bowtie: “Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. It aligns 35-base-pair reads to the human genome at a rate of 25 million reads per hour on a typical workstation. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: for the human genome, the index is typically about 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace alignment).”
BWA: “BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.”
CDART: “The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the Entrez Protein database based on domain architecture, defined as the sequential order of conserved domains in protein queries. CDART finds protein similarities across significant evolutionary distances using sensitive domain profiles rather than direct sequence similarity.”
CHADO: “Chado is a relational database schema that underlies many GMOD installations. It is capable of representing many of the general classes of data frequently encountered in modern biology such as sequence, sequence comparisons, phenotypes, genotypes, ontologies, publications, and phylogeny.”
ChEBI: “Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.”
Chilibot: “Chilibot searches PubMed literature database (abstracts) about specific relationships between proteins, genes, or keywords. The results are returned as a graph.”
CisGenome: “An integrated tool for tiling array, ChIP-Seq, genome, and cis-regulatory element analysis.”
Clustal Omega: “Clustal Omega is a new multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments.”
Cn3D: “Cn3D (‘see in 3D’) is a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez Structure database. Cn3D is provided for Windows and Macintosh, and can be compiled on Unix. Cn3D simultaneously displays structure, sequence, and alignment, and now has powerful annotation and alignment editing features.”
Compute pI/Mw: “Compute pI/Mw is a tool which allows the computation of the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of UniProt Knowledgebase (Swiss-Prot or TrEMBL) entries or for user entered sequences.”
COSMO: “cosmo searches a set of unaligned DNA sequences for a shared motif that may, for example, represent a common transcription factor binding site. The algorithm is similar to MEME, but also allows the user to specify a set of constraints that the position weight matrix of the unknown motif must satisfy.”
cross_match: “cross_match is a general purpose utility for comparing any two DNA sequence sets using a ‘banded’ version of swat…Swat is a program for searching one or more DNA or protein query sequences, or a query profile, against a sequence database, using an efficient implementation of the Smith-Waterman or Needleman-Wunsch algorithms with linear (affine) gap penalties.”
CustomCDF: “Oligonucleotide probes on GeneChips are reorganized based on the latest genome and transcriptome information. In addition to custom CDF files for different target definitions, GeneChips and data analysis platforms, we also provide 1) a probe mapping file that matches individual probes in the custom CDF file and the corresponding Affymetrix CDF file; 2) a grouping file that can be used to find all targets (exons, transcripts) represented by the same probe set. The grouping file also contains the probe set spanning range on genome or transcripts to facilitate RT-PCR primer design.”
Cytoscape: “Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.”
Dapple: “Dapple is a program for quantitating spots on a two-color DNA microarray image. Given a pair of images from a comparative hybridization, Dapple finds the individual spots on the image, evaluates their qualities, and quantifies their total fluorescent intensities.”
DAVID: “DAVID now provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes.”
DIP: “The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database were curated both manually by expert curators and also automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.”
EMBOSS: Includes “sequence alignment,” “rapid database searching with sequence patterns,” “protein motif identification, including domain analysis,” “nucleotide sequence pattern analysis,” “codon usage analysis for small genomes,” “rapid identification of sequence patterns in large scale sequence sets,” and “presentation tools for publication.”
Ensembl genome browser: “The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.”
ERANGE: “Python package for doing RNA-seq and ChIP-seq.”
Exonerate: “Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, using either exhaustive dynamic programming, or a variety of heuristics.”
FASTA and SSEARCH: “This tool provides sequence similarity searching against protein databases using the FASTA suite of programs. FASTA provides a heuristic search with a protein query. FASTX and FASTY translate a DNA query. Optimal searches are available with SSEARCH (local), GGSEARCH (global) and GLSEARCH (global query, local database).”
FastQC: “FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.”
FindPeaks: “…a Peak Finder/Analysis application for ChIP-Seq or RNA-Seq experiments.”
FootPrinter: “A program for phylogenetic footprinting. Phylogenetic footprinting is a method that identifies putative regulatory elements in DNA sequences. It identifies regions of DNA that are unusually well conserved across a set of orthologous sequences.”
Galaxy: “Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.”
geWorkbench: “geWorkbench (previously caWorkbench) is a Java-based open-source platform for integrated genomics. Using a component architecture geWorkbench allows individually developed plug-ins to be configured into complex bioinformatics applications.”
Gene2MeSH: “Find Medical Subject Headings (MeSH terms) enriched for a particular gene or genes enriched for a particular MeSH term.”
GeneCards: “GeneCards is a searchable, integrated database of human genes that provides comprehensive, updated, and user-friendly information on all known and predicted human genes.”
GeneGo: “High quality biological systems content in context and analytical tools to accelerate scientific research with extensive ontologies.”
GenePattern: “GenePattern is a powerful genomic analysis platform that provides access to hundreds of tools for gene expression analysis, proteomics, SNP analysis, flow cytometry, RNA-seq analysis, and common data processing tasks. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.”
GenMAPP: “GenMAPP is a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes.”
Genomatix: “Genomatix offers solutions and services for the entire course of analysis from first level mapping to integrating multiple results with copious high-quality data background.”
GEO: “A public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.”
GO: “The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.”
GraphML: “GraphML is a comprehensive and easy-to-use file format for graphs. It consists of a language core to describe the structural properties of a graph and a flexible extension mechanism to add application-specific data.”
Graphviz, Neato: “Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.”
HPeak: “…a hidden Markov model-based approach that can accurately pinpoint regions to where significantly more sequence reads map.”
HPRD: “The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.”
HUGO: “Human Genome Organisation (HUGO) is the international organisation of scientists involved in human genetics.”
HUPO: “The Human Proteome Organisation (HUPO) is an international scientific organization representing and promoting proteomics through international cooperation and collaborations by fostering the development of new technologies, techniques and training.”
HyperTree: “Hypertree is an open source project very similar to the HyperGraph project. As the name implies, Hypertree is restricted to hyperbolic trees.”
iHOP: “A network of concurring genes and proteins extends through the scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural way of accessing millions of PubMed abstracts. By using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource, bringing all advantages of the internet to scientific literature research.”
Illumina: “At Illumina, our goal is to apply innovative sequencing and array technologies to the analysis of genetic variation and function, making studies possible that were not even imaginable just a few years ago. These studies will help make the realization of personalized medicine possible...Illumina’s innovative sequencing and array-based solutions for genomic analysis serve as tools for disease research, drug discovery, and the development of molecular tests in clinical labs.”
Ingenuity Pathway Analysis: “Model, analyze, and understand the complex biological and chemical systems at the core of life science research with IPA...IPA helps you understand complex 'omics data at multiple levels by integrating data from a variety of experimental platforms and providing insight into the molecular and chemical interactions, cellular phenotypes, and disease processes of your system.”
IntAct: “IntAct provides a freely available, open source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.”
KEGG Pathway: “KEGG PATHWAY is a collection of manually drawn pathway maps…representing our knowledge on the molecular interaction and reaction networks…”
LASTZ: “LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS sequencing technologies such as Roche 454.”
LRpath: “LRpath performs gene set enrichment testing, an approach used to test for predefined biologically-relevant gene sets that contain more significant genes from an experimental dataset than expected by chance.”
MACS: “MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction.”
MAPPFinder: “MAPPFinder is an accessory program that works with GenMAPP and the annotations from the Gene Ontology (GO) Consortium to identify global biological trends in gene expression data.”
MAQ: “Maq stands for Mapping and Assembly with Quality. It builds assembly by mapping short reads to reference sequences.”
Mathematica: “Mathematica is renowned as the world's ultimate application for computations. But it's much more: it's the only development platform fully integrating computation into complete workflows, moving you seamlessly from initial ideas all the way to deployed individual or enterprise solutions.”
MATLAB: “MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.”
McPromoter: “McPromoter is a program aiming at the exact localization of eukaryotic RNA polymerase II transcription start sites.”
MEME: “Motif-based sequence analysis tools.”
Metab2MeSH: “Find Medical Subject Headings (MeSH terms) enriched for a particular compound or compounds enriched for a particular MeSH term.”
MIAME: “MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment.”
MiMI: “Find protein interactions, pathways, compounds, literature, and genes merged from multiple sources.”
MiMI Plugin for Cytoscape: “Visualize gene interaction networks based on merged interaction data from MiMI.”
MINT: “MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.”
MOSAIK: “MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT.”
MUSCLE: “MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds.”
NCBI BLAST: “The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.”
NetAffx: “The NetAffx™ Analysis Center enables researchers to correlate their GeneChip® array results with array design and annotation information.”
NovoAlign: “Our primary product is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using full Needleman-Wunsch algorithm with affine gap penalties.”
Omnigraph: “The Omnigraph graph processor is an indispensable tool for pupils of all ages. The software enables you to draw lines like y=x+1 either by typing the equation or by just clicking the screen. Graphs and shapes can be transformed and manipulated on screen.”
Oncomine: “Oncomine™ Research Edition is a powerful web application that integrates and unifies high-throughput cancer profiling data across a large volume of cancer types, subtypes, and experiments so that target expression can be assessed online, in seconds. A premium upgrade adds functionality and allows users to search the Oncomine™ database for enrichment of user-defined gene signatures for additional insights into biology, regulation, pathways, drug responses, and patient populations.”
PANTHER Pathway: “PANTHER Pathway consists of over 176, primarily signaling, pathways, each with subfamilies and protein sequences mapped to individual pathway components...Pathway diagrams are interactive and include tools for visualizing gene expression data in the context of the diagrams.”
PASS: “PASS has been developed with an innovative strategy to perform fast gapped and ungapped alignment onto a reference sequence…designed to handle huge amounts of short reads generated by ILLUMINA, SOLiD, and Roche-454 technology…The optimization of the internal data structure and a filter based on precomputed short-word alignments allow the program to skip false positives in the extension phase, thus reducing the execution time without loss of sensitivity.”
Peptide Atlas: “Peptide Atlas is a multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments.”
Phred, Phrap, Consed: “The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base.” “phrap is a program for assembling shotgun DNA sequence data.” “Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap.”
Primer3: “Pick primers from a DNA sequence.”
Protein Atlas: “The Human Protein Atlas portal is a publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines.”
ProteomeXchange: “The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination.”
PSI-Search: “PSI-Search combines the Smith-Waterman search algorithm with the PSI-BLAST profile construction strategy to find distantly related protein sequences.”
PyroBayes: “PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences.”
R: “R is a free software environment for statistical computing and graphics.”
Reactome: “Reactome is a free, open-source, curated and peer reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.”
Reverse complement: “Reverse Complement converts a DNA sequence into its reverse, complement, or reverse-complement counterpart.”
RMAP: “RMAP is aimed to map accurately reads from the next-generation sequencing technology. RMAP can map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated reads mapping.”
SAM: “Supervised learning software for genomic expression data mining.”
SBML: “The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes…SBML can represent many different classes of biological phenomena, including metabolic networks, cell-signaling pathways, regulatory networks, infectious diseases, and many others.”
SciMiner: “SciMiner is a dictionary- and rule-based biomedical literature mining and functional enrichment analysis tool developed by the Bioinformatics Program, University of Michigan, Ann Arbor.”
SCIRun: “SCIRun is a Problem Solving Environment (PSE), for modeling, simulation and visualization of scientific problems.”
SeqMap: “SeqMap is a tool for mapping large amount of oligonucleotide to the genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from.”
SHRiMP: “SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystem’s colourspace genomic representation.”
SignalP: “…predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.”
sim4: “sim4 is a similarity-based tool for aligning an expressed DNA sequence (EST, cDNA, mRNA) with a genomic sequence for the gene. It also detects end matches when the two input sequences overlap at one end (i.e., the start of one sequence overlaps the end of the other).”
SliderII: “High quality SNP calling using Illumina data at minimal coverage.”
SNPs3D: “SNPs3D is a website which assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis.”
SOAP: “…consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), and a de novo short reads assembler (SOAPdenovo).”
T-Coffee: “T-Coffee is a multiple sequence alignment package. You can use T-Coffee to align sequences or to combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee).”
TouchGraph: “TouchGraph Navigator lets you create interactive network visualizations of your data.”
tranSMART: “tranSMART is a knowledge management platform that enables scientists to develop and refine research hypotheses by investigating correlations between genetic and phenotypic data, and assessing their analytical results in the context of published literature and other work.”
TreeView: “TreeView is a simple program for displaying phylogenies on Apple Macintosh and Windows PCs.” “TreeView provides a simple way to view the contents of a NEXUS, PHYLIP, Hennig86, Clustal, or other format tree file.”
UCSC genome browser: “This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.”
UCSD Signaling Gateway Molecule Pages: “The UCSD Signaling Gateway Molecule Pages provide essential information on over thousands of proteins involved in cellular signaling. Each Molecule Page contains regularly updated information derived from public data sources as well as sequence analysis, references and links to other databases.”
UniProt: “The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.”
VAST Search: “VAST, short for Vector Alignment Search Tool, is a computer algorithm developed at NCBI and used to identify similar protein 3-dimensional structures by purely geometric criteria, and to identify distant homologs that cannot be recognized by sequence comparison.”
VISTA genome browser: “VISTA is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. There are two ways of using VISTA: you can submit your own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species.”
vmatch: “…a versatile software tool for efficiently solving large scale sequence matching tasks.” It has a “persistent index,” “alphabet independency,” “versatility,” and a “flexible input format.”
VTK (TK): “The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.”
ZOOM: “Using Zoom, zillions of short reads are mapped back to reference genomes, including post-analysis at unparalleled speed, at full sensitivity.”
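
Bowtie indexes the genome with a Burrows-Wheeler transform, and BWA is named after it (Burrows-Wheeler Aligner). Below is a rough illustration of the transform itself. It sorts every rotation of the text and reads off the last column, and inverts the result by repeated prepend-and-sort. This is a toy sketch: real aligners derive the transform from a suffix array and add FM-index lookup tables. The function names bwt and inverse_bwt are our own.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    # "$" sorts before digits and letters, so it marks the end of the text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the text by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original text is the rotation that ends with the sentinel
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))                # annb$aa
print(inverse_bwt(bwt("GATTACA")))  # GATTACA
```

Sorting every rotation costs roughly O(n² log n), which is fine for a toy string. For a 3-gigabase genome it is far too slow, which is why production indexers work from a suffix array instead.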
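
Compute pI/Mw reports a theoretical molecular weight. Here is a minimal sketch of the average-mass part of that calculation. It sums the residue masses and adds one water for the free termini, assuming the commonly tabulated average residue masses of the 20 standard amino acids and no modifications. We have not checked that these values match the tool's own table exactly. The pI calculation is left out because it depends on a choice of pKa values.

```python
# Average residue masses in daltons (free amino acid minus one water), as in
# commonly used tables such as ExPASy's; values are for unmodified residues.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified peptide chain."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(average_mw("G"), 2))  # 75.07, the mass of free glycine
```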
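
The cross_match/swat and PSI-Search entries both rest on the Smith-Waterman local alignment algorithm. The sketch below computes only the best local alignment score, with no traceback. For brevity it uses a linear gap penalty rather than the affine penalties mentioned in those quotes. The match, mismatch and gap values are illustrative defaults, not the scoring used by any of these tools.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, no traceback)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            # Flooring at zero is what makes the alignment local
            curr[j] = max(0, diagonal, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: ACGT aligned, 4 x 2
```

Keeping only two rows of the matrix is enough for the score. Recovering the alignment itself would mean keeping the full matrix (or using a linear-space method) and tracing back from the best cell.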
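
FastQC's checks start from the per-read quality strings in a FASTQ file. Here is a minimal sketch of one such check, the mean Phred quality at each read position. It assumes strict 4-line FASTQ records and the Phred+33 encoding, and the helper names are hypothetical. FastQC itself reports more than a mean, for example box plots of the quality distribution at each position.

```python
from typing import Iterable, Iterator, List


def fastq_qualities(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line of each record, assuming strict 4-line FASTQ."""
    for index, line in enumerate(lines):
        if index % 4 == 3:
            yield line.rstrip("\n")


def per_base_mean_quality(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "ACG", "+", "55I"]
print(per_base_mean_quality(fastq_qualities(fastq)))  # [30.0, 30.0, 40.0, 40.0]
```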
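
DAVID and LRpath both ask whether a gene list contains more members of a predefined gene set than chance would predict. The simplest form of that question is a one-sided hypergeometric (Fisher's exact) test, sketched below. This is not either tool's actual method: DAVID uses a modified Fisher's exact test and LRpath uses logistic regression. The function and argument names are our own.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the gene set,
    n: genes in your list,      k: list genes that are in the gene set.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of seeing k or more set members by chance
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# 2 of your 2 genes fall in a set covering 2 of 4 background genes: p = 1/6
print(enrichment_p_value(k=2, n=2, K=2, N=4))
```

When many gene sets are tested at once, the p-values would also need a multiple-testing correction (for example Benjamini-Hochberg), which this sketch leaves out.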
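
The MACS entry mentions a "dynamic Poisson distribution". As we understand the original MACS paper, the background rate for a candidate region is taken as the largest of the genome-wide rate and rates estimated from surrounding windows. A peak is then scored by the Poisson probability of seeing at least as many tags. Here is a rough sketch under that assumption. The window sizes and every count in the example are hypothetical and left to the caller.

```python
from math import exp
from typing import Sequence


def local_lambda(genome_lambda: float, window_lambdas: Sequence[float]) -> float:
    """Dynamic background rate: the largest of the genome-wide and local estimates."""
    return max([genome_lambda, *window_lambdas])


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical counts: 12 tags in a candidate window; genome-wide expectation
# of 2.0 tags, with nearby control windows suggesting 3.5, 2.8 and 2.1.
lam = local_lambda(2.0, [3.5, 2.8, 2.1])
print(lam, poisson_upper_tail(12, lam))
```

Subtracting from one loses precision for very small p-values. A real implementation would sum the upper tail directly or work in log space.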
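
The quality value phred assigns to each base follows Q = -10 log10(p), where p is the estimated probability that the base call is wrong. So Q20 means 1 error in 100 calls and Q30 means 1 in 1,000. The conversions are easy to sketch. Decoding ASCII quality characters here assumes the Phred+33 offset used by most current FASTQ files; that is an assumption about your input, not something phred itself dictates.

```python
import math
from typing import List


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality Q: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality for an error probability p: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def decode_qualities(encoded: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in encoded]


print(decode_qualities("I5+!"))  # [40, 20, 10, 0]
print(phred_to_error(30))        # about 0.001, i.e. 1 error in 1,000 calls
```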
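
The three outputs of the Reverse complement tool are simple to reproduce. Here is a minimal sketch using Python's built-in str.translate, assuming DNA input over A, C, G, T and N in either case; any other character passes through unchanged. In practice, Biopython's Seq objects (listed above) already provide a reverse_complement() method.

```python
# Translation table for DNA bases (upper and lower case) plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    return seq[::-1]


def complement(seq: str) -> str:
    # Characters outside the table (gaps, IUPAC codes other than N) are left as-is
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    return reverse(complement(seq))


print(reverse_complement("ATGCgn"))  # ncGCAT
```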
