BL!P [blip], or BLAST in Pivot, is a computer program that automates the NCBI BLAST alignment of coding DNA or protein sequences and processes the results for visualization in the Microsoft Live Labs program Pivot.
27/04/2011: UPDATE RECOMMENDED. Fixed image caching; minor edits.
20/04/2011: Fixed Rank issue that caused inconsistent output.
14/04/2011: Properly handle large GenBank nucleotide records.
13/04/2011: Added export to text functionality.
- Microsoft Windows XP or later.
NCBI BLAST is a popular software program used to find regions of similarity between biological sequences, and can be used to infer functional and evolutionary relationships between sequences. An NCBI BLAST search using multiple query sequences (e.g. gene predictions from a genome sequencing project) typically generates a large dataset that must be explored for functional or evolutionary patterns of interest. Current approaches to exploring NCBI BLAST results include automated filtering of the dataset using a priori significance thresholds, followed by manual inspection. While this approach works, newer data exploration and visualization software allows patterns to be identified more easily and with less bias. One such program is Pivot, which visualizes the relationships between pieces of information, allowing hidden patterns to be discovered. Pivot structures its data into “collections”, which group similar items based on the values of certain attributes (facet categories), and represents each item using an image. We have created a software application, BL!P, that automates the NCBI BLAST search of multiple biological sequences and converts the results into a Pivot collection. BL!P also provides an interface to construct custom image layouts for the collection of Pivot items.
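To make the idea of a collection concrete, here is a minimal Python sketch that writes BLAST hits as a Pivot collection in CXML, with one facet category per attribute and one item per hit. It is not BL!P's own (C#) code. The namespace URI and the element and attribute names (Collection, FacetCategories, Items with ImgBase, Item, Facets, Facet) reflect our understanding of the Pivot CXML schema and should be checked against its documentation; the hit fields (accession, organism, evalue, identity), the facet choices, and the image file name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Assumed Pivot CXML namespace; verify against the CXML schema documentation.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"

# Hypothetical facet categories for a BLAST hit: facet name -> CXML facet type.
FACET_TYPES = {"Organism": "String", "E-value": "Number", "Identity (%)": "Number"}
# Hypothetical mapping from facet name to the key holding its value in a hit record.
FACET_KEYS = {"Organism": "organism", "E-value": "evalue", "Identity (%)": "identity"}


def _tag(name: str) -> str:
    """Qualify an element name with the CXML namespace."""
    return f"{{{CXML_NS}}}{name}"


def build_collection(name: str, hits: List[Dict[str, Union[str, float]]],
                     image_base: str = "hits.dzc") -> ET.ElementTree:
    """Build a CXML tree with one facet category per attribute and one item per hit."""
    ET.register_namespace("", CXML_NS)  # serialize CXML as the default namespace
    root = ET.Element(_tag("Collection"), {"Name": name, "SchemaVersion": "1.0"})
    categories = ET.SubElement(root, _tag("FacetCategories"))
    for facet_name, facet_type in FACET_TYPES.items():
        ET.SubElement(categories, _tag("FacetCategory"),
                      {"Name": facet_name, "Type": facet_type})
    items = ET.SubElement(root, _tag("Items"), {"ImgBase": image_base})
    for i, hit in enumerate(hits):
        # Img="#i" refers to the i-th image in the (hypothetical) Deep Zoom collection.
        item = ET.SubElement(items, _tag("Item"),
                             {"Id": str(i), "Img": f"#{i}", "Name": str(hit["accession"])})
        facets = ET.SubElement(item, _tag("Facets"))
        for facet_name, facet_type in FACET_TYPES.items():
            facet = ET.SubElement(facets, _tag("Facet"), {"Name": facet_name})
            # The value element is named after the facet type, e.g. <Number Value="..."/>.
            ET.SubElement(facet, _tag(facet_type),
                          {"Value": str(hit[FACET_KEYS[facet_name]])})
    return ET.ElementTree(root)


if __name__ == "__main__":
    demo_hits = [{"accession": "hit_1", "organism": "Homo sapiens",
                  "evalue": 1e-50, "identity": 98.5}]
    tree = build_collection("BLAST hits", demo_hits)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Because every hit carries the same facet categories, Pivot can group, filter, and sort the whole result set on any of them, rather than on a single threshold chosen in advance.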
BL!P was developed using C# and .NET 4.0, and uses the Microsoft Biology Foundation (MBF) bioinformatics toolkit to access NCBI resources such as NCBI BLAST and
, as well as parsers to read/write biological sequence data.
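BL!P relies on MBF's parsers for this. Purely to illustrate what reading multiple FASTA-formatted query sequences involves, here is a minimal multi-FASTA reader in Python. It is a simplified stand-in, not MBF's API: it takes the first whitespace-delimited token of each header line as the identifier, joins sequences that span several lines, and performs no alphabet validation.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a multi-FASTA text stream."""
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Identifier = first whitespace-delimited token after '>' (may be empty).
            parts = line[1:].split(maxsplit=1)
            header = parts[0] if parts else ""
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # normalize case; sequences may span lines
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">q1 predicted gene 1\nATGGCC\nttga\n>q2\nMKT\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, seq)
    # q1 ATGGCCTTGA
    # q2 MKT
```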
BL!P automatically submits multiple FASTA-formatted coding DNA or amino acid sequences for NCBI BLAST searches against a protein database. Submissions are polled until complete, and the results are saved to disk for later use. When the NCBI BLAST search finishes, the GenBank record for each BLAST hit that meets user-specified criteria is downloaded and saved to disk for later use. The BLAST results and the information in the GenBank records are parsed and converted to a Pivot collection. Using data from the Pivot collection, a custom image layout is constructed to represent each BLAST hit. The results are saved to disk and can be loaded into Pivot for exploration. BL!P is a member of the Microsoft Biology Initiative.
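The submit, poll, and filter steps of this workflow can be outlined in Python as follows. This is a hypothetical sketch, not BL!P's C# code or the MBF API. `check_status` stands in for whatever BLAST service client is used, and its status strings are modeled loosely on the WAITING/READY values of the NCBI BLAST URL API (an assumption here). The E-value and percent-identity cutoffs are example values for the "user-specified criteria". The program choice follows from the text: coding DNA searched against a protein database implies a translated search (blastx), while protein queries use blastp.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    """Minimal (hypothetical) record for one BLAST hit."""
    accession: str
    evalue: float
    identity: float  # percent identity over the alignment


def choose_program(sequence: str) -> str:
    """Pick the BLAST program for a protein database search.

    Coding DNA queries are translated (blastx); protein queries use blastp.
    Heuristic: a query made only of A/C/G/T/N is treated as nucleotide.
    """
    return "blastx" if set(sequence.upper()) <= set("ACGTN") else "blastp"


def poll_until_complete(check_status: Callable[[str], str], job_id: str,
                        interval_s: float = 5.0, timeout_s: float = 600.0) -> None:
    """Call a hypothetical status callback until the job reports READY."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = check_status(job_id)
        if status == "READY":
            return
        if status == "FAILED":
            raise RuntimeError(f"BLAST job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"BLAST job {job_id} still {status} after {timeout_s} s")
        time.sleep(interval_s)  # wait before asking again


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5,
                min_identity: float = 30.0) -> List[Hit]:
    """Keep hits meeting the user's criteria (example thresholds only)."""
    return [h for h in hits if h.evalue <= max_evalue and h.identity >= min_identity]
```

Only hits that survive `filter_hits` would then have their GenBank records fetched. Passing the status check in as a callback keeps the polling logic testable without network access.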
About the Author
Vince Forgetta, M.Sc.
Microsoft Intern: Summer 2010
Ph.D. Candidate, McGill University, Department of Human Genetics, Montreal, Quebec, Canada