.:: Extra BLAST::.

Download

Extra BLAST version 0.4

Description

Extra BLAST is a Perl script that appends a column to a given tabular NCBI BLAST output, with information about each subject entry. For example, it turns this:
Contig1 gi|110083942|gb|DQ822993.1|     90.74   497     46      0       179     675     141     637     7e-174   620
Contig1 gi|110631513|gb|DQ784686.1|     88.89   495     55      0       181     675     136     630     3e-151   545
Contig1 gi|109900625|gb|DQ640309.1|     88.89   495     55      0       181     675     136     630     3e-151   545
Contig3 gi|124222179|dbj|AK224773.2|    93.37   392     17      2       82      472     1       384     1e-158   569
Contig3 gi|164508136|emb|AM412177.1|    100.00  89      0       0       539     627     1871    1959    2e-40    176
Contig3 gi|161085627|dbj|AB305024.1|    100.00  89      0       0       539     627     89      1       2e-40    176
Contig4 gi|1679608|emb|X62395.1|NTLTP1  87.32   142     18      0       277     418     1541    1682    5e-29    139
Into this:
Contig1 gi|110083942|gb|DQ822993.1|     90.74   497     46      0       179     675     141     637     7e-174   620    Solanum phureja pectin methylesterase inhibitor isoform mRNA, complete cds.
Contig1 gi|110631513|gb|DQ784686.1|     88.89   495     55      0       181     675     136     630     3e-151   545    Capsicum annuum cultivar Hanbyul pectin methlyesterase inhibitor protein 1 (PMEI1) gene, complete cds.
Contig1 gi|109900625|gb|DQ640309.1|     88.89   495     55      0       181     675     136     630     3e-151   545    Capsicum annuum pectin methlyesterase inhibitor protein 1 (PMEI1) mRNA, complete cds.
Contig3 gi|124222179|dbj|AK224773.2|    93.37   392     17      2       82      472     1       384     1e-158   569    Solanum lycopersicum cDNA, clone: FC14BG04, HTC in fruit.
Contig3 gi|164508136|emb|AM412177.1|    100.00  89      0       0       539     627     1871    1959    2e-40    176    Phytophthora cinnamomi tub1 gene for alpha-tubulin, strain Pr120.
Contig3 gi|161085627|dbj|AB305024.1|    100.00  89      0       0       539     627     89      1       2e-40    176    Vaccinium ciliatum DNA, microsatellite marker VM32.
Contig4 gi|1679608|emb|X62395.1|NTLTP1  87.32   142     18      0       277     418     1541    1682    5e-29    139    N.tabacum ltp1 gene for lipid transferase.
Or even this (sequences truncated):
Contig1 gi|110083942|gb|DQ822993.1|     90.74   497     46      0       179     675     141     637     7e-174   620    ggatcacactcaactagctttagttattgagaaacaaaac...
Contig1 gi|110631513|gb|DQ784686.1|     88.89   495     55      0       181     675     136     630     3e-151   545    gcacgaggaaagaattcattttttttaaaagaaaggctca...
Contig1 gi|109900625|gb|DQ640309.1|     88.89   495     55      0       181     675     136     630     3e-151   545    gcacgaggaaagaattcattttttttaaaagaaaggctca...
Contig3 gi|124222179|dbj|AK224773.2|    93.37   392     17      2       82      472     1       384     1e-158   569    tattcgggttgcagatggcggtgttgccgcgttcctcaac...
Contig3 gi|164508136|emb|AM412177.1|    100.00  89      0       0       539     627     1871    1959    2e-40    176    aggcaccagcattcttgttggccacaacttcaaggcatgg...
Contig3 gi|161085627|dbj|AB305024.1|    100.00  89      0       0       539     627     89      1       2e-40    176    tttaggtgacactatagaatactcaagctatgcatccaac...
Contig4 gi|1679608|emb|X62395.1|NTLTP1  87.32   142     18      0       277     418     1541    1682    5e-29    139    tgaacttattaaccttttgataacatgacgtcaacttaat...
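The transformation shown above can be sketched in a few lines. This is a Python illustration only, not the Perl code of extra_blast. It assumes tab-separated input, subject IDs laid out as gi|number|db|accession|, and a ready-made lookup table; the names subject_accession, append_column and descriptions are hypothetical.

```python
def subject_accession(subject_id):
    """Return the accession.version part of an NCBI-style subject ID,
    or None when the ID does not have the assumed gi|...|db|acc| shape."""
    parts = subject_id.split("|")
    # Assumed layout: gi | <number> | <db> | <accession.version> | [locus]
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    return None


def append_column(blast_lines, descriptions):
    """Yield each tabular BLAST line with one extra tab-separated column.

    descriptions is a hypothetical dict mapping accession -> text to append.
    """
    for line in blast_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Comment lines (as written by -m 9), blank lines and malformed
        # lines are passed through unchanged.
        if not line or line.startswith("#") or len(fields) < 2:
            yield line
            continue
        accession = subject_accession(fields[1])
        yield line + "\t" + descriptions.get(accession, "")
```

Hits whose accession is missing from the lookup table get an empty extra column here, so every data row keeps the same number of columns.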

Output

Extra BLAST writes the output file in a tabular format readable by most spreadsheet programs (e.g. Calc or Excel), and creates a folder (cache) containing the downloaded GenBank entries.
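As an illustration of how a value could be read back from a cached entry, here is a minimal Python sketch. It assumes the cache holds standard GenBank flat-file records, where a field keyword starts at the first column and continuation lines are indented. The function name genbank_field and the sample record are made up for this example; they are not taken from extra_blast or from a real download.

```python
def genbank_field(entry_text, field="DEFINITION"):
    """Return the text of a top-level field from a GenBank flat-file record,
    joining continuation lines with single spaces."""
    collected = []
    inside = False
    for line in entry_text.splitlines():
        if line.startswith(field):
            inside = True
            collected.append(line[len(field):].strip())
        elif inside and line.startswith(" "):
            # Indented line: continuation of the field being read.
            collected.append(line.strip())
        elif inside:
            # Next top-level keyword reached: the field is complete.
            break
    return " ".join(collected)


# Hand-written sample for illustration only, not a real GenBank record.
SAMPLE_ENTRY = (
    "LOCUS       EXAMPLE\n"
    "DEFINITION  N.tabacum ltp1 gene for lipid\n"
    "            transferase.\n"
    "ACCESSION   X62395\n"
)

definition = genbank_field(SAMPLE_ENTRY)
```

This simple approach suits single-block fields such as DEFINITION or ACCESSION; nested fields would need a more careful parser.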

Usage

First, turn on the execution flag with:
$> chmod +x extra_blast
And then just execute it as:
$> ./extra_blast  BLAST_FILE  [FILE_OUT  [FIELD  [CACHE_DIR]]]
Where:
  • BLAST_FILE [file in]: The BLAST output file in tabular format (-m 8 or -m 9).
  • FILE_OUT [file out]: The file with the extra column, "extra_out.csv" by default.
  • FIELD [str]: The field I must append to the BLAST tabular file, "DEFINITION" by default. The available fields are: DEFINITION, ACCESSION, VERSION, KEYWORDS, COMMENT and ORIGIN. Note that you can also use the fields SOURCE, REFERENCE and FEATURES, but they may cause unexpected behavior.
  • CACHE_DIR [dir out]: The folder where I store the cached entries, "cache" by default.
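The nested optional arguments and their defaults can be summarized with a short Python sketch. It only mirrors the usage line and the defaults listed above; it is not the script's own argument handling, and the names resolve_args and DEFAULTS are hypothetical.

```python
NAMES = ("blast_file", "file_out", "field", "cache_dir")
# Defaults for FILE_OUT, FIELD and CACHE_DIR, as listed above.
DEFAULTS = ("extra_out.csv", "DEFINITION", "cache")


def resolve_args(argv):
    """Map [BLAST_FILE, FILE_OUT, FIELD, CACHE_DIR] to a dict.

    Only BLAST_FILE is required; each missing trailing argument
    falls back to its default.
    """
    if not 1 <= len(argv) <= 4:
        raise SystemExit(
            "usage: extra_blast BLAST_FILE [FILE_OUT [FIELD [CACHE_DIR]]]"
        )
    given_optional = list(argv[1:])
    # Fill in only the defaults for the positions that were not supplied.
    missing = list(DEFAULTS[len(given_optional):])
    return dict(zip(NAMES, [argv[0]] + given_optional + missing))
```

Because the arguments are positional, choosing a FIELD also requires giving FILE_OUT, and choosing a CACHE_DIR requires giving both FILE_OUT and FIELD.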

Examples

You can take a look at the examples folder in the downloaded Extra BLAST package. See below the command line used to generate each file. phureja.fasta is a multi-FASTA file that is not included in the package.
$> # examples/blastnphureja_nr.csv
$> blastall -p blastn -i phureja.fasta -o examples/blastnphureja_nr.csv -d nr -m 8
$> # examples/blastnphureja_nr_definition.csv
$> ./extra_blast examples/blastnphureja_nr.csv examples/blastnphureja_nr_definition.csv
The examples in the Description section above are the first 7 lines of the files in the examples folder, except the last one, which can be generated with:
$> ./extra_blast examples/blastnphureja_nr.csv examples/blastnphureja_nr_definition.csv SEQUENCE

Developer

Luis M. Rodriguez

Bogotá - Colombia
2007