Question: Getting untranslated nucleotide sequences on tblastn standalone
3.0 years ago by
United Kingdom
vicks10 wrote:

Short version:

Does anybody know how to get untranslated DNA sequences from tblastn in local mode?

Long version:

This is trivial to do online at NCBI: you just select several target sequences and download them, and what you get are the nucleotide sequences.

However, when running tblastn in standalone mode the output is always translated, and I can't find a way of getting the nucleotide sequences directly.

Somebody asked a similar question here, but I don't think it was ever answered. One of the answers claimed it was a problem with the XML output, but I see the same thing with every output format I've tried: they contain either protein sequences or no sequences at all.

Naturally, I could extract the nucleotide sequences one by one from the identifier, start, end, and frame, but one would think there should be an easier (automatic) way.

blast tblastn • 1.4k views
3.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:

I quickly wrote a tool to retrieve the DNA sequence from BLAST+ XML output. Note that I didn't pay attention to getting the positions right at junctions where there is a gap.

cat roxan.fa | ${bin.dir}/tblastn -db blastdb -outfmt 5 | java -jar biostar160470.jar -p ${bin.dir} -d blastdb| xmllint --format - 




                <Hsp_midline>+G + LC  +   + C  GD C  A+ QEE++ W + R+          +D+L  P</Hsp_midline>
                <Hsp_midline>H+A  + L P    P HR  V       QPP ++P  P LP</Hsp_midline>
3.0 years ago by
5heikki7.8k wrote:

You could parse the identifier, range and strand info into a file and then pass that file to blastdbcmd?


 -entry_batch <File_In>
   Input file for batch processing (Format: one entry per line, seq id
   followed by optional space-delimited specifier(s)

 -outfmt <String>
   Output format, where the available format specifiers are:
           %f means sequence in FASTA format
           %s means sequence data (without defline)
           %a means accession
           %g means gi
           %o means ordinal id (OID)
           %i means sequence id
           %t means sequence title
           %l means sequence length
           %h means sequence hash value
           %T means taxid
           %e means membership integer
           %L means common taxonomic name
           %S means scientific name
           %P means PIG
           %m means sequence masking data.
              Masking data will be displayed as a series of 'N-M' values
              separated by ';' or the word 'none' if none are available.
       If '%f' is specified, all other format specifiers are ignored.
       For every format except '%f', each line of output will correspond
       to a sequence.
   Default = `%f'
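
As a rough sketch of that idea (not a tested pipeline), the helper below turns tblastn tabular hits into lines for -entry_batch. It assumes the hits were written with -outfmt "6 sseqid sstart send", that minus-strand hits are reported with sstart greater than send, and that each batch line is "id range strand" separated by spaces; the function name and the file names in the comments are made up for illustration, so check the batch format against your own blastdbcmd version.

```python
def hits_to_entry_batch(tabular_lines):
    """Convert tblastn tabular hits into blastdbcmd -entry_batch lines.

    Assumption: each input line is tab-separated and starts with
    sseqid, sstart, send (e.g. from -outfmt "6 sseqid sstart send").
    """
    entries = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, sstart, send = line.split("\t")[:3]
        start, end = int(sstart), int(send)
        # Assumption: a hit on the reverse strand has sstart > send.
        strand = "plus" if start <= end else "minus"
        low, high = min(start, end), max(start, end)
        # Assumed batch line layout: "<seq id> <from>-<to> <strand>"
        entries.append(f"{sseqid} {low}-{high} {strand}")
    return entries


# Hypothetical usage (file names are placeholders):
#   tblastn -query roxan.fa -db blastdb -outfmt "6 sseqid sstart send" > hits.tsv
#   ...write hits_to_entry_batch(open("hits.tsv")) to batch.txt, one per line...
#   blastdbcmd -db blastdb -entry_batch batch.txt -outfmt %f
example = hits_to_entry_batch(["contig1\t100\t400", "contig2\t900\t601"])
print("\n".join(example))
```

The coordinates are left exactly as BLAST reports them, so if you need whole codons or some flanking sequence you would have to adjust the range yourself.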

