Extract subject sequences from Blast result
1
0
Entering edit mode
2.7 years ago
mnsp088 ▴ 100

Hi all,

I ran TBLASTN (query=protein, subject=nucleotide) and got the following result:

qseqid       sseqid     pident      sstart  send    evalue  
MP_11765    contig_17   95.322      5460    5221    0

I noticed that sstart > ssend meaning this query is mapping on the reverse complement of the subject. In this case how can I pull out the DNA sequence from my subject nucleotide file that corresponds to this hit? Any ideas on a program that can do this?

Thank you.

sequence blast homology • 1.5k views
ADD COMMENT
3
Entering edit mode
2.7 years ago
Nitin Narwade ★ 1.6k

bedtools getfasta utility may help you. You need to include strand information in your blast output and use -s option of bedtools getfasta.

ADD COMMENT
0
Entering edit mode

I tried pulling out strand info by adding ''sstrand" in my tblastn command below however this just results in a column of N/As

tblastn -query prot_seqs.txt -db my_nuc_db -outfmt "6 qseqid sseqid pident length qlen qstart qend sstart send evalue sstrand sseq" 
ADD REPLY
1
Entering edit mode

Hmmmmmm!!

I am not sure why this is happening but you can add strand information manually for the blast output. Simple awk should work. Something like this,

awk -F'\t' '{ if($4 > $5) {print $2"\t"$4"\t"$5"\t-"} else{print $2"\t"$4"\t"$5"\t+"} }' blast.output.tsv >subject-with-strand-info.bed
ADD REPLY
0
Entering edit mode

I didn't think about adding them in manually but this works! Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2411 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6