Batch Blast Two Sequences And Output Sequence Alignment
1
0
Entering edit mode
10.4 years ago
biolab ★ 1.4k

Dear all,

I want to compare a number of sequences with another list of sequences. Each comparison is consisted of only two sequences. I make an example below.

FILE1

>gene1_species1
atgcatgc
>gene2_species1
tgcagcat
.........

FILE2

>gene1_species2
atgGatgc
>gene2_species2
tgcCgcGt
...........

I need to compare gene1 and gene2 between these two species. One foolish way is to merge FILE1 and FILE2, and then blast itself (this is what I can think and do). My bigger problem is: I need to output the sequence alignment for each gene (please see below), rather than tabular blast result (-m 8 option). How to achieve this analysis? Would you please give me some suggestions? THANK YOU VERY MUCH!

gene1_species1   atgcatgc
gene1_species2   atgGatgc

gene2_species1  tgcagcat
gene2_species2  tgcCgcGt
blast • 4.7k views
ADD COMMENT
2
Entering edit mode
10.4 years ago
PoGibas 5.1k

If you need to compare genes between two species, then:

  while read GENE; do
     blastn \
        -query <(grep -A1 $GENE Species1.fa) \
        -subject <(grep -A1 $GENE Species2.fa) \
        -out ${GENE}_OUT
  done < Gene_list

If you need specific output play around with -outfmt

ADD COMMENT
0
Entering edit mode

Hi, Pgibas, I am not professional bioinformatician. I use your text in a script as below.

#!/bin/sh
while read GENE; do
     blastn \
        -query <(grep -A1 $GENE s1.fa) \
        -subject <(grep -A1 $GENE s2.fa) \
        -out ${GENE}_OUT
  done < Gene_list

But i get Gene_list No such file or directory message. Could you please give me some suggestions? THANKS.

ADD REPLY
0
Entering edit mode

Gene_list is a file containing listed genes (one per line): gene1 gene2 ...

ADD REPLY
0
Entering edit mode

Hi Pgibas, I am still having problems. The three files are listed below. s1.fa

>gene1
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
>gene2
atatatatatataatatatatatatatatatatatata

s2.fa

>gene3
atgcatgcatgcatgcatgcatgcatgtgcatgcatg
>gene4
atcgatatatatatattaatatatatatatatatatt

list

gene1
gene2
gene3
gene4

I save your code in the file blast.sh, when running sh blast.sh I got error message as below.

 ./bla.sh: line 4: syntax error near unexpected token `('
./bla.sh: line 4: `        -query <(grep -A1 $GENE s1.fa) \'

Could you please tell me the where is the problem? In addition, when running blastn -query s1.fa -subject s2.fa -out result I got result file which contains combination comparison of these four genes, but all are ***** No hits found *****. Why i got this output? Thank you for your time. Best Regards.

ADD REPLY
0
Entering edit mode

s1.fa is not in fasta format.
Fasta format should be:

>gene1  
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc

Gene_list should be:

gene1
gene2
gene3
...
ADD REPLY
0
Entering edit mode

My s1.fa has two sequences. I think it is fasta format. Thanks

ADD REPLY
0
Entering edit mode

I am puzzled that you said s1.fa is not in fasta format. Could you please explain a little bit more? Thank you very much!

ADD REPLY
0
Entering edit mode

Before you posted s1.fa as gene1 atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc and it's not a fasta file type.

OK,this is what I did and it works for me.

cat s1.fa
>gene1
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
>gene2
atatatatatataatatatatatatatatatatatata

cat s2.fa
>gene1
atgcatgcatgcatgcatgcatgcatgtgcatgcatg
>gene2
atcgatatatatatattaatatatatatatatatatt

cat Gene_list
gene1
gene2

while read GENE; do 
    blastn \
        -task blastn-short \
        -query <(grep -A1 $GENE s1.fa) \
        -subject <(grep -A1 $GENE s2.fa) \
        -dust no \
        -out ${GENE}_OUT 
done < Gene_list
ADD REPLY
0
Entering edit mode

Hi Pgibas, I still have the problem. I run sh bla.sh and get the same error. Not sure of what's wrong. I need to learn more of the bioinformatics. If you have any good ideas, please feel free to let me know. Anyway, thank you very much for your answers. Best!

ADD REPLY
0
Entering edit mode

Have in mind that in s1.fa and s2.fa fasta headers are identical. Also, don't run it as sh bla.sh just paste from loop directly into the terminal.

ADD REPLY
0
Entering edit mode

My command is while read GENE; do blastn \ -task blastn-short \ -query < (grep -A1 $GENE s1.fa) \ -subject < (grep -A1 $GENE s2.fa} \ -dust no -out ${GENE}_OUT done < list The error message is -bash: syntax error near unexpected token('` My system's problem?

ADD REPLY
0
Entering edit mode

NOTE I have made the s1.fa and s2.fa have the same headers, but problem still exists.

ADD REPLY
1
Entering edit mode

Please paste only this:
while read GENE; do blastn -task blastn-short -query <(grep -A1 $GENE s1.fa) -subject <(grep -A1 $GENE s2.fa) -dust no -out ${GENE}_OUT ; done < Gene_list

ADD REPLY
0
Entering edit mode

Hi Pgibas,It's great. Your command works very well. Thank you very much!

ADD REPLY
1
Entering edit mode

Have in mind that blastn-short is "optimized for sequences shorter than 50 bases" (BLAST manual).

PS.: If it works accept the answer.

ADD REPLY
0
Entering edit mode

Ok. Pgibas, I have clicked the green tick (I suppose it is to accept the answer) and also upvoted the answer. I am new in Biostars. I believe it is reasonable to upvote those whose give others helps! thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1985 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6