Question: Batch Blast Two Sequences And Output Sequence Alignment
0
gravatar for biolab
5.7 years ago by
biolab1.1k
biolab1.1k wrote:

Dear all,

I want to compare a number of sequences with another list of sequences. Each comparison is consisted of only two sequences. I make an example below.

FILE1

>gene1_species1
atgcatgc
>gene2_species1
tgcagcat
.........

FILE2

>gene1_species2
atgGatgc
>gene2_species2
tgcCgcGt
...........

I need to compare gene1 and gene2 between these two species. One foolish way is to merge FILE1 and FILE2, and then blast itself (this is what I can think and do). My bigger problem is: I need to output the sequence alignment for each gene (please see below), rather than tabular blast result (-m 8 option). How to achieve this analysis? Would you please give me some suggestions? THANK YOU VERY MUCH!

gene1_species1   atgcatgc
gene1_species2   atgGatgc

gene2_species1  tgcagcat
gene2_species2  tgcCgcGt
blast • 3.0k views
ADD COMMENTlink modified 5.7 years ago by PoGibas4.8k • written 5.7 years ago by biolab1.1k
2
gravatar for PoGibas
5.7 years ago by
PoGibas4.8k
Vilnius
PoGibas4.8k wrote:

If you need to compare genes between two species, then:

  while read GENE; do
     blastn \
        -query <(grep -A1 $GENE Species1.fa) \
        -subject <(grep -A1 $GENE Species2.fa) \
        -out ${GENE}_OUT
  done < Gene_list

If you need specific output play around with -outfmt

ADD COMMENTlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k

Hi, Pgibas, I am not professional bioinformatician. I use your text in a script as below.

#!/bin/sh
while read GENE; do
     blastn \
        -query <(grep -A1 $GENE s1.fa) \
        -subject <(grep -A1 $GENE s2.fa) \
        -out ${GENE}_OUT
  done < Gene_list

But i get Gene_list No such file or directory message. Could you please give me some suggestions? THANKS.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biolab1.1k

Gene_list is a file containing listed genes (one per line): gene1 gene2 ...

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k

Hi Pgibas, I am still having problems. The three files are listed below. s1.fa

>gene1
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
>gene2
atatatatatataatatatatatatatatatatatata

s2.fa

>gene3
atgcatgcatgcatgcatgcatgcatgtgcatgcatg
>gene4
atcgatatatatatattaatatatatatatatatatt

list

gene1
gene2
gene3
gene4

I save your code in the file blast.sh, when running sh blast.sh I got error message as below.

 ./bla.sh: line 4: syntax error near unexpected token `('
./bla.sh: line 4: `        -query <(grep -A1 $GENE s1.fa) \'

Could you please tell me the where is the problem? In addition, when running blastn -query s1.fa -subject s2.fa -out result I got result file which contains combination comparison of these four genes, but all are ***** No hits found *****. Why i got this output? Thank you for your time. Best Regards.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biolab1.1k

s1.fa is not in fasta format.
Fasta format should be:

>gene1  
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc

Gene_list should be:

gene1
gene2
gene3
...
ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k

My s1.fa has two sequences. I think it is fasta format. Thanks

ADD REPLYlink written 5.7 years ago by biolab1.1k

I am puzzled that you said s1.fa is not in fasta format. Could you please explain a little bit more? Thank you very much!

ADD REPLYlink written 5.7 years ago by biolab1.1k

Before you posted s1.fa as gene1 atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc and it's not a fasta file type.

OK,this is what I did and it works for me.

cat s1.fa
>gene1
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
>gene2
atatatatatataatatatatatatatatatatatata

cat s2.fa
>gene1
atgcatgcatgcatgcatgcatgcatgtgcatgcatg
>gene2
atcgatatatatatattaatatatatatatatatatt

cat Gene_list
gene1
gene2

while read GENE; do 
    blastn \
        -task blastn-short \
        -query <(grep -A1 $GENE s1.fa) \
        -subject <(grep -A1 $GENE s2.fa) \
        -dust no \
        -out ${GENE}_OUT 
done < Gene_list
ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k

Hi Pgibas, I still have the problem. I run sh bla.sh and get the same error. Not sure of what's wrong. I need to learn more of the bioinformatics. If you have any good ideas, please feel free to let me know. Anyway, thank you very much for your answers. Best!

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biolab1.1k

Have in mind that in s1.fa and s2.fa fasta headers are identical. Also, don't run it as sh bla.sh just paste from loop directly into the terminal.

ADD REPLYlink written 5.7 years ago by PoGibas4.8k

My command is while read GENE; do blastn \ -task blastn-short \ -query < (grep -A1 $GENE s1.fa) \ -subject < (grep -A1 $GENE s2.fa} \ -dust no -out ${GENE}_OUT done < list The error message is -bash: syntax error near unexpected token('` My system's problem?

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biolab1.1k

NOTE I have made the s1.fa and s2.fa have the same headers, but problem still exists.

ADD REPLYlink written 5.7 years ago by biolab1.1k
1

Please paste only this:
while read GENE; do blastn -task blastn-short -query <(grep -A1 $GENE s1.fa) -subject <(grep -A1 $GENE s2.fa) -dust no -out ${GENE}_OUT ; done < Gene_list

ADD REPLYlink written 5.7 years ago by PoGibas4.8k

Hi Pgibas,It's great. Your command works very well. Thank you very much!

ADD REPLYlink written 5.7 years ago by biolab1.1k
1

Have in mind that blastn-short is "optimized for sequences shorter than 50 bases" (BLAST manual).

PS.: If it works accept the answer.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k

Ok. Pgibas, I have clicked the green tick (I suppose it is to accept the answer) and also upvoted the answer. I am new in Biostars. I believe it is reasonable to upvote those whose give others helps! thanks!

ADD REPLYlink written 5.7 years ago by biolab1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 729 users visited in the last hour