Dear all,
I want to compare a set of sequences against another set of sequences. Each comparison consists of only two sequences. An example is below.
FILE1
>gene1_species1
atgcatgc
>gene2_species1
tgcagcat
.........
FILE2
>gene1_species2
atgGatgc
>gene2_species2
tgcCgcGt
...........
I need to compare gene1 and gene2 between these two species. One crude way is to merge FILE1 and FILE2 and then BLAST the merged file against itself (this is all I can think of). My bigger problem is that I need to output the sequence alignment for each gene (see below) rather than the tabular BLAST result (-m 8 option). How can I do this analysis? Could you please give me some suggestions? THANK YOU VERY MUCH!
gene1_species1 atgcatgc
gene1_species2 atgGatgc
gene2_species1 tgcagcat
gene2_species2 tgcCgcGt
Hi Pgibas, I am not a professional bioinformatician. I used your text in a script, as below.
But I get a
Gene_list No such file or directory
message. Could you please give me some suggestions? THANKS.
Gene_list is a file containing the listed genes (one per line):
gene1
gene2
...
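If the gene names follow the header pattern from the question (`>gene1_species1`), the list does not have to be typed by hand. The lines below are a minimal sketch under that assumption; the file name `s1.fa` and the rule "everything before the first underscore is the gene name" are illustrative choices, not something the answer specifies.

```bash
#!/usr/bin/env bash
# Sketch: derive Gene_list from the FASTA headers of one file.
# Assumes headers look like ">gene1_species1" (gene name, underscore, species).
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory so no existing file is overwritten
cd "$workdir"

# Toy input mirroring FILE1 from the question.
printf '>gene1_species1\natgcatgc\n>gene2_species1\ntgcagcat\n' > s1.fa

# Keep header lines, drop the leading ">", remove everything from the first "_".
grep '^>' s1.fa | sed -e 's/^>//' -e 's/_.*$//' | sort -u > Gene_list

cat Gene_list
```

The resulting file has one gene name per line, which is the layout the `while read GENE` loop expects.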
Hi Pgibas, I am still having problems. The three files are listed below.
s1.fa
s2.fa
list
I saved your code in the file blast.sh. When running
sh blast.sh
I got the error message below. Could you please tell me where the problem is? In addition, when running
blastn -query s1.fa -subject s2.fa -out result
I got a result file that contains every pairwise comparison of these four genes, but all of them are ***** No hits found *****.
Why did I get this output? Thank you for your time. Best Regards.
s1.fa is not in fasta format. Fasta format should be:
Gene_list should be:
My s1.fa has two sequences. I think it is in fasta format. Thanks
I am puzzled that you said s1.fa is not in fasta format. Could you please explain a little bit more? Thank you very much!
Before, you posted s1.fa as
gene1 atgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
and that is not the fasta file format.
OK, this is what I did and it works for me.
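To illustrate the format point: a file that holds the name and the sequence on one line, like the `gene1 atgc...` line quoted above, can be rewritten as FASTA with awk. This is only a sketch and not the answerer's code; the input name `s1.txt` and the toy sequences are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: turn "name sequence" lines into FASTA records.
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory for the toy files
cd "$workdir"

# Hypothetical two-column input: gene name, whitespace, sequence.
printf 'gene1 atgcatgcatgc\ngene2 tgcagcattgca\n' > s1.txt

# For each line with at least two fields, print ">name" and then the sequence.
awk 'NF >= 2 { print ">" $1; print $2 }' s1.txt > s1.fa

cat s1.fa
```

Each record now starts with a `>` header line followed by the sequence on its own line, which is what `blastn` expects for `-query` and `-subject` files.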
Hi Pgibas, I still have the problem. I ran
sh bla.sh
and got the same error. I am not sure what is wrong. I need to learn more bioinformatics. If you have any good ideas, please feel free to let me know. Anyway, thank you very much for your answers. Best!
Keep in mind that in s1.fa and s2.fa the fasta headers are identical. Also, don't run it as
sh bla.sh
just paste the loop directly into the terminal.
My command is
while read GENE; do blastn \ -task blastn-short \ -query < (grep -A1 $GENE s1.fa) \ -subject < (grep -A1 $GENE s2.fa} \ -dust no -out ${GENE}_OUT done < list
The error message is
-bash: syntax error near unexpected token `('
Is it my system's problem? NOTE: I have made s1.fa and s2.fa have the same headers, but the problem still exists.
Please paste only this:
while read GENE; do blastn -task blastn-short -query <(grep -A1 $GENE s1.fa) -subject <(grep -A1 $GENE s2.fa) -dust no -out ${GENE}_OUT ; done < Gene_list
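The `<(...)` construct in this loop is bash process substitution: it must be written with no space between `<` and `(`, and it is not available in a plain POSIX `sh`, which is a likely reason the earlier attempts failed. The sketch below runs the same loop with `paste` standing in for `blastn`, so it works without BLAST installed and shows what each substitution delivers. The toy files, the `-w` flag (so that `gene1` does not also match `gene10`), and the output name `pairs.txt` are assumptions added for illustration.

```bash
#!/usr/bin/env bash
# Sketch: the per-gene extraction that feeds -query and -subject,
# shown with paste instead of blastn. Requires bash, not sh.
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory for the toy files
cd "$workdir"

# Toy inputs with identical headers in both files, as the thread advises.
printf '>gene1\natgcatgc\n>gene2\ntgcagcat\n' > s1.fa
printf '>gene1\natgGatgc\n>gene2\ntgcCgcGt\n' > s2.fa
printf 'gene1\ngene2\n' > Gene_list

while read -r GENE; do
    # Each <(...) behaves like a temporary file holding the header line
    # that matches $GENE plus the one line after it (grep -A1).
    paste <(grep -A1 -w -- "$GENE" s1.fa) <(grep -A1 -w -- "$GENE" s2.fa)
done < Gene_list > pairs.txt

cat pairs.txt
```

Note that `grep -A1` returns only one line after the header, so this approach assumes each sequence sits on a single line; wrapped multi-line FASTA records would be truncated.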
Hi Pgibas, it's great. Your command works very well. Thank you very much!
Keep in mind that blastn-short is "optimized for sequences shorter than 50 bases" (BLAST manual).
PS: If it works, accept the answer.
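On the blastn-short remark: for real genes that are longer than the toy sequences, the task could be chosen from the query length. The helper below is a hypothetical sketch; `pick_task` is an invented name, the cutoff of 50 comes from the manual quote above, and falling back to `-task blastn` for longer queries is an assumption rather than advice from the thread.

```bash
#!/usr/bin/env bash
# Sketch: choose a blastn -task value from the length of a single-sequence FASTA.
set -euo pipefail

# Hypothetical helper: prints "blastn-short" below 50 bases, "blastn" otherwise.
pick_task() {
    local fasta=$1 len
    # Sum the lengths of all non-header lines (assumes one sequence per file).
    len=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$fasta")
    if [ "$len" -lt 50 ]; then
        echo blastn-short
    else
        echo blastn
    fi
}

workdir=$(mktemp -d)   # scratch directory for the toy files
cd "$workdir"

# An 8-base query and a 60-base query.
printf '>gene1\natgcatgc\n' > short.fa
{ printf '>gene1\n'; for i in 1 2 3 4 5 6; do printf 'atgcatgcat'; done; printf '\n'; } > long.fa

pick_task short.fa
pick_task long.fa
```

Inside the loop, the result could be passed along as `-task "$(pick_task query.fa)"`, where `query.fa` is a placeholder for a file holding the extracted query sequence.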
OK. Pgibas, I have clicked the green tick (I suppose that accepts the answer) and also upvoted the answer. I am new to Biostars. I believe it is reasonable to upvote those who help others! Thanks!