Hi, I am doing a bioinformatics project and I need some help.
I am trying to find homology/orthology between viral and human miRNAs. I have two FASTA files: one containing human miRNAs and their sequences (let's call it X; it contains about 200 miRNAs) and one containing viral miRNAs and their sequences (let's call it Y; it contains about 1000 miRNAs). I tried using blastn to find homology, so I checked the "Align two or more sequences" box, which lets me give the program two files. The Query Sequence is Y and the Subject Sequence is X. Then I select "somewhat similar sequences" and click BLAST. For some reason I only get hits that are 100% identical (and really short), with no gaps or mismatches allowed! I tried changing the parameters, but with no luck.
Can someone help me, please? What parameters should I choose? Or am I using BLAST for the wrong thing? What I am trying to achieve is something like this: http://www.nature.com/ni/journal/v14/n3/images/ni.2537-F2.jpg The program just shows me the parts that are 100% identical (just like the seed) but doesn't show me the rest of the alignment.
Maybe there is a better way to do this using Python? But I have a lot of sequences, so it would probably be really slow, no?
Firstly, you don't want `blast2seq`, because you have more than 2 sequences. That is for literally 2 sequences, not sets of sequences. Secondly, I think you should be stricter in defining what you're looking for: just because 2 gene sequences share similarity ("identity") doesn't necessarily make them homologous or orthologous. These 2 terms have very specific meanings, and based on your post I'm not sure they are what you're after. If you just want to find regions where all your sequences are the same, you probably want to perform a normal alignment (see `clustalo`, `mega`, `muscle`, etc.). It's also not clear whether you want to compare all 200 vs. 1000 miRNAs, or just specific combinations?
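On the Python question: if you do want every viral miRNA against every human one, that is roughly 200 × 1000 = 200,000 pairs of ~22 nt sequences, which is small enough for plain Python. Below is a minimal sketch, not a definitive method, of a Smith-Waterman local alignment that tolerates mismatches and gaps. The function names (`read_fasta`, `local_align`, `all_vs_all`), the scoring values, the `min_score` cutoff, and the T-to-U normalization are all my own assumptions for illustration; tune them to your data.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict.

    Assumption: sequences are normalized to upper-case RNA (T -> U)
    so that DNA-style and RNA-style files compare equal.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before the first space
            records[name] = ""
        elif name is not None:
            records[name] += line.upper().replace("T", "U")
    return records


def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    The scoring values are illustrative defaults, not tuned for miRNAs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def all_vs_all(queries, subjects, min_score=20):
    """Align every query against every subject; keep hits scoring >= min_score."""
    hits = []
    for q_name, q_seq in queries.items():
        for s_name, s_seq in subjects.items():
            score, aln_q, aln_s = local_align(q_seq, s_seq)
            if score >= min_score:
                hits.append((score, q_name, s_name, aln_q, aln_s))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # best hits first
    return hits
```

To use it, read each file into a string, pass it through `read_fasta`, and call `all_vs_all(viral, human)`. Each pair fills a table of about 22 × 22 cells, so the whole comparison is on the order of 10^8 cell updates: slow-ish in pure Python, but a one-off run rather than a blocker. Keep in mind that a high alignment score only shows sequence similarity; as said above, it does not by itself establish homology or orthology.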