Question

BLAST to find orthologs

7.1 years ago

alon.shtrikman • 0

Hi, I am doing a bioinformatics project and I need some help.

I am trying to find homology/orthology between viral and human miRNAs. I have two FASTA files: one containing human miRNAs and their sequences (let's call it X, about 200 miRNAs) and one containing viral miRNAs and their sequences (let's call it Y, about 1000 miRNAs). I tried using blastn to find homology, so I checked the "Align two or more sequences" box, which lets me give the program two files. The Query Sequence is Y and the Subject Sequence is X. Then I select "somewhat similar sequences" and run BLAST. For some reason I only get 100% identical matches (which are really short), and it doesn't seem to allow gaps or mismatches! I tried changing the parameters, but with no luck.

Can someone help me please? Which parameters should I choose? Or am I using BLAST for the wrong thing? What I am trying to achieve is something like this: http://www.nature.com/ni/journal/v14/n3/images/ni.2537-F2.jpg The program just shows me the parts that are 100% identical (just like the seed) but doesn't show me the rest of the alignment.

Maybe there is a better way to do this using Python? But I have a lot of sequences, so it would probably be really slow, no?

miRNA BLAST orthologs python • 4.2k views

Firstly, you don't want blast2seq because you have more than 2 sequences. It is meant for literally 2 sequences, not sets of sequences. Secondly, I think you should be stricter in defining what you're looking for: just because 2 gene sequences share similarity ("identity"), that doesn't necessarily make them homologous or orthologous. These 2 terms have very specific meanings, and I'm not sure they describe what you're after based on your post.

If you just want to find regions where all your sequences are the same, you probably want to perform a normal multiple alignment (see clustalo, mega, muscle etc.). It's also not clear whether you want to compare all 200 vs all 1000 miRNAs, or just specific combinations?

ADD REPLY • link 7.1 years ago by Joe 21k

score 0 · Answer 1 · 2017-03-20

7.1 years ago

alon.shtrikman • 0

I do want to compare all 200 vs all 1000. I know similarity doesn't necessarily mean homology; for now I just need to find similarity. And none of the tools you listed has an option to compare sequences between two separate files; they each take a single input file. Is there a tool that does what I need?

You want 200 × 1000 (= 200,000) alignments; how do you want to interpret these results?

With the EMBOSS command-line tools you can align your file X vs Y with, e.g., Smith-Waterman (water). But then you'll have to interpret your 200,000 results, good luck with that!

Edit: btw, you can also use standalone BLAST if you really prefer BLAST. You'll have to make a database from one of your files first.

ADD REPLY • link 7.1 years ago by Benn 8.3k
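
To illustrate the all-vs-all Smith-Waterman idea mentioned above, here is a minimal pure-Python sketch. It is not EMBOSS water: the scoring values (match +2, mismatch -1, linear gap -2), the `min_score` cutoff, and the function names `read_fasta`, `smith_waterman` and `all_vs_all` are all assumptions made for illustration. Converting U to T on input is also an assumption, made so that RNA and DNA style records compare consistently.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; uppercase and map U to T (assumption)."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper().replace("U", "T")
    return records


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def all_vs_all(queries, subjects, min_score=20):
    """Align every query against every subject; keep pairs scoring >= min_score."""
    hits = []
    for qname, qseq in queries.items():
        for sname, sseq in subjects.items():
            score, aln_q, aln_s = smith_waterman(qseq, sseq)
            if score >= min_score:
                hits.append((qname, sname, score, aln_q, aln_s))
    return hits
```

With files on disk this could be used as `all_vs_all(read_fasta(open("Y.fa")), read_fasta(open("X.fa")))`, where `Y.fa` and `X.fa` are placeholder file names. Because miRNAs are short, each alignment fills only a small matrix, but a score cutoff is still needed to keep the 200,000 comparisons interpretable.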

Command-line blast or blastall is an option, though possibly not a very fast one. Don't forget that, without filtering and a strict cutoff, you're going to get multiple hits per query-subject pairing. So while you have 200,000 combinations, you will almost certainly get considerably more hits back under default behaviour, probably one or two orders of magnitude more, so you really need to think about how you're going to analyse this.

ADD REPLY • link 7.1 years ago by Joe 21k
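
As a rough sketch of the standalone route and the filtering concern above, the snippet below builds the two BLAST+ command lines (make a database from one file, then search it with the other) and reduces tabular output to the best-scoring hit per query-subject pair. The `blastn-short` task and the raised E-value are assumptions about what may suit miRNA-length sequences, not settings confirmed in this thread; the file names, database name and function names are hypothetical. The parser assumes the default 12-column `-outfmt 6` layout.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast_commands(query_fasta, subject_fasta, db_name="human_mirs", out_tsv="hits.tsv"):
    """Return (makeblastdb_cmd, blastn_cmd) as argument lists; nothing is executed here."""
    make_db = ["makeblastdb", "-in", subject_fasta, "-dbtype", "nucl", "-out", db_name]
    search = ["blastn", "-task", "blastn-short",   # assumption: task meant for short queries
              "-query", query_fasta, "-db", db_name,
              "-evalue", "1000",                   # assumption: permissive cutoff for short hits
              "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def best_hit_per_pair(lines):
    """Keep only the highest-bitscore row for each (query, subject) pair."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        key = (hit["qseqid"], hit["sseqid"])
        if key not in best or float(hit["bitscore"]) > float(best[key]["bitscore"]):
            best[key] = hit
    return best
```

If BLAST+ is installed, each returned command could be run with `subprocess.run(cmd, check=True)`, and the resulting table read with `best_hit_per_pair(open("hits.tsv"))`. Collapsing to one row per pair caps the result at 200,000 rows at most, which can then be thresholded further on `pident`, `length` or `evalue`.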