Question: Blast or Blat for multiple sequence alignment to get each respective sequence locations
10 months ago by
Azhar40
China
Azhar40 wrote:

I have almost 8000 small RNA sequences, and I want to get the top 20 possible locations for each sequence using BLAST or BLAT. Is there a method or script that can be used for this? Kindly enlighten me.

modified 10 months ago by maxime.policarpo50 • written 10 months ago by Azhar40

Check here for the difference between BLAST and BLAT, and see which one suits your data. I would use standalone BLAST in this case.

modified 10 months ago • written 10 months ago by Benn6.6k
10 months ago by
France, Paris
maxime.policarpo50 wrote:

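What is your final purpose? I think you could use BLAST and keep only the top 20 hits with the options -max_target_seqs (limits the number of target sequences reported per query) or -max_hsps (limits the number of HSPs reported per query-target pair).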
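
A minimal sketch of how those two options might be combined on a blastn command line, built in Python so the argument list can be inspected before anything is run. The file names are placeholders, and the choice of -outfmt 6 (tabular output) and of 20 for both limits are assumptions, not something stated in this thread. Because one option caps target sequences and the other caps HSPs per target, neither alone guarantees exactly 20 locations per query.

```python
import shlex


def build_blastn_command(query_fasta, database, out_path,
                         max_target_seqs=20, max_hsps=20):
    """Return the blastn argument list; nothing is executed here."""
    return [
        "blastn",
        "-query", query_fasta,                      # placeholder file name
        "-db", database,                            # placeholder database name
        "-max_target_seqs", str(max_target_seqs),   # cap on targets per query
        "-max_hsps", str(max_hsps),                 # cap on HSPs per query-target pair
        "-outfmt", "6",                             # assumption: tabular output
        "-out", out_path,
    ]


cmd = build_blastn_command("small_rnas.fasta", "genome_db", "hits.tsv")
print(shlex.join(cmd))
```

Once BLAST is installed, the list could be handed to subprocess.run(cmd, check=True) to actually run the search.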

modified 10 months ago by RamRS21k • written 10 months ago by maxime.policarpo50

Can you guide me through how to do that? I have never done it before. Is it, step by step: make a FASTA file for the 20K sequences, run BLAST with the options stated above, then get the locations?

I am confused about the locations: will I get the locations for each sequence?

written 10 months ago by Azhar40

You should try searching yourself and come to us with specific problems, not a request for someone to hand-hold you through the entire task.

written 10 months ago by RamRS21k

I will look at your problem in two days; I am kind of busy right now, sorry :/

written 10 months ago by maxime.policarpo50

Are you OK with using bash commands?

You should start by installing BLAST on your computer (https://www.ncbi.nlm.nih.gov/books/NBK52640/).

Next, I have no idea whether you want the top 20 possible locations across a whole draft genome assembly or on a single scaffold. The catch is that BLAST results have two levels: the target sequences and the HSPs (high-scoring segment pairs). One target sequence can have many HSPs:

scaffold :  =========================================================
HSPs     :     ======                  ======                 ======

Anyway, you could build a BLAST database from your genome and then run BLAST:

makeblastdb -in yourgenome.fasta -dbtype nucl
blastn -query 20ksequence.fasta -db yourgenome.fasta > results_RNA_vs_genome.blastn

Then you could easily parse the BLAST results according to your criteria (best e-values? best target sequences?) using the Biopython module. I think many people have already asked how to parse results the way you want to, and the Biopython cookbook is very good: http://biopython.org/DIST/docs/tutorial/Tutorial.html
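
As a rough illustration of that parsing step, here is a sketch that uses only the Python standard library rather than Biopython, and that assumes the search was run with tabular output (-outfmt 6) instead of the default report produced by the command above. The ranking by bit score, the function name, and the example rows are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits_per_query(handle, n=20):
    """Group tabular BLAST rows by query and keep the n best by bit score."""
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        rec["sstart"], rec["send"] = int(rec["sstart"]), int(rec["send"])
        hits[rec["qseqid"]].append(rec)
    return {query: sorted(rows, key=lambda r: -r["bitscore"])[:n]
            for query, rows in hits.items()}


# Made-up rows, for illustration only (not real results).
example = io.StringIO(
    "mir1\tchr1\t100.0\t22\t0\t0\t1\t22\t1000\t1021\t1e-05\t44.1\n"
    "mir1\tchr2\t95.5\t22\t1\t0\t1\t22\t5021\t5000\t0.002\t36.2\n"
    "mir2\tchr1\t100.0\t21\t0\t0\t1\t21\t300\t320\t4e-05\t42.1\n"
)
best = top_hits_per_query(example, n=1)
for query, rows in best.items():
    for r in rows:
        print(query, r["sseqid"], r["sstart"], r["send"], r["evalue"], sep="\t")
```

The sseqid, sstart and send columns are the "locations": which target sequence was hit and where. In blastn tabular output, a hit with sstart greater than send lies on the minus strand of the target. Writing the printed lines to a file gives a tab-separated text file that spreadsheet programs can open.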

I hope I helped you a bit,

Maxime

modified 10 months ago by RamRS21k • written 10 months ago by maxime.policarpo50

Yes, I can use bash commands and BLAST on Linux. Your response is informative, and I have used BLAST before. What I actually want is to use BLAT on miRNA sequences to get the locations of each sequence in hg19. I have a list of miRNA sequences; as far as I know, I have to make a FASTA file from them, but I do not know what the output will be. My requirement is a list of locations for my list of query sequences, as a text file or an Excel file.

written 10 months ago by Azhar40

I have list of miRNA sequences ... make fasta file for them

What format are your sequences in?

but I don't know what will be the output

Why don't you try running the tool on a subset of the data to test and check the output you get?
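
One way to set up such a test is sketched below, under the assumption that the sequences are stored one per line in plain text (the actual format was never stated in this thread). The function name, the "mirna" ID prefix, and the input strings are hypothetical.

```python
def sequences_to_fasta(lines, prefix="seq", limit=None):
    """Turn one-sequence-per-line text into FASTA, optionally only the first `limit` records."""
    seqs = [line.strip() for line in lines if line.strip()]  # drop blank lines
    if limit is not None:
        seqs = seqs[:limit]
    # Number the records so every query gets a unique ID in the search output.
    return "".join(f">{prefix}{i}\n{seq}\n" for i, seq in enumerate(seqs, start=1))


# Arbitrary strings standing in for real sequences.
raw = ["ACGTTGCAAGCTTAGGCTAAC\n", "\n", "TTGACCGATAGGCATCGATTA\n", "GGCATTACGATCGGATACCAT\n"]
subset_fasta = sequences_to_fasta(raw, prefix="mirna", limit=2)
print(subset_fasta, end="")
```

Running the search on a small subset like this first shows what the output looks like before committing to the full list.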

written 10 months ago by RamRS21k