BLAST or BLAT for aligning multiple sequences to get the locations of each sequence
5.9 years ago
Azhar ▴ 50

I have almost 8,000 small RNA sequences and want to get the top 20 possible locations for each sequence using BLAST or BLAT. Is there a method or script I can use? Kindly enlighten me.

next-gen • 2.2k views

Check the difference between BLAST and BLAT, and see which suits your data. I would use standalone BLAST in this case.

5.9 years ago

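What is your end goal? I think you could use BLAST and keep only the top 20 hits with the options -max_target_seqs (limits the number of target sequences reported per query) or -max_hsps (limits the number of HSPs reported per query-target pair).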
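As a rough illustration of the options mentioned above, here is a minimal Python sketch that assembles a blastn command line capping the reported hits at 20. The file names, the function name `build_blastn_command`, and the choice of `-task blastn-short` and `-outfmt 6` are assumptions for illustration, not part of the original suggestion; `-task blastn-short` is the BLAST+ task intended for short queries.

```python
import shlex


def build_blastn_command(query_fasta, database, output_path, top_n=20):
    """Return a blastn argument list that caps the number of reported hits."""
    return [
        "blastn",
        "-task", "blastn-short",          # assumption: queries are short RNAs
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6",                   # tabular output, one HSP per line
        "-max_target_seqs", str(top_n),   # at most top_n targets per query
        "-max_hsps", str(top_n),          # at most top_n HSPs per query-target pair
        "-out", output_path,
    ]


# Hypothetical file names, used only to show the resulting command line.
cmd = build_blastn_command("small_rnas.fasta", "genome_db", "hits.tsv")
print(shlex.join(cmd))

# To actually run it (requires BLAST+ and a database built with makeblastdb):
# import subprocess
# subprocess.run(cmd, check=True)
```

The two limits apply at different levels, so the output can still contain more than 20 lines per query; reducing it to an overall top 20 would be a separate filtering step. Against a genome where each chromosome or scaffold is one target, `-max_hsps` is the limit that controls how many locations are reported per target.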


Can you guide me through how to do that? I have never done it. Step by step: make a FASTA file for the 20K sequences, run BLAST with the options stated above, then get the locations?

I am confused about the locations: will I get the locations for each sequence?


You should try searching yourself and come to us with specific problems, not a request for someone to hand-hold you through the entire task.


I will look at your problem in two days; I am kind of busy right now, sorry :/


Are you OK with using bash commands?

You should start by installing BLAST on your computer (https://www.ncbi.nlm.nih.gov/books/NBK52640/).

I do not know whether you want the top 20 possible locations across a whole draft genome assembly or on a single scaffold. The catch is that BLAST results have two levels: the target sequences and the HSPs. One target sequence can have many HSPs:

scaffold: =========================================================
HSPs:        ======                ======                ======

Anyway, you could build a BLAST database from your genome and then run blastn:

makeblastdb -in yourgenome.fasta -dbtype nucl
blastn -query 20ksequence.fasta -db yourgenome.fasta > results_RNA_vs_genome.blastn

Then you can parse the BLAST results according to your criteria (best E-values? best target sequences?) using the Biopython module. I think many people have already asked how to parse results in the same way, and the Biopython tutorial and cookbook is very good: http://biopython.org/DIST/docs/tutorial/Tutorial.html
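To make the parsing step concrete, here is a minimal sketch that keeps the best hits per query and reports their locations. It uses only the Python standard library rather than Biopython, and it assumes BLAST was run with `-outfmt 6` (tabular output with the default twelve columns), which is not the format produced by the command above. The function name `top_hits` and the example rows are made up for illustration.

```python
import csv
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines, top_n=20):
    """Group tabular BLAST lines by query and keep the top_n by bit score."""
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"], hit["send"] = int(hit["sstart"]), int(hit["send"])
        # For minus-strand hits BLAST reports sstart > send.
        hit["strand"] = "+" if hit["sstart"] <= hit["send"] else "-"
        hits[hit["qseqid"]].append(hit)
    return {
        query: sorted(found, key=lambda h: h["bitscore"], reverse=True)[:top_n]
        for query, found in hits.items()
    }


# Made-up example rows standing in for a real results file.
example = [
    "mir1\tchr1\t100.000\t22\t0\t0\t1\t22\t1000\t1021\t1e-05\t44.1",
    "mir1\tchr2\t95.455\t22\t1\t0\t1\t22\t5021\t5000\t0.002\t36.2",
    "mir2\tchr3\t100.000\t21\t0\t0\t1\t21\t700\t720\t3e-05\t42.1",
]

# Print one tab-separated location line per retained hit.
for query, hit_list in top_hits(example).items():
    for h in hit_list:
        start, end = sorted((h["sstart"], h["send"]))
        print(query, h["sseqid"], start, end, h["strand"], sep="\t")
```

With a real results file, passing the open file handle in place of `example` should work the same way, and the printed lines can be redirected to a text file that opens in a spreadsheet.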

I hope I helped you a bit,

Maxime


Yes, I can use bash commands and BLAST on Linux. Your response is informative, and I have used BLAST before. What I actually want is to use BLAT to get the hg19 locations of each miRNA sequence. I have a list of miRNA sequences and, as far as I know, I have to make a FASTA file from them, but I do not know what the output will be. What I need is a list of locations for each of my query sequences, as a text or Excel file.
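For the FASTA step, a minimal sketch of turning a plain list of sequences into FASTA text could look like the following. It assumes one sequence per line with no names, so the identifiers (`seq1`, `seq2`, ...) are invented, and it converts U to T on the assumption that the RNA sequences will be aligned against a DNA genome. The function name `sequences_to_fasta` is hypothetical.

```python
def sequences_to_fasta(sequences, prefix="seq"):
    """Return FASTA text for an iterable of bare sequences, one record each."""
    records = []
    count = 0
    for raw in sequences:
        seq = raw.strip().upper().replace("U", "T")  # assumption: RNA to DNA alphabet
        if not seq:
            continue  # ignore blank lines
        count += 1
        records.append(f">{prefix}{count}\n{seq}\n")
    return "".join(records)


# Example input: two sequences and a blank line, as might come from a text file.
fasta_text = sequences_to_fasta(["UGAGGUAGUAGGUUGUAUAGUU", "", "acgu"])
print(fasta_text, end="")

# To write a file instead of printing (hypothetical file names):
# with open("mirnas.txt") as src, open("mirnas.fasta", "w") as dst:
#     dst.write(sequences_to_fasta(src))
```

Numbering the records keeps each identifier unique, which makes it possible to trace every reported location back to its query sequence.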


"I have list of miRNA sequences ... make fasta file for them"

What format are your sequences in?

"but I don't know what will be the output"

Why don't you try running the tool on a subset of the data and check the output you get?
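One way to prepare such a test subset is sketched below: it reads FASTA records and keeps only the first few, so the tool can be tried on a small input before the full run. The function names `read_fasta` and `first_records` and the example records are assumptions for illustration.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def first_records(lines, n=10):
    """Return the first n FASTA records as a list of (header, sequence)."""
    return list(islice(read_fasta(lines), n))


# Made-up records standing in for the full query file.
fasta_lines = [">a", "ACGT", "TT", ">b", "GG", ">c", "CC"]
subset = first_records(fasta_lines, n=2)
for header, seq in subset:
    print(f">{header}\n{seq}")
```

Running BLAST or BLAT on the printed subset should show the output layout quickly, without waiting for all queries to finish.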
