Best Reference For Small RNA Alignment And Annotation
11.0 years ago
Doctoroots ▴ 800

From a wet lab experiment, an unknown small RNA was detected in a sample. The sample was later size-selected for the RNA's approximate size and sequenced to try to find out what it is. I'm now working on this sequence data.

I wanted to ask what you suggest is the best reference to use when aligning and annotating this read data.

I was thinking of a couple of options:

  • Align to the whole-genome reference and then annotate the regions with the most aligned reads.
  • Align to ncRNA and see if I get lucky and one of them is the unknown RNA.
  • Use BioMart to get all the unspliced gene sequences, with flanks as long as my reads, and align to those.
  • Use BioMart to get all the unspliced transcript sequences, align to them, and then see which transcript is most abundant.

Any thoughts or suggestions?

11.0 years ago

Did you try to translate it? Do you know whether it is a coding mRNA or another type of transcript?

The best way to align an mRNA to a genome is to use BLAT, which has models that take introns and splicing signals into account. You can also do it with exonerate; you may want to have a look at this other question.

You can also try aligning the RNA to other known RNAs. It is best to use software designed for this, since RNA aligners rely more on the secondary structure than on the RNA sequence.


Hi Giovanni, I didn't translate it, since I don't know which of the reads align to the unknown RNA. The same goes for BLAT: I can check the most common unique reads with BLAT, but doing it for all the reads seems cumbersome. About the RNA aligners: since I'm dealing with small RNA, shouldn't I treat it the way I treat miRNA (regular alignment to a reference sequence)?


How short is it? If it is very short and you are sure it is not an artifact of a limitation in the technique, then it is unlikely to be a coding sequence.


The segment is around 70 bp long, and the lab that ordered this analysis doesn't believe it is a coding sequence.


Do you have one 70 bp sequence and want to determine what it is likely to be? How about a BLAST search? http://blast.ncbi.nlm.nih.gov/Blast.cgi

11.0 years ago

I'm not certain I would try BLAT if the sequence in question is, say, 40 bp or shorter. Similarity search algorithms that speed up match finding sacrifice sensitivity for short queries.

My inclination is to align to the genome and annotate those regions that show perfect to nearly perfect matches - I'd collect nearly perfect because I'm not sure of the quality of the sequence data.

That said, it is not clear from your question how many queries you have. Is this a case of one small RNA or do you have thousands of reads that assemble into dozens or hundreds of distinct small RNA species?

In the end, it will be important to be very certain that the species you've sequenced are either from non-coding RNA genes or introns of protein-coding genes or something else.
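The genome-alignment idea above (keep perfect or nearly perfect matches, then look at where they pile up) could be sketched roughly as follows. This is only an illustration: it assumes the aligner wrote SAM output that includes the optional `NM:i:` edit-distance tag (not every aligner emits it), and the function name, the bin size and the toy records are all made up for the example.

```python
from collections import Counter


def near_perfect_hits(sam_lines, max_edit_distance=1, bin_size=100):
    """Count aligned reads per genomic bin, keeping only near-perfect matches."""
    bins = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 4:
            continue  # flag bit 0x4 marks an unmapped read
        # Optional tags start at column 12; NM:i:<n> is the edit distance.
        edit_distance = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                edit_distance = int(tag[5:])
        if edit_distance is None or edit_distance > max_edit_distance:
            continue  # no NM tag, or too many differences
        # POS is 1-based; group reads into fixed-width bins by start position.
        bin_start = (pos - 1) // bin_size * bin_size
        bins[(chrom, bin_start)] += 1
    return bins


# Hypothetical SAM records, standing in for real aligner output.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0",
    "r2\t0\tchr1\t105\t255\t8M\t*\t0\t0\tGTACGTAA\tIIIIIIII\tNM:i:1",
    "r3\t0\tchr1\t520\t255\t8M\t*\t0\t0\tTTGGCCAA\tIIIIIIII\tNM:i:3",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGG\tIIIIIIII",
]
hits = near_perfect_hits(toy_sam)
for (chrom, bin_start), count in hits.most_common(5):
    print(chrom, bin_start, count)
```

The bins with the most reads are the candidate regions to annotate; the edit-distance cutoff is where uncertainty about read quality comes in.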


The queries are from an Illumina sequencing run, so there are millions. And of course I don't know whether all of them come from the specific RNA of interest.
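With millions of reads, one way to make this manageable is to collapse identical reads first and look at the most abundant unique sequences, which can then be searched one by one. A minimal sketch, assuming plain four-line FASTQ records; the function name and the toy reads are hypothetical:

```python
from collections import Counter
from io import StringIO


def collapse_reads(fastq_handle):
    """Count how often each distinct read sequence occurs in a FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        # In four-line FASTQ records the sequence is the second line.
        if line_number % 4 == 1:
            counts[line.strip().upper()] += 1
    return counts


# Toy input standing in for a real FASTQ file (made-up reads).
toy_fastq = StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nTTGGCCAA\n+\nIIIIIIII\n"
)
unique_reads = collapse_reads(toy_fastq)
for sequence, count in unique_reads.most_common(10):
    print(count, sequence)
```

For a real run, the `StringIO` object would be replaced by an open file handle. Adapter trimming is assumed to have been done already.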

11.0 years ago
  • Align to the genome using a sensitive alignment method (e.g. BLAT, LASTZ, FASTA with short word sizes) and see whether it overlaps a gene or ORF, or whether it is intronic or intergenic
  • BLAST against NT and find similarities in other related genomes
  • Search RNA databases, e.g. Rfam (http://rfam.janelia.org/)
  • See also: Identified Potential Non-Coding RNA, And Then?
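
Once a hit has genomic coordinates, the first point (does it overlap a gene, or is it intronic or intergenic?) comes down to an interval-overlap check against a gene annotation. A rough sketch, assuming the annotation has already been loaded into simple dictionaries with 1-based inclusive coordinates; the data layout, the function name and the toy gene are all hypothetical:

```python
def classify_hit(chrom, start, end, genes):
    """Label a hit as exonic, intronic or intergenic.

    genes: list of dicts with 'name', 'chrom', 'start', 'end' and 'exons'
    (a list of (start, end) tuples); all coordinates are 1-based inclusive.
    """
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start <= b_end and b_start <= a_end

    label = ("intergenic", None)
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if not overlaps(start, end, gene["start"], gene["end"]):
            continue
        # Inside the gene span: exonic if any exon is touched, else intronic.
        if any(overlaps(start, end, s, e) for s, e in gene["exons"]):
            return ("exonic", gene["name"])
        label = ("intronic", gene["name"])
    return label


# Made-up annotation with a single two-exon gene.
toy_genes = [
    {
        "name": "geneA",
        "chrom": "chr1",
        "start": 1000,
        "end": 5000,
        "exons": [(1000, 1200), (4000, 5000)],
    }
]
print(classify_hit("chr1", 1100, 1170, toy_genes))
print(classify_hit("chr1", 2000, 2070, toy_genes))
print(classify_hit("chr1", 9000, 9070, toy_genes))
```

The linear scan over genes is fine for a handful of candidate regions; for many hits an interval index or an existing tool would be a better fit.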