snRNA and snoRNA annotations
10.3 years ago

Hi All,

I am analyzing small RNA (sRNA) sequencing data from mouse (Illumina). I have completed the quality checks and pre-processing, and I have identified differentially expressed sRNAs. I would now like to annotate the snRNAs and snoRNAs in that list of differentially expressed sRNAs. To do so, I downloaded annotations from useast.ensembl.org (http://useast.ensembl.org/info/data/ftp/index.html), but the snRNAs and snoRNAs there are 60 bp to 150 bp long, which is longer than the maximum read length (34 bp) of my identified sRNAs. Can anybody suggest a way to find and download mature snRNA and snoRNA sequences that are no longer than 34 bp and can be used for snRNA and snoRNA annotation?

Thanks

Anand

10.3 years ago

Perhaps I am misunderstanding, but typically one aligns reads to a set of known "reference" sequences. Your data consist of many reads of up to 34 bp, presumably derived from the sRNAs in your samples. These reads do not typically represent full-length molecules from the sample (except for miRNAs and other very short species). Instead, each read represents only a segment of up to 34 bp of the longer biological molecule. So your goal is not to find a source of sRNA sequences that are 34 bp or shorter (there may not even be any), but rather to assign biologically interesting information to the reference sequences that you used for identifying differentially expressed sRNAs. If this does not make sense, then perhaps you could expand your question with more detail.
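
As a rough illustration of attaching biological labels to reference IDs: the sketch below assumes the reference came with an Ensembl-style GTF whose attribute column carries `gene_id` and `gene_biotype`. The gene IDs, coordinates, and the function names `biotype_by_gene` and `annotate` are made up for the example; this is not a statement of how the original analysis was done.

```python
import re

# Hypothetical GTF-style lines (9 tab-separated columns); IDs and coordinates are made up.
GTF_TEXT = "\n".join([
    '1\tensembl\tgene\t1000\t1106\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "snRNA";',
    '1\tensembl\tgene\t5000\t5080\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "snoRNA";',
    '2\tensembl\tgene\t9000\t12000\t.\t+\t.\tgene_id "GENE_C"; gene_biotype "protein_coding";',
])

def biotype_by_gene(gtf_text):
    """Map gene_id -> gene_biotype using only the 'gene' records."""
    biotypes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        # Attribute column looks like: key "value"; key "value";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if "gene_id" in attrs:
            biotypes[attrs["gene_id"]] = attrs.get("gene_biotype", "unknown")
    return biotypes

def annotate(de_ids, biotypes, keep=("snRNA", "snoRNA")):
    """Keep only the differentially expressed IDs whose biotype is in `keep`."""
    return {g: biotypes[g] for g in de_ids if biotypes.get(g) in keep}

if __name__ == "__main__":
    table = biotype_by_gene(GTF_TEXT)
    print(annotate(["GENE_A", "GENE_C"], table))
```

The read length never enters this step: the label comes from the reference feature, not from the 34 bp read that mapped to it.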

10.3 years ago
IV ★ 1.3k

In brief there are two approaches (and hybrids):

  1. Use data from available resources (see below) to annotate features in your genome (snRNAs, snoRNAs, repeats, etc.). You then count the number of reads that map within the boundaries of each feature (in our case a snoRNA or an snRNA). Don't worry about the difference between read size and feature length. Protein-coding genes can be thousands of bases long, but we use much shorter reads to estimate their expression.

  2. Another approach is to create databases of small RNAs and repeats (using data from Ensembl, Rfam, RepeatMasker, snoRNABase, etc.) and then align or BLAST your reads against those.
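
The counting step in approach 1 can be sketched as a simple interval-overlap check. This is a toy version under stated assumptions: coordinates are 1-based and inclusive, reads are already aligned, and the feature names, positions, and the `min_overlap` parameter are all hypothetical. A real analysis would normally rely on an established read-counting tool instead.

```python
from collections import Counter

# Hypothetical annotated features: (chromosome, start, end, name)
FEATURES = [
    ("chr1", 1000, 1106, "snRNA_example"),
    ("chr1", 5000, 5080, "snoRNA_example"),
]

# Hypothetical aligned 34 bp reads: (chromosome, start, end)
READS = [
    ("chr1", 1010, 1043),  # fully inside snRNA_example
    ("chr1", 1090, 1123),  # overlaps the last 17 bases of snRNA_example
    ("chr1", 5020, 5053),  # fully inside snoRNA_example
    ("chr2", 1010, 1043),  # different chromosome, counted nowhere
]

def count_reads(features, reads, min_overlap=1):
    """Count reads that overlap each feature by at least `min_overlap` bases."""
    counts = Counter()
    for _, _, _, name in features:
        counts[name] = 0  # report features with zero reads as well
    for r_chrom, r_start, r_end in reads:
        for f_chrom, f_start, f_end, name in features:
            if r_chrom != f_chrom:
                continue
            # Length of the shared interval; zero or negative means no overlap
            overlap = min(r_end, f_end) - max(r_start, f_start) + 1
            if overlap >= min_overlap:
                counts[name] += 1
    return dict(counts)

if __name__ == "__main__":
    print(count_reads(FEATURES, READS))
    print(count_reads(FEATURES, READS, min_overlap=20))
```

The nested loop is quadratic and only suitable for a toy example; with real data an interval index or a dedicated counting program would be the usual choice.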

The hybrid approaches can be a mix of those two or other combinations (e.g. use sequences from databases and then align them against the genome in order to annotate features; or apply the database approach only to snRNAs and snoRNAs and then align the remaining reads against the genome, etc.).
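
A minimal sketch of the second hybrid idea (database first, genome for the rest) follows. Exact substring matching stands in for a real aligner or BLAST, which would also tolerate mismatches; the sequences, names, and the function `split_reads` are invented for illustration only.

```python
# Hypothetical small-RNA database: name -> full-length sequence (made-up sequences)
SMALL_RNA_DB = {
    "snRNA_example": "ACGTTGCAAGCTTAGGCTAACCGGTTAGCATCGATCGGATCCAATTGGCCTTAAGG",
    "snoRNA_example": "TTGACCGGTACGATCGTTAGCCGATAGGCTTACGGATCAGCTAGGCATTACGAT",
}

# Reads are built by slicing so that the first two are guaranteed to match.
READS = [
    SMALL_RNA_DB["snRNA_example"][5:39],   # 34 bp segment of the snRNA
    SMALL_RNA_DB["snoRNA_example"][10:44], # 34 bp segment of the snoRNA
    "G" * 34,                              # matches nothing in the database
]

def split_reads(reads, database):
    """Assign reads to database entries; return (assigned, leftover).

    `assigned` maps each matched read to the first entry that contains it.
    `leftover` holds reads to pass on to a genome alignment step.
    """
    assigned = {}
    leftover = []
    for read in reads:
        for name, sequence in database.items():
            if read in sequence:  # exact match; a real aligner is more permissive
                assigned[read] = name
                break
        else:
            leftover.append(read)
    return assigned, leftover

if __name__ == "__main__":
    assigned, leftover = split_reads(READS, SMALL_RNA_DB)
    print(sorted(set(assigned.values())))
    print(len(leftover), "read(s) left for genome alignment")
```

Note that each 34 bp read matches as a segment of a longer database entry, so the database sequences do not need to be trimmed to the read length.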

Of course, there are also multiple tools that can do this for you, but I haven't tried any, so I can't recommend one from personal experience.

Cheers,

IV
