Question: How Can I Accurately Identify Mirna Sequences From Small Rna Seq Results
6
gravatar for Ryan Thompson
5.1 years ago by
Ryan Thompson3.4k
TSRI, La Jolla, CA
Ryan Thompson3.4k wrote:

I have a dataset of small RNA-seq reads that have their 3-prime adapter sequence trimmed such that the remaining seqeunce precisely represents the bases present in the biological fragment that was sequenced. Many of these sequences represent mature miRNAs, and I want to count how many times each known miRNA sequence occurs in the dataset. Obviously I can perform an exact string match and get a good estimate, but that ignores the possibility of sequencing errors and imprecise end cleavage. So I think I want to run an alignment of all my reads against every known human mature miRNA sequence allowing for gaps and mismatches and pick the best match for each one (if there is any match good enough). Does anyone know of a fast and efficient way to do this, preferable using R?

R alignment rna-seq mirna • 7.6k views
ADD COMMENTlink modified 5.1 years ago by Mikael Huss4.6k • written 5.1 years ago by Ryan Thompson3.4k
3
gravatar for Ashutosh Pandey
5.1 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

I don't know how to do it in R. But here is how I do it:

1) Select reads that range between 18-30 bp and aligned them against reference genome and only keep those reads that originated from the reference genome or got aligned. You can be little easy here with the mismatches. I will go with 2 mismatches.

2) Filtered reads that got aligned to database of small RNA other than miRNA. Either you can use a gtf file for the location of small RNAs or create a database of small rna and remove reads that mapped to them.

The first two steps can be combined or the first step can be totally skipped. I do it as a part of QC.

3) Align the filtered reads against the miRBase using SHRiMP2 under the miRNA mode using default settings. I used precursor microRNA database. I am not sure why I did this but somewhere I read that is advisable to align short reads against precursor miRNA rather than mature miRNA.

4) Then you can use some RPKM generator tool to get counts of various miRNAs from sam or bam file. I use my own script as sam files for miRNA samples are small. In case, a read gets aligned against two miRNAs , then i choose the alignment with the least mismatches.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Ashutosh Pandey11k

I wonder about aligning the reads to precursor microRNA. How will u get to know whether the mirna is 3p or 5p?

ADD REPLYlink written 2.5 years ago by ancient_learner610
3
gravatar for Mikael Huss
5.1 years ago by
Mikael Huss4.6k
Stockholm
Mikael Huss4.6k wrote:

You could try to map against the mature.fa sequence file from miRBase. It is easy to find out which of those sequences are human (the hsa-XXX sequences). Note that you may need to substitute Ts for Us in the miRBase sequences. If you want to map your reads using R, you could use the subread aligner via the Rsubread package.

ADD COMMENTlink written 5.1 years ago by Mikael Huss4.6k

Yes, but I'd like to know what alignment tool to use. Can subread or any other short read aligner handle the case where the reference is shorter than the read?

ADD REPLYlink written 5.1 years ago by Ryan Thompson3.4k

Good question. I have used Bowtie and Bowtie2 for mapping to miRBase and they work (it is not surprising that Bowtie2 works because it includes a local mapping option, but plain Bowtie is also fine). I haven't tried Subread for miRNA mapping. It does have something called "local read alignment" (p. 18 in http://www.bioconductor.org/packages/2.12/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf), but I am not sure if it's equivalent to Bowtie2's local mapping.

ADD REPLYlink written 5.1 years ago by Mikael Huss4.6k

Could you please tell me which parameter did you use for bowtie ? thanks

ADD REPLYlink written 5.0 years ago by Nicolas Rosewick6.5k

Sorry for the late reply ... I played around with different parameter settings and it seems I ultimately used "-l 20 -n 0 -v 2 -a --best --strata" (disregarding 'irrelevant' parameters about threads, output format, quality scale etc)

ADD REPLYlink written 5.0 years ago by Mikael Huss4.6k

What kind of Alignment rate are you guys getting?

% uniquely mapped

% multimapped

% no mapped

I know that I will vary quite a lot but just wondering, not a lot is posted out there...

I'm getting ~30, 30, 30 for human.

ADD REPLYlink written 3.5 years ago by RemiToAmigo0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 478 users visited in the last hour