Mapping rate of small RNA-seq with different references
5.2 years ago
woongjaej ▴ 20

Hi, folks

I'm new to analysing small RNA-seq data and I have some questions. I hope someone experienced with small RNA-seq can give me some advice.

  1. I'm mapping my single-end smRNA-seq data to the hg19 and hg38 references. I used the CAP-miRSeq pipeline, so the aligner was bowtie. When I checked the mapped reads in the resulting BAM files with samtools flagstat, I was surprised: the BAM mapped to hg19 had about 50,000,000 mapped reads, while the BAM mapped to hg38 had about half as many. I have three more datasets and tried all of them. Here are the flagstat results.

hg19:
58275643 + 0 mapped (90.82% : N/A)
25226589 + 0 mapped (87.94% : N/A)
36270257 + 0 mapped (86.49% : N/A)
27601897 + 0 mapped (91.43% : N/A)

hg38:
23224974 + 0 mapped (53.08% : N/A)
18395834 + 0 mapped (74.52% : N/A)
20368027 + 0 mapped (62.12% : N/A)
17979959 + 0 mapped (73.06% : N/A)

Is this possible?

  2. I'm going to run a DEG analysis with these data, and I'm confused about how to get a raw count file from smRNA-seq data. This is different from regular RNA-seq, right? Could someone give me some pointers on how to do this? Should I use a normal GTF file or a miRNA database's GTF file (such as hairpin.gtf)?

(e.g. if using htseq-count: which GTF or GFF file, which feature type, which ID attribute to use, etc.)

Thank you very much for your help!

smRNA-seq mapping rate reference • 1.9k views
5.2 years ago
SJ Basu ▴ 50

First, I assume your analysis is primarily about miRNA detection.

When you map smRNA reads, they tend to map to many places across the genome because they are very short. The discrepancy in mapping percentage can have many causes, such as masking, incorrect concatenation of the chromosome files, or differences in mapping parameters. So I suggest two things: first filter out the reads that match Rfam or transcriptome sequences (use the --norc option for the transcriptome), then map the remaining reads to the latest human genome. That leaves you with all putative miRNA reads to be considered for miRNA detection.
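One quick sanity check on the flagstat lines quoted in the question is to back-calculate how many records each BAM contained in total, since flagstat reports both the mapped count and the mapped percentage. The sketch below does only that arithmetic. It assumes that the two blocks of numbers list the same four samples in the same order, and the names `first_block`, `second_block` and `implied_total` are made up for illustration.

```python
# (mapped, percent_mapped) pairs copied from the two flagstat blocks in the question.
# Assumption: row i in both blocks is the same sample, mapped to a different reference.
first_block = [
    (58275643, 90.82),
    (25226589, 87.94),
    (36270257, 86.49),
    (27601897, 91.43),
]
second_block = [
    (23224974, 53.08),
    (18395834, 74.52),
    (20368027, 62.12),
    (17979959, 73.06),
]


def implied_total(mapped, percent):
    """Total number of records flagstat must have seen, given the mapped
    count and the mapped percentage (rounded to the nearest record)."""
    return round(mapped * 100 / percent)


for i, ((m1, p1), (m2, p2)) in enumerate(zip(first_block, second_block), start=1):
    t1 = implied_total(m1, p1)
    t2 = implied_total(m2, p2)
    print(f"sample {i}: first block ~{t1:,} records, second block ~{t2:,} records")
```

For the first row this gives roughly 64 million records in one BAM and roughly 44 million in the other. If both files came from the same FASTQ with one record per read, the totals would be expected to match, so the gap hints that the two runs differ in more than the reference, for example in how many alignments per read are reported or in whether unmapped reads are kept. That is only an inference from the numbers, and comparing the exact bowtie command lines of the two runs would be the way to confirm it.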

Differential miRNA expression is actually much simpler than differential expression for RNA-seq. It can readily be done with the miRDeep2 pipeline; then you just have to convert the raw counts to CPM and compute fold changes (or perhaps put them through DESeq2). But if you want to do it manually: 1. collapse the reads (the collapsed values are your read counts); 2. map the collapsed read file to human mature miRNAs from miRBase (get the mappings in bowtie's default output format, not BAM); and 3. from the bowtie output file, add the collapsed read count to the mature miRNA it mapped to (when multiple reads map to the same miRNA, sum their values). Then use CPM or DESeq2 for differential expression.
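The manual route described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the miRDeep2 implementation: the reads, the miRNA IDs (`miR-A`, `miR-B`) and the `alignments` list are invented, and `alignments` stands in for what you would parse out of the bowtie output file (one read sequence and the mature miRNA it mapped to per line).

```python
from collections import Counter, defaultdict


def collapse_reads(reads):
    """Step 1: collapse identical sequences; the value is the read count."""
    return Counter(reads)


def counts_per_mirna(collapsed, alignments):
    """Step 3: add each collapsed read's count to the miRNA it mapped to.

    alignments is an iterable of (read_sequence, mirna_id) pairs.
    A sequence listed against several miRNAs is counted for each of them.
    """
    totals = defaultdict(int)
    for sequence, mirna_id in alignments:
        totals[mirna_id] += collapsed[sequence]
    return dict(totals)


def cpm(counts):
    """Convert raw counts to counts per million of assigned reads."""
    total = sum(counts.values())
    return {mirna_id: n * 1_000_000 / total for mirna_id, n in counts.items()}


# Toy input: 10 reads, 3 distinct sequences (all made up).
reads = ["ACGTACGTACGTACGTACGTAC"] * 3 + ["ACGTACGTACGTACGTACGTA"] * 2 + ["TTGCATTGCATTGCATTGCATT"] * 5
collapsed = collapse_reads(reads)

# Step 2 stand-in: pretend bowtie mapped two isoforms to miR-A and one sequence to miR-B.
alignments = [
    ("ACGTACGTACGTACGTACGTAC", "miR-A"),
    ("ACGTACGTACGTACGTACGTA", "miR-A"),
    ("TTGCATTGCATTGCATTGCATT", "miR-B"),
]

raw_counts = counts_per_mirna(collapsed, alignments)
cpm_values = cpm(raw_counts)
print(raw_counts)
print(cpm_values)
```

If you go on to DESeq2, it is the raw count table (one such dictionary per sample) that serves as input, since DESeq2 does its own normalisation; CPM is for the simple fold-change route.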


I forgot to thank you 4 years ago! Thanks a lot!

