Question

Distribution of mapped reads

5.9 years ago

ilovesuperheroes1993 ▴ 40

Hi, I have BAM files of small RNA sequencing data mapped to the human reference genome with STAR. I need to find the percentage of reads that map to:

(1) miRNA (2) lncRNA (3) piRNA (4) other non-coding RNA (5) introns (6) 3' and 5' UTRs (7) promoters

I started by counting the reads that mapped to known mature miRNAs. The command I used is:

    bedtools intersect -abam bam_file -b mature_mirna_gff_file -bed | wc -l

I then repeat this with the lncRNA annotation file to get the number of reads, then with the piRNA annotation, and so on.

Is this methodology correct? Do I need to remove the reads which mapped to a specific class from the bam file after each step?
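To make the concern behind this question concrete: if each class is counted independently, a read that overlaps annotations from two classes is counted twice, so the percentages can sum to more than 100%. Below is a minimal Python sketch of the alternative, where each read is assigned to the first class it overlaps in a chosen priority order. The toy reads, the intervals, and the function names `overlaps` and `assign_by_priority` are all hypothetical and only illustrate the idea; they are not part of bedtools.

```python
# Hypothetical toy data: each read is (name, start, end) on a single chromosome,
# and each annotation class is a list of (start, end) intervals.
# Coordinates are 0-based, half-open, as in BED.

def overlaps(read, interval):
    """Return True if the read shares at least one base with the interval."""
    _, r_start, r_end = read
    i_start, i_end = interval
    return r_start < i_end and i_start < r_end


def assign_by_priority(reads, classes):
    """Assign each read to the first class (in list order) that it overlaps.

    A read is counted at most once, which is equivalent to removing the
    reads of a class before testing the next class.
    """
    counts = {name: 0 for name, _ in classes}
    counts["unassigned"] = 0
    for read in reads:
        for name, intervals in classes:
            if any(overlaps(read, iv) for iv in intervals):
                counts[name] += 1
                break
        else:
            # No class matched this read.
            counts["unassigned"] += 1
    return counts


reads = [("r1", 100, 122), ("r2", 110, 132), ("r3", 500, 522), ("r4", 900, 922)]
# Priority order is an assumption: miRNA is tested before lncRNA.
classes = [
    ("miRNA", [(95, 125)]),
    ("lncRNA", [(90, 600)]),
]

counts = assign_by_priority(reads, classes)
percentages = {name: 100.0 * n / len(reads) for name, n in counts.items()}
print(percentages)
```

Here r1 and r2 also overlap the lncRNA interval, but they are counted only as miRNA because that class comes first in the list. The priority order itself is a design choice that has to be decided for the analysis.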

RNA-Seq small RNAseq alignment bedtools miRNA • 1.7k views

updated 5.9 years ago by Fluorine ▴ 110 • written 5.9 years ago by ilovesuperheroes1993 ▴ 40

Just to be sure: is it small RNA-Seq (libraries were built with a specific kit to capture small RNAs such as miRNAs) or RNA-Seq (polyA or rRNA-depleted library)?

5.9 years ago by Nicolas Rosewick 11k

Yes, it was done with the NEBNext small RNA kit, specifically for miRNA and piRNA.

5.9 years ago by ilovesuperheroes1993 ▴ 40

I don't understand why you want to annotate lncRNAs in small RNA-seq data. You won't find any, or rather you shouldn't if the library preparation was done properly. You also wrote in the comment that you enriched for miRNAs and piRNAs. Neither of these has introns.

5.9 years ago by Fluorine ▴ 110

score 1 · Answer 1 · 2019-08-22

First, for small RNAs you should map the reads with Bowtie2 or a similar aligner. Small RNAs don't have introns, so a spliced aligner is not necessary; in my experience it actually performs worse on such data. After aligning the reads to the reference genome, count the number of reads per gene, for example with featureCounts from the Subread package. The RNAcentral database has a very extensive annotation of small RNAs, and you can download an annotation file in GTF or GFF format to annotate your data.
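Once per-gene counts exist, the per-class percentages asked about in the question can be derived by summing counts per RNA type. The following is a rough Python sketch under stated assumptions: the table mimics the tab-separated layout that featureCounts writes (a leading `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, and one count column per BAM file), while the gene IDs, the counts, and the `gene_class` lookup are made up for illustration. In practice the gene-to-class lookup would have to be built from the annotation file.

```python
import csv
from collections import defaultdict

# Toy stand-in for a featureCounts output table (values are invented).
FEATURECOUNTS_TEXT = (
    "# Program:featureCounts (comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "gene_a\tchr1\t100\t121\t+\t22\t60\n"
    "gene_b\tchr1\t300\t322\t-\t23\t20\n"
    "gene_c\tchr2\t500\t529\t+\t30\t15\n"
    "gene_d\tchr3\t700\t760\t+\t61\t5\n"
)

# Hypothetical gene-to-class lookup; genes missing here fall into "other".
gene_class = {"gene_a": "miRNA", "gene_b": "miRNA", "gene_c": "piRNA"}


def class_percentages(table_text, gene_class):
    """Sum counts per RNA class and return each class as a % of assigned reads."""
    totals = defaultdict(int)
    # Skip the comment line(s) that precede the header row.
    lines = [ln for ln in table_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    count_col = reader.fieldnames[-1]  # assumes one BAM, so one count column
    for row in reader:
        totals[gene_class.get(row["Geneid"], "other")] += int(row[count_col])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cls: 100.0 * n / grand_total for cls, n in totals.items()}


result = class_percentages(FEATURECOUNTS_TEXT, gene_class)
print(result)
```

Note that these percentages are relative to reads assigned to a feature, not to all mapped reads; to express them against all mapped reads, the denominator would need to include the unassigned reads as well.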