How to calculate fraction of RNA-Seq reads mapped to a customised genomic locations?
1
0
Entering edit mode
7.7 years ago
biorepine ★ 1.5k

Dear Biostars,

I am trying to calculate the fraction of RNA-Seq reads map to a customised dataset of genomic locations. I am using the following steps. Could you please answer whether this is correct or not? Also do you think is there any better or fatser way to do this?

Data: BAM file (Paired-end, unstranded, poly(A)+ RNA-seq) mapped to hg19,

Step1: Obtain total number of unique reads mapped to customised genomic locations

#(samtools view -H file1_mapped.bam;  samtools view file1_mapped.bam | grep NH:i:1) | samtools view -Sb - | bedtools coverage -abam - -b ../customised.bed6 -counts -split > file1.uniquelymapped.counts

Step2:

awk '{ sum+=$7} END {print sum}' file1.uniquelymapped.counts
2639265

Step3: Obtain total number of mapped reads

 samtools flagstat file1_mapped.bam
84428534 + 4669383 in total (QC-passed reads + QC-failed reads)
34553453 + 0 duplicates
80610721 + 1971670 mapped (95.48%:42.23%)
84428534 + 4669383 paired in sequencing
42352168 + 2345534 read1
42076366 + 2323849 read2
77864131 + 1152188 properly paired (92.22%:24.68%)
77864131 + 1152188 with itself and mate mapped
2746590 + 819482 singletons (3.25%:17.55%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Step4: Fraction of RNA-Seq reads mapped =(Uniquely mapped reads/Total number of mapped reads)

=2639265/84428534 = 0.031
RNA-Seq • 2.5k views
ADD COMMENT
0
Entering edit mode
7.7 years ago

Your samtools flagstat will probably be inflated by multimapping reads, I don't think it's desirable as such. You are dividing by the total number of reads but not by the total number of mapped reads (84428534 vs 80610721).

I would suggest using HTSeq count or featureCounts for this, with a specifically adapted gtf file.

ADD COMMENT
0
Entering edit mode

Does this work ?

samtools sort file1_mapped.bam - |samtools view -bf 1 - |samtools view - | python htseq-count --mode=union --stranded=no - gtf.file
ADD REPLY
0
Entering edit mode

I guess not. Think some of your piping is wrong but testing it will tell you immediately.

htseq-count can work on a sam file (by default) as well as on a bam file (specify -f bam). You don't need the -f 1... Depending on how you made your .gtf, you'll have to specify the -t and -i flag of htseq-count

ADD REPLY
0
Entering edit mode

but htseq-count doesn't have -f option?

ADD REPLY
0
Entering edit mode

Yes it does. Perhaps your version is outdated.

ADD REPLY

Login before adding your answer.

Traffic: 1423 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6