Counting reads over genes with summarizeOverlaps
0
0
Entering edit mode
3.8 years ago
concetta ▴ 10

Hi!

I am performing the reads counting using the function summarizeOverlaps (GenomicAlignments_1.20.1 and Rsamtools_2.0.3). My RNA-seq libraries are stranded and paired-end. To count the number of reads assigned to the genes I used the following command:

se <- summarizeOverlaps(features=exonsByGene, reads=bamfiles, mode="Union", singleEnd=FALSE, fragments=TRUE, ignore.strand=FALSE, preprocess.reads=invertStrand)

I also performed the same analysis with HTSeq-count with the option -s reverse and mode unique.

When I checked the counts of some genes I found that the number of counts are different between htseq-count and summarizeOverlaps. For example for two overlapping genes but that are on different strand I obtained the following values:

Gene_name    Strand    SummarizeOverlaps       HTseq-count
Gene_1            -          3929                 80
Gene_2           +          575                   3583

After that I performed again summarizeOverlaps with fragments=FALSE, and I obtained different counts but now the number of counts are similar to HTSeq-count.

 se <- summarizeOverlaps(features=exonsByGene, reads=bamfiles, mode="Union", singleEnd=FALSE, fragments=FALSE, ignore.strand=FALSE, preprocess.reads=invertStrand)
    Warning message:
    In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode,  :
        28471 alignments with ambiguous pairing were dumped.
        Use 'getDumpedAlignments()' to retrieve them from the dump environment.

   #######counts obtained

    Gene_name    Strand    SummarizeOverlaps
        Gene_1            -          70 
        Gene_2           +          3231

Subsequently, I visualize the data on IGV and I observed few reads were mapped on the genes on reverse strand (gene 1) and many reads on the genes on forward strand (+). So for each gene, I extracted the reads mapped on gene 1 and gene 2 selecting reads with XS=- and XS=+ respectively. The number of reads that I obtained for each gene is in agreement with the number of counts obtained with HTSeq-count and summarizeOverlaps with fragment=FALSE: many reads on gene 2 and few reads on gene 1.

I was wondering if the function does not take into account the strand of the reads when I set fragment=TRUE.

In addition, do you suggest to use htseq-count or summarizeOverlaps?

Thanks!

Concetta

RNA-Seq summarizeOverlaps HTSeq-count • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 2657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6