Question

featureCounts with exon_id has counts backwards for reversely stranded genes

0

Entering edit mode

21 months ago

ksh4718 ▴ 10

Hello. I am trying to count at the exon level using Subread's (v1.6.3) featureCounts for my RNA-seq data (reversely stranded library prep). I noticed that for reversely stranded genes, the exon counts are backwards (i.e. for a 22 exon gene, exon 1 is actually exon 22, exon 2 is exon 21, and so on). Is there a way to fix this?

Here is my code [from part of a loop so $i is my file and $gtf is the path to the gtf file with exon_id:

featureCounts   -p \
                -F GTF \
                -s 2 \
                -a $gtf \
                -O \
                -t exon -g exon_id \
                -o counts/$i'_counts.txt' \
                ../aligned/$i'_mapped.sam'

Attached is a photo of IGV for an example gene and the output counts for this gene. You can see that I expect 0 counts for exons 2-4 and many counts for exon 22 but the data shows it backwards.

IGV screenshot:

Counts for the transcript in IGV screenshot:

Any guidance is appreciated. I want to ultimately do differential expression at the exon level so summarizing by gene is not an option.

subread exons rna-seq featureCounts • 1.0k views

ADD COMMENT • link 21 months ago by ksh4718 ▴ 10

0

Entering edit mode

Wouldn't you be comparing exon expression between samples but still in the same gene? That should take care of possible problems with the exon numbering convention. Exons are functional at mRNA level, which is why the exon with the most 3' genomic position gets named the first exon, as it's the most 5' in the pre-mRNA. As long as feature IDs are unique and consistent across samples, differential expression should not be affected by the name any feature is given.

ADD REPLY • link 21 months ago by Ram 43k

0

Entering edit mode

@Ram Yes that's true. I thought it would be more clear to present 'exon 1' as the first exon in the mRNA (closest to the start codon)

ADD REPLY • link 21 months ago by ksh4718 ▴ 10

0

Entering edit mode

Exon 1 is still closest to the start codon no matter what the orientation is. The whole purpose of "reverse numbering" exons is to name the exon closest to start codon as exon-1. Am I misunderstanding something?

ADD REPLY • link 21 months ago by Ram 43k

0

Entering edit mode

In the table that I uploaded, ENSMUST00000170872.2.22 is actually exon 1 and ENSMUST00000170872.2.1 is actually Exon 22. This is my issue.

ADD REPLY • link 21 months ago by ksh4718 ▴ 10

1

Entering edit mode

Ah I see. This is going to be a little tricky. You should start by getting a list of genes on the minus strand, then use R to create lookup tables that contain number of exons per transcript. With that, you can do something like corrected_exon_number = abs(current_exon_number - total_exons) + 1 to generate new IDs for the exons.