Question

Question about featureCounts

3

Entering edit mode

4.5 years ago

chichaochen ▴ 30

Hi: I was trying featureCounts for with these 2 following settings.
1.

featureCounts -t exon -g gene_id (the rest default, not specified)

2.

featureCounts -t gene -g gene_id (the rest default, not specified)

I expected to see more reads (fragments) in condition 2, since the region of "gene" in the annotation gtf covers both "exon" and intron region. However, it's surprising to see that for a lot of the genes i got more counts in condition 1. Could anybody help with this? Thanks in advance!

RNA-Seq featurecounts • 1.9k views

ADD COMMENT • link updated 2.7 years ago by wang-yanfang • 0 • written 4.5 years ago by chichaochen ▴ 30

1

Entering edit mode

I'd guess it depends on overlapping gene loci which can be resolved on exon level.

Have a look at the summary table and compare the ambiguous count.

ADD REPLY • link 4.5 years ago by michael.ante ★ 3.8k

0

Entering edit mode

Just speculating, so not posting it as an answer. Method 1 gets read counts per exon. What does it do if a paired-end read has the forward read aligned to exon 1 and its mate aligned to exon 2, or even a single read aligned across a splice junction? It might add 1 count to exon 1 and 1 count to exon 2. Now if you add up the counts for each exon of that gene, that read pair ends up adding two counts for that gene. If you are just counting over the entire gene region, then that read pair would only add one count. I'm not sure that's actually what featureCounts does, but it's possible.

ADD REPLY • link 4.5 years ago by colin.kern ★ 1.1k

2

Entering edit mode

The first command isn't counting per-exon, it's counting reads/pairs where at least one overlaps a gene's exons. In short, the first command is the standard command for RNA-seq and uses the default values for everything. If one wanted to count per-exon, one would need to change the -g option.

ADD REPLY • link 4.5 years ago by Devon Ryan 104k

0

Entering edit mode

Hey Devon Ryan,

I have only paired end reads (RNAseq) mapping to its genome by bowtie2, and I use featureCounts, too. Could you help me understand my code :

featureCounts -t gene -g "Id" (the rest default, not specified)

Does this code make sure that paired-end reads matching to a gene only count 1? Or 2?

I am not sure should I specify -p to count fragment not reads, and could you help me understand if the following code give the same output as the above one?

featureCounts -p -t gene -g "Id" (the rest default, not specified)

I am wondering since I only have paired reads these two codes should be both OK? right?

Last question: when should I specify strandness? does strandSpecific=2 or 1 give half of count than not specified strandness(0)?

I am very confused about this, and I am not sure if what I understand is right or wrong. I hope to get some confirmation or correction from you.

Thanks a lot. Yanfang

ADD REPLY • link 2.7 years ago by wang-yanfang • 0

0

Entering edit mode

Thanks for the replies. To deal with the overlapping issue as pointed out by you guys, i tried to split the count equally to all overlapping features by :

1.featureCounts -p -O --fraction -t exon -g gene_id
2.featureCounts -p -O --fraction -t gene -g gene_id

I still saw some genes Counts in condition 1 >condition 2. Now i am more confused . @@...

ADD REPLY • link updated 4.5 years ago by GenoMax 141k • written 4.5 years ago by chichaochen ▴ 30

3

Entering edit mode

When in doubt, have a look in IGV (make sure to load the GTF you're using) and maybe have featureCounts output the SAM file with reads annotated according to their feature assignment. Then you can debug why reads are being assigned the way they are.

ADD REPLY • link 4.5 years ago by Devon Ryan 104k