Question: featureCounts exon counting problem
0
gravatar for Nicolas Rosewick
4.1 years ago by
Belgium, Brussels
Nicolas Rosewick8.0k wrote:

Hi,

I try to count the number of reads at the exon level using featurecounts. When I try this on a bam file aligned using STAR:

featureCounts -f -C -s 1 -p -t exon -g gene_id -a ensembl.gtf -o exon_count.txt align.bam

it gives me strange results. For example in IGV I can clearly see read alignment on BRCC3 but 0 read count for every BRCC3 exons in featureCounts output.. Is my command correct ?

 

 

exon featurecounts • 6.9k views
ADD COMMENTlink modified 3.1 years ago by k.rue-albrecht20 • written 4.1 years ago by Nicolas Rosewick8.0k
2

I wonder if the fact that there are duplicate exon entries is what's causing this. Try adding the -R option and redirect stdout to a file to track how alignments are getting treated.

ADD REPLYlink written 4.1 years ago by Devon Ryan91k
2
gravatar for k.rue-albrecht
3.1 years ago by
k.rue-albrecht20 wrote:

Hi,

I know I'm a bit late to answer, but I had the same/similar problem, largely answered by Devon's comment:

I wonder if the fact that there are duplicate exon entries is what's causing this. Try adding the -R option and redirect stdout to a file to track how alignments are getting treated.

did you try using the -O option ?

Assign reads to all their overlapping meta-features (or features if -f is specified).

In my case, this option restored ~71.6% assigned, compared to 27.8% assigned without the option.

Just showing the relevant options:

  • Overlap exons, assign genes: /home/kruealbr/bin/featureCounts -s 1 -T 12 -J --primary --ignoreDup -p
    • 69.7% fragments assigned
  • Overlap exons, assign exons : /home/kruealbr/bin/featureCounts -f -s 1 -T 12 -J --primary --ignoreDup -p
    • 27.8%
  • Overlap exons, assign all exons: /home/kruealbr/bin/featureCounts -f -O -s 1 -T 12 -J -G /work/kruealbr/genomes/grch37/84/fasta/Homo_sapiens.GRCh37.dna.primary_assembly.fa --primary --ignoreDup -p
    • 71.6%

Of course, given the description, one has to be careful how the count data is then processed, as each read may be counted multiple times.

Best wishes, Kevin

ADD COMMENTlink written 3.1 years ago by k.rue-albrecht20
0
gravatar for Kirill
3.7 years ago by
Kirill260
Australia
Kirill260 wrote:

Hey NicoBxl, 

I'm also doing per exon counts using featureCounts. This is the command I ran

featureCounts -T 14  -a my_gtf_file.gft  -o my_output.txt -p -s 2 -g exon_id  my_bam_file.bam 

featureCounts v1.4.6-p2

And it worked for me.. (at least I think it did) I then manually collapsed all exons into genes and compared the counts with featureCounts per gene (default) and (looking by eye) gene counts from collapsed exons counts had slightly fewer reads  compare to default featureCounts counts, but we reasoned that if the read is ambiguous (for example spans exon-intron junction) featureCounts discards it, therefore less reads when counting per exon. 

From my understanding:

-t option is for counting against your particular feature of interest e.g three_prime_utr. That option really sets the coordinates rage. "Is your read in that coordinates range..?" Most often we are interested in "Is my read in the exon coordinates range..?" hence the default

-g option is the crucial one, it sets your bins. "How do you want to bin your reads..?" In exons please.., but you are saying bin them into genes.

You can grab any line from GTF file (I'm only going to speak for GTF file) and in every line you'll have gene_id attribute, in most you'll have transcript_id and in some you'll have exon_id. Obviously exons go into transcript and transcripts go into gene and so you can only group reads into the same level or higher, but not lower. In other words if you specify `-t transcript` you can only group by `-g transcript_id` or `-g gene_id` (default) but you cannot group by `-g exon_id`. On the other hand if you specify `-t exon` (default) you can group by `-g exon_id` or `-g transcript_id` or `-g gene_id`.  And with three_prime_utr they only have transcript_id and gene_id attributes (make sense, because utr belongs to the transcript). 

-f option (I think, haven't tested it out yet) is for some internal featureCounts use. The meta-feature term is something featureCounts people came up themselves. From the docs We define a meta-feature to be a set of features representing a biological construct of interest. For example, features often correspond to exons and meta-features to genes.​ 

I didn't use it when I was counting reads per exon, but I'm pretty sure now that I should have. In my case I left `-f` off so featureCounts must have counted reads per meta-feature i.e gene and then some how broken them into exon_id groups.. I guess if I'd have specified `-f` if would have counted right from the star.. Although I'm a little confused about `-f` option, because given that you want to group by exons it wouldn't make sense to summarise against genes.. would it..? I'd like to more about -f option myself.

Cheers, 

ADD COMMENTlink written 3.7 years ago by Kirill260
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1593 users visited in the last hour