featureCounts low annotation rate RNA-seq
1
0
Entering edit mode
11 weeks ago
Marco Pannone ▴ 110

Hey everybody!

I am trying to annotate my RNA-seq files (paired-end) with featureCounts. However, I keep having a quite low annotation rate, ~35-36% for all the files.

My command line is the following:

featureCounts -T 6 -p -s 2 -a annotation_file.gtf -o output_file.txt input_files.bam

I am using the parameter -s 2 since my paired-end files seem to be reversely stranded (I have assessed that by running infer_experiment.py from the RSeQC package.)

Alignment has been performed with hisat2 and .bam files sorted by coordinates before performing annotation. I have checked rRNA contamination in .bam files using split_bam.py from RSeQC package and the .bed file for hg38 rRNAs provided by the package. None of my files show rRNA contamination.

In order to try solving the problem, I first checked that the chromosome names were consistent between my .bam files and the provided .gtf file. They are the same.

Then, I tried using .gtf files from different sources (Gencode, Ensembl, UCSC,..) but the results did not change.

Lastly, I tried running featureCounts using also the -O parameter, which includes multi-overlapping reads, just to check if most of the reads were not annotated due to multi-overlapping reasons. The results did not change even in this case.

I would really appreciate any suggestion for trying to figure out what causes poor annotation of my files. I have checked as much as I could, accordingly to my experience with RNA-seq data, but I could not find a solution.

Thanks in advance to everybody who will try helping me! :)

annotation rna-seq featureCounts • 504 views
ADD COMMENT
0
Entering edit mode

Update

I tried setting up -t gene rather than the default -t exon and it increased the annotation rate above 50% (~52-53%). Still not satisfying, but for sure it made a good improvement.

ADD REPLY
0
Entering edit mode
11 weeks ago
MatthewP ★ 1.1k

You can check hisat2 alignment summary to see overall alignment rate(also low?) and percentage of multiple alignment. Because featureCounts use unique alignment by default, you can change this behavior with -M parameter.
Hisat2 report like:

HISAT2 summary stats:
        Total pairs: 22702458
                Aligned concordantly or discordantly 0 time: 1515030 (6.67%)
                Aligned concordantly 1 time: 20209983 (89.02%)
                Aligned concordantly >1 times: 739388 (3.26%)
                Aligned discordantly 1 time: 238057 (1.05%)
        Total unpaired reads: 3030060
                Aligned 0 time: 1722846 (56.86%)
                Aligned 1 time: 1226529 (40.48%)
                Aligned >1 times: 80685 (2.66%)
        Overall alignment rate: 96.21%
ADD COMMENT
0
Entering edit mode

The overall alignment rate with hisat2 is very high for all the files (~96-97%). Reads reporting multiple alignments are ~4-5% for paired reads and ~10% for unpaired reads.

Also, I have tried running featureCounts with the -M parameter specified, which increases the annotation rate even a bit more (~60%) for some samples. However, I think 60% is still not that great..

ADD REPLY
0
Entering edit mode

Aligned concordantly >1 times: 739388 (3.26%)

ADD REPLY
0
Entering edit mode

Yes, as I mentioned in the previous comment they account for ~5-6%.

ADD REPLY
0
Entering edit mode

Try add --countReadPairs parameter? So featureCounts will count fragments instead of reads in paired-end mode.

ADD REPLY
0
Entering edit mode

In the latest release, this is also performed by default when specified -p using paired-end data.

Indeed, in the documentation, for -p it reports: If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads; single-end reads are always counted as reads.

ADD REPLY

Login before adding your answer.

Traffic: 1783 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6