featureCounts: GFF3 gene identifier and low percentage of assigned alignments
2.4 years ago
liyong ▴ 80

Hi,

I am very new to RNA-seq analysis. I have Illumina paired-end RNA-seq data. After QC, the reads were aligned to the genome with STAR using a GFF3 annotation file (uniquely mapped reads: 80%), and I then used featureCounts (version 2.0.1) in a conda environment to count reads per gene.

The featureCounts command I ran is as follows:

featureCounts \
analysis/aligned_sequences/SRR1171897/Aligned.sortedByCoord.out.bam \
-a data/annotation/Cs_genes_v2_annot.gff3 \
-o analysis/final_counts/SRR1171897/featureCounts.txt \
-T 10 \
-p \
-F "GFF3" \
-g "Parent"

I am not sure about -g "Parent" given the GFF3 annotation file; its first few lines are shown below:

Chr1    AAFC_NRC    gene    1   6504    .   -   .   ID=Csa01g001000;Name=Csa01g001000;Note=methyl-CPG-binding domain 9

Chr1    AAFC_NRC    gene    1   6504    .   -   .   ID=Csa01g001000;Name=Csa01g001000

Chr1    AAFC_NRC    mRNA    1   6504    .   -   .   ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000;Note=methyl-CPG-binding domain 9

Chr1    AAFC_NRC    five_prime_UTR  6380    6504    .   -   .   ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1

Chr1    AAFC_NRC    exon    5865    6504    .   -   .   ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1

After featureCounts finished, I got the following results: [image: featureCounts assignment summary]

My questions are: which gene identifier should I use for the -g parameter when running featureCounts, and why did I get only 51.1% successfully assigned alignments? Is my result correct, and is there anything I could do to improve it?

Thank you very much.

identifier GFF3 featureCounts RNAseq gene • 1.4k views

In my experience, also setting -M (to count multi-mapping reads) slightly increases the percentage of assigned alignments. You can also try -t gene (the default is -t exon) and see whether anything changes. However, an assignment rate of ~50-60% does not necessarily mean that something went wrong; many of your reads may simply come from regions that are not annotated (such as some non-coding regions).
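Before changing -t or -g, it may help to check which attribute keys each feature type actually carries, since -g has to name an attribute present on the features selected by -t. The following is only a rough sketch: it rebuilds a tiny tab-separated file from three of the lines shown in the question (the temporary file and the variable names gff and summary are made up for illustration), and then tabulates feature type against attribute key with awk. On the real data you would point awk at data/annotation/Cs_genes_v2_annot.gff3 instead.

```shell
# Hypothetical mini-annotation rebuilt from the lines in the question.
# printf expands \t to real tabs, which GFF3 requires between columns.
gff=$(mktemp)
printf 'Chr1\tAAFC_NRC\tgene\t1\t6504\t.\t-\t.\tID=Csa01g001000;Name=Csa01g001000\n' > "$gff"
printf 'Chr1\tAAFC_NRC\tmRNA\t1\t6504\t.\t-\t.\tID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000\n' >> "$gff"
printf 'Chr1\tAAFC_NRC\texon\t5865\t6504\t.\t-\t.\tID=Csa01g001000.1.exon1;Parent=Csa01g001000.1\n' >> "$gff"

# For every feature type (column 3), collect the attribute keys in column 9.
# Comment lines and lines with fewer than 9 columns are skipped.
summary=$(awk -F'\t' '!/^#/ && NF >= 9 {
    n = split($9, kv, ";")
    for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")
        if (pair[1] != "") seen[$3 "\t" pair[1]] = 1
    }
}
END { for (k in seen) print k }' "$gff" | sort)

# One "feature_type<TAB>attribute_key" pair per line.
printf '%s\n' "$summary"
rm -f "$gff"
```

For the lines in the question, gene features carry ID and Name but no Parent, while the Parent of an exon is the transcript ID (Csa01g001000.1), not the gene ID. So with the default -t exon, -g "Parent" would presumably group counts per transcript rather than per gene, and reads overlapping several isoforms of one gene could then end up unassigned as ambiguous. If you try -t gene, the matching choice would likely be -g "ID".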

I would also suggest trying a different annotation file, as the annotation itself can affect the results. I am not sure where yours comes from, but I usually stick to the .gtf files from Ensembl.


Hello Marco,

Thank you for the suggestion. Sounds good, I will play around with the -M and -t parameters.

The .gff3 file comes from our own lab; I will ask around to see if there is a different version.

Thanks, Liyong
