I am very new to RNA-seq analysis. I have Illumina paired-end RNA-seq data. After QC, I aligned the reads to the genome with STAR using a GFF3 annotation file (80% of reads mapped uniquely), and then used featureCounts (version 2.0.1, installed in a conda environment) to count reads per gene.
I ran featureCounts with the following parameters:
    featureCounts \
        analysis/aligned_sequences/SRR1171897/Aligned.sortedByCoord.out.bam \
        -a data/annotation/Cs_genes_v2_annot.gff3 \
        -o analysis/final_counts/SRR1171897/featureCounts.txt \
        -T 10 \
        -p \
        -F "GFF3" \
        -g "Parent"
I am not sure whether -g "Parent" is the right choice for my GFF3 annotation file. The first few lines of the file are shown below:
    Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000;Note=methyl-CPG-binding domain 9
    Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000
    Chr1 AAFC_NRC mRNA 1 6504 . - . ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000;Note=methyl-CPG-binding domain 9
    Chr1 AAFC_NRC five_prime_UTR 6380 6504 . - . ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1
    Chr1 AAFC_NRC exon 5865 6504 . - . ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1
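One way to see what -g "Parent" resolves to is to follow the Parent links in the annotation. featureCounts counts the features selected by -t (default "exon") and groups them by the attribute named in -g, so the Parent value of each exon is what ends up as the row identifier. The minimal Python sketch below (not part of the original analysis) assumes a tab-delimited GFF3 file, as the GFF3 specification requires, and uses an abridged copy of the lines above; the function names are hypothetical.

```python
# Minimal sketch: trace what the Parent attribute of each exon points to.
# Assumes a tab-delimited GFF3 file; the sample lines are an abridged copy
# of the annotation excerpt above (Note attributes omitted).

GFF3_LINES = [
    "Chr1\tAAFC_NRC\tgene\t1\t6504\t.\t-\t.\tID=Csa01g001000;Name=Csa01g001000",
    "Chr1\tAAFC_NRC\tmRNA\t1\t6504\t.\t-\t.\tID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000",
    "Chr1\tAAFC_NRC\tfive_prime_UTR\t6380\t6504\t.\t-\t.\tID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1",
    "Chr1\tAAFC_NRC\texon\t5865\t6504\t.\t-\t.\tID=Csa01g001000.1.exon1;Parent=Csa01g001000.1",
]


def parse_attributes(column9):
    """Split a GFF3 'key=value;key=value' column into a dict (no unescaping)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def parent_report(lines, feature_type="exon"):
    """Return (feature ID, Parent ID, Parent's type, Parent's own Parent)
    for every feature of `feature_type`."""
    type_of = {}    # ID -> feature type (gene, mRNA, ...)
    parent_of = {}  # ID -> that feature's Parent attribute
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            type_of[attrs["ID"]] = cols[2]
            if "Parent" in attrs:
                parent_of[attrs["ID"]] = attrs["Parent"]
        records.append((cols[2], attrs))

    report = []
    for ftype, attrs in records:
        if ftype != feature_type or "Parent" not in attrs:
            continue
        # GFF3 allows several comma-separated parents (e.g. an exon shared by isoforms).
        for parent in attrs["Parent"].split(","):
            report.append((attrs.get("ID"), parent, type_of.get(parent), parent_of.get(parent)))
    return report


if __name__ == "__main__":
    for exon_id, parent, parent_type, grandparent in parent_report(GFF3_LINES):
        print(f"{exon_id}: Parent={parent} ({parent_type}), which belongs to {grandparent}")
```

For the excerpt, each exon's Parent is the mRNA ID (Csa01g001000.1), whose own Parent is the gene ID (Csa01g001000), so with the default -t exon the count table rows would be transcript IDs rather than gene IDs. To run it on the full file, pass an open file handle instead of the sample list, e.g. `parent_report(open("data/annotation/Cs_genes_v2_annot.gff3"))`.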
After featureCounts finished, I got the following results:
My questions are: which gene identifier should I use for the -g parameter, and why were only 51.1% of alignments successfully assigned? Is my result correct, and is there anything I could do to improve it?
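To see where the unassigned alignments went, the per-category counts in the summary file that featureCounts writes next to the -o output (here featureCounts.txt.summary) are the place to look. The minimal Python sketch below converts that file into fractions; the example numbers are made up for illustration, and it assumes, as is normally the case, that all rows add up to the total number of alignments (or fragments, with -p) that featureCounts processed.

```python
# Minimal sketch: break down a featureCounts .summary file into fractions.
# The counts in EXAMPLE_SUMMARY are hypothetical, for illustration only.

EXAMPLE_SUMMARY = """Status\tAligned.sortedByCoord.out.bam
Assigned\t500
Unassigned_Unmapped\t0
Unassigned_MultiMapping\t200
Unassigned_NoFeatures\t250
Unassigned_Ambiguity\t50
"""


def summary_fractions(text):
    """Return (sample label, {status: fraction of total}) for the first
    sample column, keeping only non-zero categories."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    counts = {row[0]: int(row[1]) for row in body}
    total = sum(counts.values())  # assumed: rows add up to everything processed
    if total == 0:
        return header[1], {}
    return header[1], {status: n / total for status, n in counts.items() if n > 0}


if __name__ == "__main__":
    label, fractions = summary_fractions(EXAMPLE_SUMMARY)
    print(label)
    # Largest categories first, so the main source of unassigned reads stands out.
    for status, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {status:<25} {frac:6.1%}")
```

In the real file, a large Unassigned_MultiMapping share means many alignments were multi-mapped (not counted unless -M is used), Unassigned_NoFeatures means they did not overlap any counted feature, and Unassigned_Ambiguity means they overlapped more than one meta-feature.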
Thank you very much.