Hi Everyone,
I analyzed 200 datasets from NCBI using the same code. However, when I applied the analysis to my original dataset, I noticed that the outputs were incorrect. After discussing with my supervisor, he conducted the same analysis using his own code, and it produced the correct results. I thought maybe my code was wrong, so I tried using his code, but the results were still incorrect. I'm unsure of the differences between the two sets of code. The problem seems to be that my FPKM values are too low. While my HISAT2 alignment rate is approximately 90%, the featureCounts alignment rate is only around 15-20%. This discrepancy is causing issues with the accuracy of my results. Any ideas?
This is my supervisor's code:
counts <- featureCounts(files = bams,
annot.ext = "~/Desktop/mouse_meta/gencode.vM32.primary_assembly.annotation.gtf.gz",
isGTFAnnotationFile=TRUE,
GTF.featureType= c("exon"),
GTF.attrType="gene_id",
useMetaFeatures=TRUE,
allowMultiOverlap=F,
largestOverlap=F,
countMultiMappingReads=TRUE,
isPairedEnd=FALSE,
nthreads=10
)
This is my code:
counts <- featureCounts(files = bams, annot.ext = "/archive/alotaibih/sehribanb/new_mouse/gencode.vM32.primary_assembly.annotation.gtf.gz",
isGTFAnnotationFile= TRUE,
GTF.featureType= c("exon"),
GTF.attrType= "gene_id",
useMetaFeatures= T,
countMultiMappingReads= T,
isPairedEnd= TRUE,
nthreads= 24
)
You mean assignment rate? It is that low even after allowing for counting of multi-mapping reads?
Yeah, let me show you my featureCounts result, it looks like this. It's too low.
Make sure you are using the same version of featureCounts, it has changed how it deals with paired end reads. The exact same command will produce half as many counts when moving from one version of FeatureCounts to another.
the two codes you have have also different flags for paired end.
Sure, I'll inquire about the version of Rsubread. I'll keep you posted if it works :) Thanks
just to clarify, it is not that the program algorithm works differently but the meaning of the flags changed;
before -p was sufficient to have it work in paired mode, but in later versions, an additional parameter is needed to count in paired-end mode.
so the symptom is that the same command may count exactly half as many reads between two versions, and the latter needs the countPairs flag to recover the same behavior