Hello,
I am having some trouble interpreting the output of featureCounts. I am interested in completing differential gene expression analysis.
This is my input code:
results=/blue/RNAseq2022/results
DATA=/blue/RNAseq2022/results/HISAT2_Alignment_Index1
GTF=/blue/RNAseq2022/results/GTF
OUTPUT=/blue/RNAseq2022/results/FeatureCounts
cd ${results}
ml subread
prefix=$(basename ${1} ".bam")
featureCounts -p -s 0 -a ${GTF}/Bos_taurus.ARS-UCD1.2.109.gtf.gz -o ${OUTPUT}/${prefix}Counts.txt ${DATA}/${prefix}.bam
All samples were run in parallel (additional run file not included) and submitted as SLURM job.
Output for one of the samples:
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.3
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| 210177-1.bam ||
|| ||
|| Output file : 210177-1Counts.txt ||
|| Summary : 210177-1Counts.txt.summary ||
|| Paired-end : yes ||
|| Count read pairs : no ||
|| Annotation : Bos_taurus.ARS-UCD1.2.109.gtf.gz (GTF) ||
|| Dir for temp files : /blue/RNAseq2022/ ... ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Bos_taurus.ARS-UCD1.2.109.gtf.gz ... ||
|| Features : 433820 ||
|| Meta-features : 27607 ||
|| Chromosomes/contigs : 178 ||
|| ||
|| Process BAM file 210177-1.bam... ||
|| Paired-end reads are included. ||
|| The reads are assigned on the single-end mode. ||
|| Total alignments : 43802718 ||
|| Successfully assigned alignments : 15799023 (36.1%) ||
|| Running time : 1.30 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/blue/RNAseq2022/... ||
|| RNAseq2022/results/FeatureCounts/210177-1Counts.txt.summary" ||
|| ||
\\============================================================================//
What I am most confused on is why it states "The reads are assigned on the single-end mode." when my data is paired-end, were aligned as paired, and I specified paired in the arguments for featureCounts.
Also, my overall alignment using Hisat2 was ~94% across all samples. This includes multi-aligned reads etc., but I believe mapped 1 was ~80% or so. I know here I am only counting alignment to meta-features (genes) so I am guessing this is why the "Successfully assigned alignments" drops drastically, but is a ~30-40% average typical? Should I be including multi-mapped and multi-overlapping reads? This is my first time doing bulk RNA-seq analysis.
Thank you in advance to anyone who can provide guidance with this!
Thank you, I will give this a try. I saw this when reading through the manual, but I was confused by the wording. What exactly do they mean by "Count read pairs (fragments) instead of reads". I know this may be self-explanatory to most, but I just want to make sure I am understanding properly.
I assumed that by specifying -p for paired it would use both reads for counting purposes, so I don't understand why these two options are separate.
Two reads in paired-end sequencing come from one fragment so an explicit
--countReadPairs
option was added in recent versions of featureCounts. Otherwise reads may be double counted.it is a pretty big mistake by the developers,
-p
it used to mean one thing, then with version 2.0.2 they changed what it doesin the past
-p
would do what--countReadPairs
does now, in the new version it is not clear what effect the-p
parameter alone has