Question

featureCounts Output "The reads are assigned on the single-end mode"

0

Entering edit mode

13 months ago

ExtentHonest56 ▴ 10

Hello,

I am having some trouble interpreting the output of featureCounts. I am interested in completing differential gene expression analysis.

This is my input code:

results=/blue/RNAseq2022/results
DATA=/blue/RNAseq2022/results/HISAT2_Alignment_Index1
GTF=/blue/RNAseq2022/results/GTF
OUTPUT=/blue/RNAseq2022/results/FeatureCounts

cd ${results}

ml subread

prefix=$(basename ${1} ".bam")
featureCounts -p -s 0 -a ${GTF}/Bos_taurus.ARS-UCD1.2.109.gtf.gz -o ${OUTPUT}/${prefix}Counts.txt ${DATA}/${prefix}.bam

All samples were run in parallel (additional run file not included) and submitted as SLURM job.

Output for one of the samples:

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
      v2.0.3

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           210177-1.bam                                     ||
||                                                                            ||
||             Output file : 210177-1Counts.txt                               ||
||                 Summary : 210177-1Counts.txt.summary                       ||
||              Paired-end : yes                                              ||
||        Count read pairs : no                                               ||
||              Annotation : Bos_taurus.ARS-UCD1.2.109.gtf.gz (GTF)           ||
||      Dir for temp files : /blue/RNAseq2022/ ...                            ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Bos_taurus.ARS-UCD1.2.109.gtf.gz ...                  ||
||    Features : 433820                                                       ||
||    Meta-features : 27607                                                   ||
||    Chromosomes/contigs : 178                                               ||
||                                                                            ||
|| Process BAM file 210177-1.bam...                                           ||
||    Paired-end reads are included.                                          ||
||    The reads are assigned on the single-end mode.                          ||
||    Total alignments : 43802718                                             ||
||    Successfully assigned alignments : 15799023 (36.1%)                     ||
||    Running time : 1.30 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/blue/RNAseq2022/...     ||
|| RNAseq2022/results/FeatureCounts/210177-1Counts.txt.summary"               ||
||                                                                            ||
\\============================================================================//

What I am most confused on is why it states "The reads are assigned on the single-end mode." when my data is paired-end, were aligned as paired, and I specified paired in the arguments for featureCounts.

Also, my overall alignment using Hisat2 was ~94% across all samples. This includes multi-aligned reads etc., but I believe mapped 1 was ~80% or so. I know here I am only counting alignment to meta-features (genes) so I am guessing this is why the "Successfully assigned alignments" drops drastically, but is a ~30-40% average typical? Should I be including multi-mapped and multi-overlapping reads? This is my first time doing bulk RNA-seq analysis.

Thank you in advance to anyone who can provide guidance with this!

featureCounts alignment counts RNA-seq • 4.2k views

ADD COMMENT • link updated 5 months ago by GenoMax 141k • written 13 months ago by ExtentHonest56 ▴ 10

0

Entering edit mode

Hi there, I also encounter with the same problem recently. I am so confused what is 'Successfully assigned alignments', and what's the different between this alignment result with the one I run out in hisat2.

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
      v2.0.6

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           output.sorted.bam                                ||
||                                                                            ||
||             Output file : gene_counts.txt                                  ||
||                 Summary : gene_counts.txt.summary                          ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : hg38.refGene.gtf (GTF)                           ||
||      Dir for temp files : ./                                               ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file hg38.refGene.gtf ...                                  ||
||    Features : 836541                                                       ||
||    Meta-features : 28278                                                   ||
||    Chromosomes/contigs : 367                                               ||
||                                                                            ||
|| Process BAM file output.sorted.bam...                                      ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 11744671                                             ||
||    Successfully assigned alignments : 8602892 (73.2%)                      ||
||    Running time : 0.58 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "gene_counts.txt.summary  ||
|| "                                                                          ||
||                                                                            ||
\\============================================================================//

I have already add --countReadPairs in my command. but still get only 73.2% alignments while my hisat2 alignments is up to 99.88% as below:

[sysu23215258@master hisat2-2.2.1]$ cat nohup.out
/usr/bin/env: python3: Permission denied
10697767 reads; of these:
  10697767 (100.00%) were paired; of these:
    66276 (0.62%) aligned concordantly 0 times
    10088680 (94.31%) aligned concordantly exactly 1 time
    542811 (5.07%) aligned concordantly >1 times
    ----
    66276 pairs aligned concordantly 0 times; of these:
      29532 (44.56%) aligned discordantly 1 time
    ----
    36744 pairs aligned 0 times concordantly or discordantly; of these:
      73488 mates make up the pairs; of these:
        24637 (33.53%) aligned 0 times
        42488 (57.82%) aligned exactly 1 time
        6363 (8.66%) aligned >1 times
99.88% overall alignment rate

I cant tell their difference. heeeeeeeeelp plz.

ADD REPLY • link updated 5 months ago by Ram 43k • written 5 months ago by DANDI • 0

0

Entering edit mode

heeeeeeeeelp plz.

This is a professional forum. Please don't use childlike language here.

Also, don't use nohup, it's an old hack that is no longer the way to run processes that should not be interrupted if user accidentally gets disconnected from the process. Use screen or tmux instead.

ADD REPLY • link 5 months ago by Ram 43k

0

Entering edit mode

but still get only 73.2% alignments while my hisat2 alignments is up to 99.88% as below:

featureCounts is assigning aligned read to create counts. It omits multi-mapping reads since they do not provide specific information. So while hisat2 is aligning the data at a higher percentage only part of those alignments can be used for generating counts. featureCounts is not doing alignment, it is only figuring out which reads fall under which gene models to generate the counts.

ADD REPLY • link 5 months ago by GenoMax 141k

score 3 · Accepted Answer · 2023-03-17

3

Entering edit mode

13 months ago

GenoMax 141k

why it states "The reads are assigned on the single-end mode." when my data is paired-end, were aligned as paired, and I specified paired in the arguments for featureCounts.

Use the two options below together

  -p                  Specify that input data contain paired-end reads. To
                      perform fragment counting (ie. counting read pairs), the
                      '--countReadPairs' parameter should also be specified in
                      addition to this parameter.

  --countReadPairs    Count read pairs (fragments) instead of reads. This option
                      is only applicable for paired-end reads.

You may also want to count at exon level and then summarize by the gene.

 -t <string>         Specify feature type(s) in a GTF annotation. If multiple
                      types are provided, they should be separated by ',' with
                      no space in between. 'exon' by default. Rows in the
                      annotation with a matched feature will be extracted and
                      used for read mapping. 

  -g <string>         Specify attribute type in GTF annotation. 'gene_id' by 
                      default. Meta-features used for read counting will be 
                      extracted from annotation using the provided value.

ADD COMMENT • link 13 months ago by GenoMax 141k

0

Entering edit mode

Thank you, I will give this a try. I saw this when reading through the manual, but I was confused by the wording. What exactly do they mean by "Count read pairs (fragments) instead of reads". I know this may be self-explanatory to most, but I just want to make sure I am understanding properly.

ADD REPLY • link 13 months ago by ExtentHonest56 ▴ 10

0

Entering edit mode

I assumed that by specifying -p for paired it would use both reads for counting purposes, so I don't understand why these two options are separate.

ADD REPLY • link 13 months ago by ExtentHonest56 ▴ 10

2

Entering edit mode

it is a pretty big mistake by the developers, -p it used to mean one thing, then with version 2.0.2 they changed what it does

in the past -p would do what --countReadPairs does now, in the new version it is not clear what effect the -p parameter alone has

Release 2.0.2, 29 March 2021

New parameter --countReadPairs is added to featureCounts to explicitly specify that read pairs will be counted, and the -p option in featureCounts now only specifies if the input reads are paired end (it also implied that counting of read pairs would be performed in previous versions).

ADD REPLY • link 13 months ago by Istvan Albert 100k

1

Entering edit mode

Two reads in paired-end sequencing come from one fragment so an explicit --countReadPairs option was added in recent versions of featureCounts. Otherwise reads may be double counted.

ADD REPLY • link 13 months ago by GenoMax 141k