Very low successfully assigned alignments with feature counts
0
0
Entering edit mode
12 weeks ago
Manuel • 0

Hello everyone,

I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT 2 went very well: 94.95% overall alignment rate.

However, when I use featureCounts, I get:

  • 5.7% when I set the strandSpecific parameter to 1.
  • 5.3% when I set the strandSpecific parameter to 2 (I guess this would be the correct one)
  • 18.8% when I set the strandSpecific parameter to 0

Since this is the first time I'm analyzing sequencing data, I was wondering if someone could guide me on what might be happening or if this is normal.

Thank you very much in advance.


I include more information about the kits and commands used:

The kits that have been used:

  • For library preparation: Illumina┬« Stranded Total RNA Prep, Ligation with Ribo-Zero Plus (16 Samples) (REF: 20040525)
  • For sequencing: NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (REF: 20028401)

In our case, the sequencing was single (R1).

For the alignment, I use HISAT2 with the following command:

hisat2 -q \
  --rna-strandness F \
  -x /home/manu/research/genome/grch38/genome \
  -U /home/manu/research/analisis/data/ITK2-004-S1.fastq.gz | \
    samtools sort -o /home/manu/research/analisis/align/demo_trimmed.bam

99524981 reads; of these:
  99524981 (100.00%) were unpaired; of these:
    5030213 (5.05%) aligned 0 times
    58319656 (58.60%) aligned exactly 1 time
    36175112 (36.35%) aligned >1 times
94.95% overall alignment rate

Later I use featureCounts to count the reads:

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 1,
                        isGTFAnnotationFile = TRUE,
                        GTF.featureType = "exon",
                        GTF.attrType = "gene_id")

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Strand specific : stranded                                              ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 12844679 (5.7%)                      ||
||    Running time : 3.32 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
\\============================================================================//



\\============================================================================//

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 2,
                        isGTFAnnotationFile = TRUE)

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Strand specific : reversely stranded                                    ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 34526662 (15.3%)                     ||
||    Running time : 3.47 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
\\============================================================================//

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 0,
                        isGTFAnnotationFile = TRUE)

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 42545035 (18.8%)                     ||
||    Running time : 3.36 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||

[\\============================================================================//]\\============================================================================//
featureCounts RNA-seq • 356 views
ADD COMMENT
0
Entering edit mode

Is the genome/annotation from the same source i.e. Ensembl or so it appears?

ADD REPLY
0
Entering edit mode

Load up your data in IGV and visualize it relative to your annotation file. It will tell you where the data aligns.

5% match on a 99% alignment is not the expected outcome

it could be that the annotations don't match the genome - that would be my first guess; make sure the chromosome names match, too

(that being said IGV will convert the chr1 to 1 but featureCounts won't)

ADD REPLY

Login before adding your answer.

Traffic: 1416 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6