Question

Low fragment assignment RNAseq

0

Entering edit mode

5.9 years ago

said3427 ▴ 120

Hello, I have RNAseq data (2x76 bp) from Mouse, the objective of the study is to run a standard differential analysis.

After adapter and quality trimming using Trim_galore, I aligned the reads using HISAT2 v2.1.0

hisat2index="/mnt/b/references/grcm38_snp_tran/genome_snp_tran"
hisat2 -q -p 36 -x $hisat2index -1 "S1.R1.fastq.gz" -2 "S1.R2.fastq.gz" | samtools view -bS - > "S1.bam"

Summary:

40297213 reads; of these:
  40297213 (100.00%) were paired; of these:
    21041306 (52.22%) aligned concordantly 0 times
    16592068 (41.17%) aligned concordantly exactly 1 time
    2663839 (6.61%) aligned concordantly >1 times
    ----
    21041306 pairs aligned concordantly 0 times; of these:
      703079 (3.34%) aligned discordantly 1 time
    ----
    20338227 pairs aligned 0 times concordantly or discordantly; of these:
      40676454 mates make up the pairs; of these:
        27209377 (66.89%) aligned 0 times
        9337726 (22.96%) aligned exactly 1 time
        4129351 (10.15%) aligned >1 times
66.24% overall alignment rate

The high percentage of paired reads not concordantly aligned really surprised me, any idea?

After running RSeQC v2.6.4- infer_experiment.py - I assumed that the library is reversely stranded:

This is PairEnd Data
Fraction of reads failed to determine: 0.0180
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0204
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9616

I re-run the alignment using:

hisat2 -q -p 36 --rf -x $hisat2index -1 "S1.R1.fastq.gz" -2 "S1.R2.fastq.gz" | samtools view -bS - > S1.rf.bam

But the result was even worse:

40297213 reads; of these:
  40297213 (100.00%) were paired; of these:
    39453388 (97.91%) aligned concordantly 0 times
    476830 (1.18%) aligned concordantly exactly 1 time
    366995 (0.91%) aligned concordantly >1 times
    ----
    39453388 pairs aligned concordantly 0 times; of these:
      8461967 (21.45%) aligned discordantly 1 time
    ----
    30991421 pairs aligned 0 times concordantly or discordantly; of these:
      61982842 mates make up the pairs; of these:
        27210336 (43.90%) aligned 0 times
        22300934 (35.98%) aligned exactly 1 time
        12471572 (20.12%) aligned >1 times
66.24% overall alignment rate

Any idea about the increase of pairs not aligned concordantly?

I ran featureCounts v1.5.0-p3 using the first alignment

gtf=/mnt/b/references//Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf
featureCounts -a $gtf -g gene_id -T 32 -o S1.featureCounts.txt -p -s 2 S1.bam

featureCounts Summary:

Status  ES_1.bam
Assigned        8026858
Unassigned_Ambiguity    3612484
Unassigned_MultiMapping 21018807
Unassigned_NoFeatures   12584479
Unassigned_Unmapped     9612193
Unassigned_MappingQuality       0
Unassigned_FragmentLength       0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0

What may be causing the low fragment assignment (and low mapping rate)? Any recommendation?

RNA-Seq • 1.6k views

ADD COMMENT • link 5.9 years ago by said3427 ▴ 120

1

Entering edit mode

Maybe contamination, try blasting a few tens up to one hundred reads. If you have easy access (the indices are huge), you may try Kraken, Centriphuge or CLARK on the whole dataset to get a taxonomic profile of the whole data.