Low fragment assignment RNAseq
0
0
Entering edit mode
5.9 years ago
said3427 ▴ 120

Hello, I have RNAseq data (2x76 bp) from Mouse, the objective of the study is to run a standard differential analysis.

After adapter and quality trimming using Trim_galore, I aligned the reads using HISAT2 v2.1.0

hisat2index="/mnt/b/references/grcm38_snp_tran/genome_snp_tran"
hisat2 -q -p 36 -x $hisat2index -1 "S1.R1.fastq.gz" -2 "S1.R2.fastq.gz" | samtools view -bS - > "S1.bam"

Summary:

40297213 reads; of these:
  40297213 (100.00%) were paired; of these:
    21041306 (52.22%) aligned concordantly 0 times
    16592068 (41.17%) aligned concordantly exactly 1 time
    2663839 (6.61%) aligned concordantly >1 times
    ----
    21041306 pairs aligned concordantly 0 times; of these:
      703079 (3.34%) aligned discordantly 1 time
    ----
    20338227 pairs aligned 0 times concordantly or discordantly; of these:
      40676454 mates make up the pairs; of these:
        27209377 (66.89%) aligned 0 times
        9337726 (22.96%) aligned exactly 1 time
        4129351 (10.15%) aligned >1 times
66.24% overall alignment rate

The high percentage of paired reads not concordantly aligned really surprised me, any idea?

After running RSeQC v2.6.4- infer_experiment.py - I assumed that the library is reversely stranded:

This is PairEnd Data
Fraction of reads failed to determine: 0.0180
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0204
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9616

I re-run the alignment using:

hisat2 -q -p 36 --rf -x $hisat2index -1 "S1.R1.fastq.gz" -2 "S1.R2.fastq.gz" | samtools view -bS - > S1.rf.bam

But the result was even worse:

40297213 reads; of these:
  40297213 (100.00%) were paired; of these:
    39453388 (97.91%) aligned concordantly 0 times
    476830 (1.18%) aligned concordantly exactly 1 time
    366995 (0.91%) aligned concordantly >1 times
    ----
    39453388 pairs aligned concordantly 0 times; of these:
      8461967 (21.45%) aligned discordantly 1 time
    ----
    30991421 pairs aligned 0 times concordantly or discordantly; of these:
      61982842 mates make up the pairs; of these:
        27210336 (43.90%) aligned 0 times
        22300934 (35.98%) aligned exactly 1 time
        12471572 (20.12%) aligned >1 times
66.24% overall alignment rate

Any idea about the increase of pairs not aligned concordantly?

I ran featureCounts v1.5.0-p3 using the first alignment

gtf=/mnt/b/references//Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf
featureCounts -a $gtf -g gene_id -T 32 -o S1.featureCounts.txt -p -s 2 S1.bam

featureCounts Summary:

Status  ES_1.bam
Assigned        8026858
Unassigned_Ambiguity    3612484
Unassigned_MultiMapping 21018807
Unassigned_NoFeatures   12584479
Unassigned_Unmapped     9612193
Unassigned_MappingQuality       0
Unassigned_FragmentLength       0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0

What may be causing the low fragment assignment (and low mapping rate)? Any recommendation?

RNA-Seq • 1.6k views
ADD COMMENT
1
Entering edit mode

Maybe contamination, try blasting a few tens up to one hundred reads. If you have easy access (the indices are huge), you may try Kraken, Centriphuge or CLARK on the whole dataset to get a taxonomic profile of the whole data.

ADD REPLY
0
Entering edit mode

You are right, I have contamination

ADD REPLY

Login before adding your answer.

Traffic: 2308 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6