Question

How should I deal with my RNAseq result with high % of mutiple aligned reads?

0

Entering edit mode

2.5 years ago

Simon Ahn ▴ 10

enter image description here

Here is one of my result from STAR alignment. As you could see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This sounds not that bad because more than 90% of my reads were aligned well. However, when I process bam file with featureCounts, Successfully assigned alignments rate is 10.1%

Here are more details about commands and program that I used.

fastq downloaded from SRA: SRP050036 using fasterq-dump (only used SRR1657054 for test)
ENSEMBL GRCh 38 reference file created with STAR

FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \ --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \ --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]
STAR aligned (version: 2.7.9a)

STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \ --readFilesIn [SRR1657054.fastq path] \ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \ --outFileNamePrefix [output file name]
Extract count matrix using featureCounts

featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path]\ -o [output file name] \ [SRR1657054Aligned.sortedByCoord.out.bam path]

(I tried -s 1 and -s 2 also, but didn't get better result)

Q1) % of Uniquely mapped reads is 32.37%, and I thought % of successfully assigned alignments will be close to 32.37%, but only 10.1%. Why is this happening? (I used same annotation GTF file)

Q2) How should I deal with high % of reads mapped to multiple loci? According to featureCounts manual,

Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The 'NH' tag in BAM/SAM input is used to detect multi-mapping reads.

So if I use -M option then one read may counted more than once? What should I do about it?

RNAseq raw-count fastq • 886 views

ADD COMMENT • link updated 2.5 years ago by Carlo Yague 8.6k • written 2.5 years ago by Simon Ahn ▴ 10

1

Entering edit mode

The dataset looks like it's total RNA so those reads are probably rRNA... why don't you check where the multi-mappers are aligning to the genome? If so then I would just proceed with unique mappers.

ADD REPLY • link 2.5 years ago by benformatics 3.9k

score 2 · Accepted Answer · 2021-10-28

% of Uniquely mapped reads is 32.37%, and I thought % of successfully assigned alignments will be close to 32.37%, but only 10.1%. Why is this happening?

Alignment of reads mapping to multiple loci create multiple bam entries (for the same read). In consequence, the proportion of uniquely mapped reads can not be compared with the proportion of successfully assigned alignment as it is not the same denominator. In your case, 60% of multi-mapping reads will account for a minimum of 80% of your alignments (more than that if they align more than twice). In consequence, it is possible that the majority of your 30% uniquely mapped reads are found in your 10% of successfully assigned alignments.