Question

How to deal with high % of reads mapped to multiple loci on STAR?

0

Entering edit mode

2.5 years ago

Simon Ahn ▴ 10

enter image description here

Here is one of my result from STAR alignment. As you could see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This sounds not that bad because more than 90% of my reads were aligned well. However, when I process bam file with featureCounts, Successfully assigned alignments rate is 10.1%

Here are more details about commands and program that I used.

fastq downloaded from SRA: SRP050036 using fasterq-dump (only used SRR1657054 for test)
ENSEMBL GRCh 38 reference file created with STAR

FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \ --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \ --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]
STAR aligned (version: 2.7.9a)

STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \ --readFilesIn [SRR1657054.fastq path] \ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \ --outFileNamePrefix [output file name]
Extract count matrix using featureCounts

featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path]\ -o [output file name] \ [SRR1657054Aligned.sortedByCoord.out.bam path]

(I tried -s 1 and -s 2 also, but didn't get better result)

Q1) % of Uniquely mapped reads is 32.37%, and I thought % of successfully assigned alignments will be close to 32.37%, but only 10.1%. Why is this happening? (I used same annotation GTF file)

Q2) How should I deal with high % of reads mapped to multiple loci? According to featureCounts manual,

Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The 'NH' tag in BAM/SAM input is used to detect multi-mapping reads.

So if I use -M option then one read may counted more than once? What should I do about it?

RNAseq fastq raw-count • 1.3k views

ADD COMMENT • link 2.5 years ago by Simon Ahn ▴ 10

0

Entering edit mode

Can you provide more information on what is being sequenced and the code you used?

ADD REPLY • link 2.5 years ago by rpolicastro 13k

0

Entering edit mode

I really appreciate your help. Here are some specific details.

fastq downloaded from SRA: SRP050036 using fasterq-dump (only used SRR1657054 for test)
ENSEMBL GRCh 38 reference file created with STAR

FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \ --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \ --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]
STAR aligned (version: 2.7.9a)

STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \ --readFilesIn [SRR1657054.fastq path] \ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \ --outFileNamePrefix [output file name]
Extract count matrix using featureCounts

featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path]\ -o [output file name] \ [SRR1657054Aligned.sortedByCoord.out.bam path]

(I tried -s 1 and -s 2 also, but didn't get better result)

ADD REPLY • link 2.5 years ago by Simon Ahn ▴ 10