How to deal with high % of reads mapped to multiple loci on STAR?
0
0
Entering edit mode
2.5 years ago
Simon Ahn ▴ 10

enter image description here

Here is one of my result from STAR alignment. As you could see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This sounds not that bad because more than 90% of my reads were aligned well. However, when I process bam file with featureCounts, Successfully assigned alignments rate is 10.1%

Here are more details about commands and program that I used.

  1. fastq downloaded from SRA: SRP050036 using fasterq-dump (only used SRR1657054 for test)
  2. ENSEMBL GRCh 38 reference file created with STAR

    FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

    GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

    STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \ --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \ --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]

  3. STAR aligned (version: 2.7.9a)

    STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \ --readFilesIn [SRR1657054.fastq path] \ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \ --outFileNamePrefix [output file name]

  4. Extract count matrix using featureCounts

    featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path]\ -o [output file name] \ [SRR1657054Aligned.sortedByCoord.out.bam path]

    (I tried -s 1 and -s 2 also, but didn't get better result)

Q1) % of Uniquely mapped reads is 32.37%, and I thought % of successfully assigned alignments will be close to 32.37%, but only 10.1%. Why is this happening? (I used same annotation GTF file)

Q2) How should I deal with high % of reads mapped to multiple loci? According to featureCounts manual,

Multi-mapping reads will also be counted. For a multi-mapping read, all its reported alignments will be counted. The 'NH' tag in BAM/SAM input is used to detect multi-mapping reads.

So if I use -M option then one read may counted more than once? What should I do about it?

RNAseq fastq raw-count • 1.3k views
ADD COMMENT
0
Entering edit mode

Can you provide more information on what is being sequenced and the code you used?

ADD REPLY
0
Entering edit mode

I really appreciate your help. Here are some specific details.

  1. fastq downloaded from SRA: SRP050036 using fasterq-dump (only used SRR1657054 for test)
  2. ENSEMBL GRCh 38 reference file created with STAR

    FASTA: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

    GTF: http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

    STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \ --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \ --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]

  3. STAR aligned (version: 2.7.9a)

    STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \ --readFilesIn [SRR1657054.fastq path] \ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \ --outFileNamePrefix [output file name]

  4. Extract count matrix using featureCounts

    featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path]\ -o [output file name] \ [SRR1657054Aligned.sortedByCoord.out.bam path]

    (I tried -s 1 and -s 2 also, but didn't get better result)

ADD REPLY

Login before adding your answer.

Traffic: 1997 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6