Question: All the reads aligned using STAR have low mapping quality (0-3). What is happening?
1
gravatar for kirannbishwa01
3.2 years ago by
kirannbishwa011.1k
United States
kirannbishwa011.1k wrote:

I used STAR for mapping my RNAseq data using STAR. I had used 2 pass mapping and had supplied GTF files. I am also interested in how the reads gets aligned to transcriptome. So, I used the following script during 2nd pass mapping and the script for 1st pass mapping is similar:

STAR --runThreadN 16 --runMode alignReads --genomeDir 2nDpassGenomeDir --readFilesIn Trm_PE-2ms01e_R1.fastq Trm_PE-2ms01e_R2.fastq --outFileNamePrefix 2nDpassAlignment/2ms01e/2ms01e --outFilterMultimapNmax 10 --outSAMmapqUnique Integer0to255 --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --outSAMattributes All --alignIntronMin 10 --quantMode TranscriptomeSAM GeneCounts

  • When I look for the aligned bam file from STAR on IGV I was surprised to see that all the reads have very low mapping quality (0-3), infact reads with mapQ 0 are above 80%. The rest are mapQ 1 and 3. And this is the case for every gene.

  • I also ran awk to capture any reads with mapping quality above 5 and there were none.

  • Also, the bam file generated by aligning RNAseq reads to the transcriptome has lots of read with mapping quality 0 (zero).

For confirmation I compared the aligned genome sequence data (aligned using BWA from several samples) at the same regions for several genes and I see that there are lost of reads with mapQ above 40. Also, the reads from RNAseq and Genome sequence data share several variants at the same site.

What is causing the reads from RNAseq alignment using STAR to the same position to have such a low mapping quality?

Is something wrong with the script?

Is something different with how STAR assigns mapping quality?

or, is there some other problem?

Thanks in advance, - B

ADD COMMENTlink written 3.2 years ago by kirannbishwa011.1k
1

Did you take a small sample of reads and blasted them at NCBI to make sure they look correct (right genome)?

ADD REPLYlink written 3.2 years ago by genomax71k

Genome is the right one. Its the same genome I used to align the genome resequence data using BWA.

The genome reseq data aligned by BWA at the same position have high mapping quality (above 40).

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by kirannbishwa011.1k

According to a recent blog post by Dr. Simon Andrews, STAR uses the same MAPQ scores as TopHat which would mean that all those reads are multi-mapping which explains why they have MAPQ scores of 0-3 (mapping from 2 to 10 or more locations).

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by genomax71k

Thanks genomax2. But, there should still be some uniquely mapping reads when I am using the whole genome. There is not one read that aligned and has mapping quality above 3 (this is the highest).

I am little surprised and confused why should the scoring of mapQ value be so different.

ADD REPLYlink written 3.1 years ago by kirannbishwa011.1k
1

Don't literally use Integer0to255 as the value for --outSAMmapqUnique. Either use a single value, e.g., --outSAMmapqUnique 50, or don't bother with the argument (the default is 255). It's extremely likely that Integer0to255 caused the "unique alignment" MAPQ to be set to 0.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Devon Ryan92k

I think that could be the case. I will change the value for --outSAMmapqUnique and will update it on the forum.

Thanks,

ADD REPLYlink written 3.1 years ago by kirannbishwa011.1k

Good observation. I see in the code where the integer range is tested but not the argument type. I don't know enough C/C++ to say what the behaviour would be in this case but your comment makes sense to me (and I'm sure you know more about that).

ADD REPLYlink written 3.1 years ago by SES8.2k

I won't profess to being a C++ aficionado, so I'm also not sure exactly how that's getting handled. I'm curious if this solves the problem though.

ADD REPLYlink written 3.1 years ago by Devon Ryan92k
1

@Devon @ SES @ Wouter: Using --outSAMmapqUnique 60 solved the issue. But, there are reads with either mapQ value 60 or 0, 1, 2, 3. I hope STAR will do a better job of assigning mapping quality.

Also, it is stated in STAR manual to use --outSAMmapqUnique Interger0to255 for downstream GATK compatibility. I think this statement could be update to avoid any confusion.

The statement could be better written as --outSAMmapqUnique X - where X is an desired integer value from 0 to 255.

Personally, for me --outSAMmapqUnique Interger0to255 is confusing.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by kirannbishwa011.1k

The manual also states (page 26):

--outSAMmapqUnique

default: 255

int: 0 to 255: the MAPQ value for unique mappers

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by WouterDeCoster40k

What does the quality distribution of the FASTQ data look like? If that is good then you can debug the alignments. If not, then you'll want to go back and figure out what happened with the trimming or even with the library prep if the raw data is no good.

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by SES8.2k

The FastQ data quality looks good. There were some enriched kmers even after quality trimming but not too much. But, based on the previous discussion on biostars I used the data (after it was trimmed).

I am posting some of the quality checks from fastqc after the data was trimmed.

per base seq quality

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by kirannbishwa011.1k

Could you have a look at the mapping qualities in the first pass using STAR?

ADD REPLYlink written 3.1 years ago by WouterDeCoster40k

Its the same. I think the problem is with the scripts.

ADD REPLYlink written 3.1 years ago by kirannbishwa011.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1462 users visited in the last hour