Question: RNAseq variant calling: STAR align promotes mismatched overhanging mappings onto introns instead of calling splicing junctions
0
gravatar for Francesco
3 months ago by
Francesco10
Erasmus MC (NL)
Francesco10 wrote:

Hi, I'm doing variant calling from my RNAseq data on a cell line (Paired End 2X100b, Illumina) I'm following GATK workflows and everything is working almost fine since I can find all the expected mutations already characterized in my cell line. The problem is that I see some additional mutations that are clearly the result of an alignment error (e.g. NM_001364837:exon8:c.727+1G>A, rs781065280) I'm using the latest release of STAR aligner 2.7.6a to perform the alignment with the following options

STAR --runThreadN 8 --genomeDir $GENOME_DIR \
  --readFilesIn ...   --readFilesCommand gunzip -c \
  --outSAMattrRGline ... \
  --outSAMtype BAM SortedByCoordinate \
  --twopassMode Basic \
  --outSAMmapqUnique 60 \
  --outFilterType BySJout --alignIntronMin 20 \
  --alignIntronMax 1000000 --alignMatesGapMax 1000000 \
  --alignSJoverhangMin 8 --alignSJDBoverhangMin 3 --scoreGenomicLengthLog2scale 0 \
  --outFileNamePrefix $OUTPUT_DIR/...

The problem resides in STAR mapping some reads with a mismatching overhang of 4 bp at beginning of the intron instead of picking a perfect match at the beginning of the next exon (see picture attached). enter image description here

As you can see, --alignSJDBoverhangMin is set to 3 and the length of the overhang is 4bp. Also the splice junction should be canonical so a spliced mapping should carry no penalty. Anybody has an explanation for that?

ADD COMMENTlink written 3 months ago by Francesco10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1114 users visited in the last hour