Question

STAR: same read is shown as 'multi-mapping' to the same location of the genome

7 months ago

klsywd ▴ 10

Hello, my output file ends up with each chromosome listed twice, and each read is mapped twice to the same location. Any idea what could be wrong? Here is an example. These are Illumina 150 bp paired-end reads from SRA (downloaded via fastq-dump --split-files), mapped to an unannotated, unpublished genome.

SRR31736706.1   99      Chromosome23    7398268 3       150M    =       7398439 321     GNCATCACCATCGGTAACGAGAGGTTCCGTTGCCCTGAGGCTCTCTTCCAGCCTTCCTTCTTGGGTATGGAATCGTGCGGTATCCACGAGACCGTGTACAACTCCATCATGAAGTGCGACGTTGACATCCGTAAGGACCTGTACGCCAAC    I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:1  AS:i:295        nM:i:1
SRR31736706.1   147     Chromosome23    7398439 3       150M    =       7398268 -321    ACCATGTACCCCGGTATCGCCGACAGGATGCAGAAGGAGATCACCGCCCTCGCTCCCTCCACCATCAAGATCAAGAGCATCGCTCCCCCCGAGAGGAAGTACTCCGTATGGATCGGTGGATCCATCCTGGCTTCCCTCTCCACCTTCCAG    IIII99IIII9I-IIIIIIII-999-I99-I9999--I99-II9I-999-9-9-9I9-99-II--9----I-II----I--III-II9IIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:1  AS:i:295        nM:i:1
SRR31736706.1   355     Chromosome23    7398268 3       150M    =       7398439 321     GNCATCACCATCGGTAACGAGAGGTTCCGTTGCCCTGAGGCTCTCTTCCAGCCTTCCTTCTTGGGTATGGAATCGTGCGGTATCCACGAGACCGTGTACAACTCCATCATGAAGTGCGACGTTGACATCCGTAAGGACCTGTACGCCAAC    I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:2  AS:i:295        nM:i:1
SRR31736706.1   403     Chromosome23    7398439 3       150M    =       7398268 -321    ACCATGTACCCCGGTATCGCCGACAGGATGCAGAAGGAGATCACCGCCCTCGCTCCCTCCACCATCAAGATCAAGAGCATCGCTCCCCCCGAGAGGAAGTACTCCGTATGGATCGGTGGATCCATCCTGGCTTCCCTCTCCACCTTCCAG    IIII99IIII9I-IIIIIIII-999-I99-I9999--I99-II9I-999-9-9-9I9-99-II--9----I-II----I--III-II9IIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:2  AS:i:295        nM:i:1

You can see that, for the same read ID, the exact same read is mapped to the same location twice, but the second time around it's flagged 355 or 403, i.e. 'not primary alignment'.
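For reference, the four flags in the example can be decoded with a small sketch like the one below. The bit values come from the SAM specification; the dictionary and function names (`SAM_FLAG_BITS`, `decode_flag`) are only illustrative.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",  # "not primary alignment"
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The four flags from the example alignments above.
for flag in (99, 147, 355, 403):
    print(flag, decode_flag(flag))
```

355 is 99 plus 256 and 403 is 147 plus 256, so the second pair of records differs from the first only by the secondary-alignment bit.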

command:

STAR --genomeDir [path to genome directory] --genomeFastaFiles [path to genome.fasta]  --readFilesIn SRR31736706_1.fastq SRR31736706_2.fastq

samtools STAR RNAseq • 705 views

updated 7 months ago by lieven.sterck 15k • written 7 months ago by klsywd ▴ 10

Long shot, but you don't have duplicate entries in your FASTA file, do you?
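A quick way to check this is to count the sequence names in the FASTA headers. The sketch below is only an illustration: the function name `duplicate_fasta_names` and the path `genome.fasta` are placeholders, and it takes the name to be the first whitespace-delimited token after `>`.

```python
from collections import Counter


def duplicate_fasta_names(lines):
    """Return sequence names that appear in more than one '>' header line."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no name
                counts[fields[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Small in-memory demonstration; for a real file one could instead do
#   with open("genome.fasta") as fh:   # placeholder path
#       print(duplicate_fasta_names(fh))
demo = [">Chromosome1 some description", "ACGT", ">Chromosome2", "GGCC", ">Chromosome1", "ACGT"]
print(duplicate_fasta_names(demo))
```

An empty list means every sequence name occurs exactly once.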

7 months ago by lieven.sterck 15k

No, there's only one entry per chromosome/scaffold in the genome FASTA file and in the genomeDir.

7 months ago by klsywd ▴ 10

OK, and the read is also not present twice in your input fastq file?
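One way to check the FASTQ for repeated reads is sketched below. It assumes plain, unwrapped four-line FASTQ records, and it takes the read ID to be the first token after `@`. The function name `duplicate_fastq_ids` is illustrative only.

```python
from collections import Counter
from itertools import islice


def duplicate_fastq_ids(lines):
    """Return read IDs that occur more than once, assuming 4-line FASTQ records."""
    counts = Counter()
    # Every fourth line, starting with the first, is a '@' header line.
    for header in islice(lines, 0, None, 4):
        fields = header[1:].split()
        if fields:
            counts[fields[0]] += 1
    return sorted(read_id for read_id, n in counts.items() if n > 1)


# In-memory demonstration with one read present twice.
demo_fastq = [
    "@SRR31736706.1 1 length=150", "ACGT", "+", "IIII",
    "@SRR31736706.2 2 length=150", "GGCC", "+", "IIII",
    "@SRR31736706.1 1 length=150", "ACGT", "+", "IIII",
]
print(duplicate_fastq_ids(demo_fastq))
```

For a real file, an open file handle can be passed in place of the list.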

Without knowing the exact cause, you could perhaps limit STAR's output by using:

--outSAMprimaryFlag AllBestScore

or

--outFilterMultimapNmax 1
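As a different, after-the-fact workaround (not one of the STAR options above), the secondary records could also be dropped from an existing SAM file by testing the 0x100 flag bit. This is a minimal sketch assuming uncompressed, tab-separated SAM text; `primary_only` is a hypothetical helper name. Note that the retained records still carry their original NH and MAPQ values.

```python
def primary_only(sam_lines):
    """Yield header lines and alignments whose flag lacks the secondary bit (0x100)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.split("\t")
        if len(fields) > 1 and not int(fields[1]) & 0x100:
            yield line


# Shortened stand-ins for the four records in the question (name, flag, reference).
demo_sam = [
    "@SQ\tSN:Chromosome23\tLN:1000",
    "SRR31736706.1\t99\tChromosome23",
    "SRR31736706.1\t147\tChromosome23",
    "SRR31736706.1\t355\tChromosome23",
    "SRR31736706.1\t403\tChromosome23",
]
kept = list(primary_only(demo_sam))
print(len(kept))
```

This only hides the symptom; it does not explain why the same locus is reported twice.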

7 months ago by lieven.sterck 15k