STAR: same read is shown as 'multi-mapping' to the same location of the genome
7 months ago
klsywd ▴ 10

Hello, my output file ends up with each chromosome listed twice, and each read is mapped twice to the same location. Any idea what could be wrong? Here is an example. These are Illumina 150 bp paired-end reads from SRA (downloaded via fastq-dump --split-files), mapped to an unannotated, unpublished genome.

SRR31736706.1   99      Chromosome23    7398268 3       150M    =       7398439 321     GNCATCACCATCGGTAACGAGAGGTTCCGTTGCCCTGAGGCTCTCTTCCAGCCTTCCTTCTTGGGTATGGAATCGTGCGGTATCCACGAGACCGTGTACAACTCCATCATGAAGTGCGACGTTGACATCCGTAAGGACCTGTACGCCAAC    I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:1  AS:i:295        nM:i:1
SRR31736706.1   147     Chromosome23    7398439 3       150M    =       7398268 -321    ACCATGTACCCCGGTATCGCCGACAGGATGCAGAAGGAGATCACCGCCCTCGCTCCCTCCACCATCAAGATCAAGAGCATCGCTCCCCCCGAGAGGAAGTACTCCGTATGGATCGGTGGATCCATCCTGGCTTCCCTCTCCACCTTCCAG    IIII99IIII9I-IIIIIIII-999-I99-I9999--I99-II9I-999-9-9-9I9-99-II--9----I-II----I--III-II9IIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:1  AS:i:295        nM:i:1
SRR31736706.1   355     Chromosome23    7398268 3       150M    =       7398439 321     GNCATCACCATCGGTAACGAGAGGTTCCGTTGCCCTGAGGCTCTCTTCCAGCCTTCCTTCTTGGGTATGGAATCGTGCGGTATCCACGAGACCGTGTACAACTCCATCATGAAGTGCGACGTTGACATCCGTAAGGACCTGTACGCCAAC    I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:2  AS:i:295        nM:i:1
SRR31736706.1   403     Chromosome23    7398439 3       150M    =       7398268 -321    ACCATGTACCCCGGTATCGCCGACAGGATGCAGAAGGAGATCACCGCCCTCGCTCCCTCCACCATCAAGATCAAGAGCATCGCTCCCCCCGAGAGGAAGTACTCCGTATGGATCGGTGGATCCATCCTGGCTTCCCTCTCCACCTTCCAG    IIII99IIII9I-IIIIIIII-999-I99-I9999--I99-II9I-999-9-9-9I9-99-II--9----I-II----I--III-II9IIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NH:i:2  HI:i:2  AS:i:295        nM:i:1

You can see that, for the same read ID, the exact same read is mapped to the same location twice, but the second time it is flagged 355 or 403, i.e. 'not primary alignment'.
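To make the four FLAG values above easier to compare, here is a minimal sketch that splits a SAM FLAG into its individual bits. The bit values come from the SAM specification; the function name `decode_flag` and the short labels are just illustrative choices, not part of any tool.

```python
# Bit values follow the FLAG field of the SAM specification.
# The short labels are illustrative names chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (99, 147, 355, 403):
    print(flag, decode_flag(flag))
```

Decoded this way, 355 is 99 plus the 0x100 (secondary) bit, and 403 is 147 plus the same bit, so the second pair of records differs from the first only in being marked as not primary.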

command:

STAR --genomeDir [path to genome directory] --genomeFastaFiles [path to genome.fasta]  --readFilesIn SRR31736706_1.fastq SRR31736706_2.fastq
samtools STAR RNAseq • 705 views
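One thing that may be worth ruling out: the STAR manual describes `--genomeFastaFiles` as required for genome generation, and notes that at the mapping stage it can be used to add extra sequences to the genome. Whether that explains the doubled chromosomes here is an assumption, not something confirmed in this thread. Either way, a quick check of the SAM header shows which reference names really are listed more than once. The sketch below is a minimal example; the function name `duplicated_sq_names` is hypothetical.

```python
from collections import Counter


def duplicated_sq_names(sam_lines):
    """Return reference names that occur in more than one @SQ header line."""
    counts = Counter()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    counts[field[3:]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Usage (Aligned.out.sam is STAR's default output name; adjust to your run):
# with open("Aligned.out.sam") as handle:
#     print(duplicated_sq_names(handle))
```

If every chromosome comes back as duplicated, rerunning the mapping step without `--genomeFastaFiles` and comparing the headers would be a cheap experiment.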

Long shot, but you don't have duplicate entries in your FASTA file, right?


No, there's only one entry per chromosome/scaffold in the genome FASTA file and in the genomeDir.
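For anyone wanting to verify this on their own genome, a minimal sketch for finding sequence names that appear more than once in a FASTA file follows. It assumes the name is the first whitespace-delimited token after `>`; the function name `duplicated_fasta_names` and the file name in the usage comment are hypothetical.

```python
from collections import Counter


def duplicated_fasta_names(fasta_lines):
    """Return sequence names that appear on more than one '>' header line."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no name
                counts[fields[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Usage (genome.fasta is a placeholder path):
# with open("genome.fasta") as handle:
#     print(duplicated_fasta_names(handle))
```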


OK, and the read is also not present twice in your input FASTQ file?
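A minimal sketch of that check, assuming plain-text FASTQ with exactly four lines per record and the read ID as the first token after `@`. The function name `duplicated_read_ids` is hypothetical, and all IDs are held in memory, so this is only practical for files of moderate size.

```python
from collections import Counter
from itertools import islice


def duplicated_read_ids(fastq_lines):
    """Return read IDs that occur more than once (assumes 4-line records)."""
    counts = Counter()
    # Every 4th line, starting with the first, is a record header.
    for header in islice(fastq_lines, 0, None, 4):
        fields = header[1:].split()
        if fields:
            counts[fields[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Usage:
# with open("SRR31736706_1.fastq") as handle:
#     print(duplicated_read_ids(handle))
```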

Without knowing the exact cause, you could perhaps limit STAR's output by using:

--outSAMprimaryFlag AllBestScore

or

--outFilterMultimapNmax 1
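Going by the STAR manual, these two options behave differently: `--outSAMprimaryFlag AllBestScore` marks all best-scoring alignments as primary rather than removing any of them, and `--outFilterMultimapNmax 1` outputs alignments only for reads that map to a single locus, so a read reported with NH:i:2 would no longer be output as aligned. If the goal is just to keep one record per mate in an existing SAM file, a post hoc filter on the secondary bit is another option. The sketch below is a minimal example under that assumption; the function name `drop_secondary` is hypothetical.

```python
SECONDARY = 0x100  # "not primary alignment" bit in the SAM FLAG field


def drop_secondary(sam_lines):
    """Yield header lines and alignment records whose FLAG lacks the 0x100 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        if len(fields) > 1 and not int(fields[1]) & SECONDARY:
            yield line


# Usage (file names are placeholders):
# with open("Aligned.out.sam") as src, open("primary_only.sam", "w") as dst:
#     dst.writelines(drop_secondary(src))
```

This is the same selection that `samtools view -F 256` performs. Note that the NH tag on the remaining records is left unchanged, and that this hides the symptom without explaining why the reference is seen twice.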