Question

Benchmarking single SNP mapping

1

Entering edit mode

6.0 years ago

Josh Herr 5.8k

I've come up with a homework exercise for my class on SNP mapping and benchmarking. The comical part is I can't figure out why I am not mapping the SNP myself. I'm not a novice at this anymore, but I'm banging my head against the wall to get this to work.

For the exercise, I have taken a 13,000 bp reference genomic region for an existing genome. I then changed one base in the reference in a well known gene and synthesized fake reads using wgsim such that the reads have only one single nucleotide difference against the reference.

(I used this command to generate the reads: wgsim -N 200000 -S 0 -e 0 -s 0 -R 0 -X 0 -1 70 -2 70 reference_with_SNP.fasta test_forward.fastq test_reverse.fastq)

Next, using the reference, I generated an index (I'm using bwa here for the example) and then mapped the reads with the SNP to the reference without the SNP: bwa mem reference.fasta test_forward.fastq test_reverse.fastq > test.sam. I'm getting a SAM file with all perfect matches (i.e. 70M). I've modified a lot of the bwa mem flags, but I can't seem to record any SNPs.

If I take a fasta file of my mapping fastq reads I can align the fasta to the reference and see where the bases don't match.

Any ideas if my error is in the mapping of my SNPs (wrong bwa mem flag) or in the read synthesis with wgsim?

SNPs bwa mapping bowtie wgsim • 1.4k views

ADD COMMENT • link updated 6.0 years ago by Biostar 20 • written 6.0 years ago by Josh Herr 5.8k

2

Entering edit mode

CIGAR M is an alignment match, not necessarily a nucleotide match. I'm not sure if bwa uses extended cigar.

ADD REPLY • link 6.0 years ago by WouterDeCoster 47k