Benchmarking single SNP mapping
Entering edit mode
5.4 years ago
Josh Herr 5.7k

I've come up with a homework exercise for my class on SNP mapping and benchmarking. The comical part is I can't figure out why I am not mapping the SNP myself. I'm not a novice at this anymore, but I'm banging my head against the wall to get this to work.

For the exercise, I have taken a 13,000 bp reference genomic region for an existing genome. I then changed one base in the reference in a well known gene and synthesized fake reads using wgsim such that the reads have only one single nucleotide difference against the reference.

(I used this command to generate the reads: wgsim -N 200000 -S 0 -e 0 -s 0 -R 0 -X 0 -1 70 -2 70 reference_with_SNP.fasta test_forward.fastq test_reverse.fastq)

Next, using the reference, I generated an index (I'm using bwa here for the example) and then mapped the reads with the SNP to the reference without the SNP: bwa mem reference.fasta test_forward.fastq test_reverse.fastq > test.sam. I'm getting a SAM file with all perfect matches (i.e. 70M). I've modified a lot of the bwa mem flags, but I can't seem to record any SNPs.

If I take a fasta file of my mapping fastq reads I can align the fasta to the reference and see where the bases don't match.

Any ideas if my error is in the mapping of my SNPs (wrong bwa mem flag) or in the read synthesis with wgsim?

SNPs bwa mapping bowtie wgsim • 1.3k views
Entering edit mode

CIGAR M is an alignment match, not necessarily a nucleotide match. I'm not sure if bwa uses extended cigar.


Login before adding your answer.

Traffic: 1930 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6