I've come up with a homework exercise for my class on SNP mapping and benchmarking. The comical part is I can't figure out why I am not mapping the SNP myself. I'm not a novice at this anymore, but I'm banging my head against the wall to get this to work.
For the exercise, I have taken a 13,000 bp reference genomic region for an existing genome. I then changed one base in the reference in a well known gene and synthesized fake reads using
wgsim such that the reads have only one single nucleotide difference against the reference.
(I used this command to generate the reads:
wgsim -N 200000 -S 0 -e 0 -s 0 -R 0 -X 0 -1 70 -2 70 reference_with_SNP.fasta test_forward.fastq test_reverse.fastq)
Next, using the reference, I generated an index (I'm using
bwa here for the example) and then mapped the reads with the SNP to the reference without the SNP:
bwa mem reference.fasta test_forward.fastq test_reverse.fastq > test.sam. I'm getting a
SAM file with all perfect matches (i.e.
70M). I've modified a lot of the
bwa mem flags, but I can't seem to record any SNPs.
If I take a
fasta file of my mapping
fastq reads I can align the
fasta to the reference and see where the bases don't match.
Any ideas if my error is in the mapping of my SNPs (wrong
bwa mem flag) or in the read synthesis with