How to validate zero SNPs in variant calling?
0
0
Entering edit mode
6.6 years ago
Charles Yin ▴ 180

I want to compare two bacterial genomes using SNP signature. The method uses one bacterial genome as reference and aligns the reads from other bacterial to the reference genome. After variant calling, I can see some regions from reference genome do not have SNPs with the reads, i.e., the histogram of SNP counts is zero, for example, on position 367390.

To validate the zero SNPs region, I run the coverage analysis using bedtools (genomecov), it looks that this region is not covered by reads, but I am not sure if this analysis is correct. The puzzle to me is that I have run multiple sample reads from different runs, the results are consistent, and the two genomes have similar sequences from sequence alignment. The positions are expected to have reads covered.

(The bam file was from bwa alignment of reference genome and reads using default options)

$ bedtools genomecov -ibam CR36.bam  -d > output.txt
Sample entries of output.txt 
NODE_1  367376  15
NODE_1  367377  13
NODE_1  367378  12
NODE_1  367379  2
NODE_1  367380  2
NODE_1  367381  2
NODE_1  367382  2
NODE_1  367383  0
NODE_1  367384  0
NODE_1  367385  0
NODE_1  367386  0
NODE_1  367387  0
NODE_1  367388  0

My question is: When we find some regions do not have SNP in VCF, how do we validate if these regions have reads covered during alignment?

Could you please shed some lights on my puzzle? Sorry for the long question. Thank you!

SNP alignment • 1.5k views
ADD COMMENT
1
Entering edit mode

Perhaps the bases in this region are all N due to scaffolding? Alternatively, they may differ from the reference enough so that bwa can't map reads there. You did not specify whether you were using bwa-aln or bwa-mem, or the version, or command, or read length, or whether the reads were paired, or the platform, or average depth, or the library preparation, so it's difficult to speculate about why you have zero coverage in some locations; it's affected by all of those things.

ADD REPLY
0
Entering edit mode

Thank you!

The alignment used BWA with following commend:

bwa aln CP2215.fasta R36KAN_S19_1.fastq > CR36.sai

The zero SNPs on some regions are possibly due to homologous recombination between the reference genome and the genome by the raw reads. The recombination is what I expected in my experiment and I wish to confirm this recombination by SNP calling. Because different sequencing runs have similar results of SNP calling, and two genomes are similar from alignment test of complete genomes, the possibility of non-coverage of the reads on this region is low. That is my puzzle. Probably the commends in BWA alignment or bedtools coverage analysis of alignment are incorrect?

ADD REPLY
0
Entering edit mode

You can always view the aligned files in IGV to ensure reads if you only have a few specific regions to check. But yes, it appears those regions don't have reads. If you still have sample remaining, you can try Sanger sequencing the region, though again, this is really only reasonable for a handful of regions.

@Brian Bushnell's suggestion that the two genomes might be too different to align properly is also a very real possibility.

ADD REPLY

Login before adding your answer.

Traffic: 2339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6