I have generated VCF files from a range of samples of my study organism using a published reference genome. I now want to use these to assess where diversity is along a given chromosome; in other words I want to know the density of SNPs along chromosome 2.
One way I had thought of doing this was to split the chromosome into 10Kb chunks, count the number of SNPs in each one, and plot this in a chart. So what I want is basically:
Chr2:1-10000, SNPs = X Chr2:10000-20000, SNPs = Y etc.
But I can't figure out an efficient way of actually extracting this information from my VCF files.I am able to use the command
tabix -h myfile.vcf.gz chr2:[10Kregion] > myoutput.vcf
to extract a given region of any VCF file and then use bcftools stats to count the number of SNPs therein, but the chromosome is 16Mb long; there must be a more efficient way to do this than for me to manually extract 1,600 VCF files and examine them with bcftools stats.
I wonder if there's some software that could help with this. I have Integrated Genome Viewer, MEGA and SeaView, but if any of them have this kind of functionality, I haven't found it yet.
Otherwise, any help or advice would be much appreciated. It must be doable because I see plenty of published papers with SNP density plotted against chromosome position, but I can't figure out how to do it myself.
Thanks in advance!
I have tried using bedtools coverage and get errors at several lines of the file:
The resulting output gives a count of 0 at every window. Can anyone help with this?
Hi Sacha, I also want to do something similar but I for each sample from a multisample vcf file. I tried extracting a single sample from the multi-vcf file and then followed your command but the number of variants counts for different samples remain the same (probably these are the sites where a variant has been being called across the samples). Could you guide me to extract unique variants for each individuate and then plot the variants in a bin size.
Greetings Sacha, just out of curiosity, I want to do something similar to your solution but for a non-model organism (in my field usually is used Circos but visually I find more pleasant your solution) is it possible to do so? And eventually how can I build the ideo file my self? Thank you in advance