Question: How to calculate the SNP/Indel number between cultivars by using combined VCF file?
gravatar for r00628112
6 months ago by
r0062811210 wrote:

I would like to follow this paper to do the analysis of Genome-wide DNA polymorphisms by using four cultivars re-sequencing data. I would like to make this kind of table and picture in this paper Table 2 Frequency of SNPs and InDels detected in PSRR-1, Bengal, and Nona Bokra., Figure 1 Genome-wide discovery of DNA polymorphisms by whole genome sequencing differentiates weedy and cultivated rice Thus I'm curious about how to compare the SNP or Indel number between two cultivars in a combined VCF file.

snp indel vcf genome • 171 views
ADD COMMENTlink modified 6 months ago by David Minkley10 • written 6 months ago by r0062811210
gravatar for David Minkley
6 months ago by
University of Victoria
David Minkley10 wrote:

If I'm understanding it correctly, this boils down to first identifying all sites that are polymorphic between two cultivars (ie, an allele present in one cultivar that is absent in the other). You can filter a VCF file using expressions in the bcftools filter program based on genotypes and on polymorphism type (SNP or INDEL) to create a file that only contains the differences between cultivars of the type you are looking for. From there you can just count the number of non-header lines in the file to get the number of polymorphisms, or run window-based analyses using something like bedtools makewindows and bedtools count.

Assuming the simple case that you only have two individuals per VCF file, one of each cultivar type, and that these individuals are diploid, you can use something like this to extract all polymorphic sites:

bcftools filter --include '(FORMAT/GT[0] = "hom" && FORMAT/GT[1] = "het") || (FORMAT/GT[0] = "het" && FORMAT/GT[1] = "hom") || (FORMAT/GT[0] = "AA" && FORMAT/GT[1] = "RR") || (FORMAT/GT[0] = "RR" && FORMAT/GT[1] = "AA")' input_file.vcf > output_polymorphic_file.vcf

Separately you can pull out SNPs or INDELs using:

bcftools filter --include 'TYPE="snp"' input_file.vcf > output_file.snps_only.vcf
bcftools filter --include 'TYPE="indel"' input_file.vcf > output_file.indels_only.vcf
ADD COMMENTlink written 6 months ago by David Minkley10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 924 users visited in the last hour