Identifying unique SNPs between species
22 months ago
Wilber0x ▴ 50

I have 8 samples each with a VCF file made from aligning a genome skim of that sample to the same reference. I can see the number of SNPs for each sample in their individual VCF files, but I want to know how many SNPs and which SNPs are unique to each sample.

Can I merge the VCF files into one to do this? If so how?

I used bowtie2 for the alignment.

snp vcf alignment • 921 views
22 months ago
guillaume.rbt ▴ 830

I've dealt with the same type of analysis.

My solution was to genotype all samples together using the GATK "best practices" pipeline, which gives a single multi-sample VCF with the called genotypes of all my samples.

Then I filtered this VCF to obtain the SNPs unique to each sample. For that you can use "SnpSift filter" with the isVariant() and isRef() functions.
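As a rough sketch of that filtering step, the helper below builds a SnpSift filter expression for one sample: variant in that sample, reference in every other one. The function name `build_private_expr`, the jar path, and the file names are assumptions for illustration, not part of the original pipeline. Genotype indices are 0-based and follow the sample column order of the VCF. How missing genotypes are treated by `isRef()` is worth checking on your own data.

```shell
# Build a SnpSift filter expression that keeps sites where the sample at
# index $1 (0-based, VCF column order) is variant and every other sample
# (total number of samples: $2) is reference.
build_private_expr() {
    target=$1
    n=$2
    filter="isVariant(GEN[$target])"
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -ne "$target" ]; then
            filter="$filter & isRef(GEN[$i])"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$filter"
}

# Hypothetical usage (SnpSift.jar location and file names are assumed):
# java -jar SnpSift.jar filter "$(build_private_expr 0 8)" multisample.vcf > sample0.private.vcf
```

With 8 samples you would call it once per index from 0 to 7.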

22 months ago

Merging is a good idea. You can do it like this:

1. bgzip all vcf files

$ parallel "bgzip -c {} > {}.gz" ::: *.vcf

2. tabix index these files

$ parallel tabix {} ::: *.vcf.gz

3. create a list of compressed vcf files

$ find . -maxdepth 1 -iname "*.vcf.gz" > samples.txt

4. merge files, normalize and split multiallelic variants

$ bcftools merge -l samples.txt -Ou | bcftools norm -f ref.fa -m - -o merged.vcf

5. filter the merged vcf file for sites where only one sample has at least one ALT allele, and create a new vcf file for each sample with its private variants

$ parallel "bcftools view -i 'COUNT(GT=\"alt\") = 1' merged.vcf | bcftools view -x -s {} -o {}.privat.vcf" ::: `bcftools query -l merged.vcf`
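To get the number of private SNPs per sample from the files written in step 5, something like the following should work. It is only a sketch: `count_records` is a helper name made up here, and it simply counts the non-header lines of each uncompressed `*.privat.vcf` file in the current directory, so it includes indels if those were called as well.

```shell
# Count the variant records (lines not starting with '#') in one VCF file.
# '|| true' keeps the exit status at 0 when the file has no records.
count_records() {
    grep -vc '^#' "$1" || true
}

# Print one "file<TAB>count" line per private VCF in the current directory.
for f in *.privat.vcf; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_records "$f")"
done
```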
