Intersection of multiple VCF files
3.4 years ago
JoeDoasi ▴ 10

Hello,

While doing a variant assessment of patient exomes (~16 VCF files, one per patient), I find that some variants are predicted to be pathogenic; however, the patients' phenotype is not associated with these variants.

I also found that some variants predicted to be pathogenic occur in more than one patient, so I'm thinking of intersecting all the VCF files and using the file containing the common variants in my variant assessment workflow.

• Is this approach OK?
• I want to build up a database of the common variants in our population. Will this strategy help?
• What are the recommended tools?

next-gen sequencing SNP genome gene • 1.7k views
3.4 years ago

As I understand, your current question is: is there any variant predicted to be pathogenic?

Looking at variants that are common to all patients is a good way of investigating your dataset at the population level. You will filter out variants that are found in only a few patients, which will remove some noise from your data. Hopefully, you will be left with enough common variants to draw some conclusions.

I would use bcftools isec; see here as well: Intersect multiple VCF files.

I would test the tool with only 2 VCFs first, to make sure the output is what I want, before running the command on all 16 VCFs.
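To make the "2 files first, then 16" idea concrete, here is a minimal sketch in Python that only builds and prints the bcftools isec command lines. The file names (patient01.vcf.gz and so on) and output directories are hypothetical, and I am assuming the VCFs are bgzip-compressed and indexed, which bcftools isec expects. The -n=N option keeps sites present in exactly N of the input files, so setting N to the number of files gives the sites shared by all of them.

```python
import shlex


def isec_command(vcfs, out_dir):
    """Build a bcftools isec call that keeps sites present in every input file."""
    vcfs = [str(v) for v in vcfs]
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to intersect")
    # -n=N keeps only sites seen in exactly N files (here: all of them);
    # -p names the directory where bcftools writes its output.
    return ["bcftools", "isec", f"-n={len(vcfs)}", "-p", str(out_dir), *vcfs]


if __name__ == "__main__":
    # Hypothetical file names, one bgzipped and indexed VCF per patient.
    all_vcfs = [f"patient{i:02d}.vcf.gz" for i in range(1, 17)]

    # Small trial on two files, to inspect the output before scaling up.
    print(shlex.join(isec_command(all_vcfs[:2], "isec_test")))

    # Full intersection across the 16 patients.
    print(shlex.join(isec_command(all_vcfs, "isec_all")))
```

The printed lines can be pasted into a shell, or passed to subprocess.run(cmd, check=True) once the two-file trial looks right.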


A general word of caution before embarking on a bcftools journey: run bcftools norm -m to split multi-allelic sites and, if possible, left-align and normalize indels. It saves you a lot of headaches downstream when dealing with partial overlaps at multi-allelic variants and "indels" at tandem repeat loci.
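As a rough sketch of that normalization step, the Python helper below builds a bcftools norm command per file. The input, output, and reference file names are hypothetical. It uses -m -any to split multi-allelic records into biallelic ones, and adds -f with a reference FASTA only when one is supplied, since left-alignment of indels needs the reference sequence.

```python
import shlex


def norm_command(in_vcf, out_vcf, reference_fasta=None):
    """Build a bcftools norm call that splits multi-allelic sites.

    If a reference FASTA is given, indels are also left-aligned and
    normalized against it.
    """
    # -m -any splits multi-allelic records (SNPs and indels) into biallelic ones.
    cmd = ["bcftools", "norm", "-m", "-any"]
    if reference_fasta is not None:
        # -f supplies the reference needed for left-alignment.
        cmd += ["-f", str(reference_fasta)]
    # -O z writes bgzip-compressed VCF; -o names the output file.
    cmd += ["-O", "z", "-o", str(out_vcf), str(in_vcf)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for one patient and a reference genome.
    cmd = norm_command("patient01.vcf.gz", "patient01.norm.vcf.gz", "reference.fa")
    print(shlex.join(cmd))
```

The normalized outputs would then need to be indexed again (for example with bcftools index) before being intersected.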


Thank you, RamRS. I will take it into account in the coming analyses.


Thank you, Emeline, for your reply. I used vcftools for that, but now I will use bcftools and compare both outputs.