Can I use a subset of my VCF files for quality checks?
3.2 years ago
floortje • 0

Dear all,

I have about 3,500 VCF files (each approximately 500 MB) containing whole-genome sequencing data of 3,500 individuals. The files contain all variants mapped to the reference genome GRCh37. I am only interested in 300 SNPs for my final analysis, so I plan to filter the VCF files down to a workable size in the end, which should be doable.

Before I can start these analyses, however, I want to run quality checks on relatedness, population stratification, sex, and call rate. For population stratification and relatedness, I believe I need to merge all VCF files into one file, which would yield an unworkable VCF file of roughly 1.75 TB (3,500 × 500 MB). My idea was to:

  • Open the VCF file of an individual --> check sex and call rate --> shrink the file from 500 MB to 50 MB
  • Do this for all individuals
  • Merge all 3,500 files of 50 MB into one large VCF and use that for the quality checks on relatedness and population stratification.
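
To make the per-individual step concrete, here is a minimal Python sketch of what I have in mind. It assumes each file is a single-sample VCF, that GT is the first FORMAT field, and that the same predefined set of (chromosome, position) sites is kept in every file so the reduced files stay comparable. The names `filter_to_sites` and `call_rate` are hypothetical helpers, not part of any existing tool.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def filter_to_sites(lines: Iterable[str], keep: Set[Site]) -> Iterator[str]:
    """Yield header lines plus only those records whose site is in `keep`.

    Using the same `keep` set for every individual is what makes the
    reduction consistent across files.
    """
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _rest = line.split("\t", 2)
        if (chrom, int(pos)) in keep:
            yield line


def call_rate(lines: Iterable[str], sample_index: int = 0) -> float:
    """Fraction of records with a fully called genotype for one sample.

    Assumption: GT is the first colon-separated FORMAT field.
    """
    called = total = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        genotype = fields[9 + sample_index].split(":")[0]
        alleles = genotype.replace("|", "/").split("/")
        total += 1
        if "." not in alleles:
            called += 1
    return called / total if total else float("nan")
```

One caveat with this sketch: if the files list only variant sites, a site that is absent from an individual's file cannot be told apart from a homozygous-reference call, so a call rate computed this way only covers the records that are actually present.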

Does anyone know how to reduce the size of VCF files in a consistent way so that the files can still be used for the final quality checks? Is this doable? Will it lead to valid quality checks? Any advice on this workflow would be very welcome. Please let me know if anything is not entirely clear.

quality checks relatedness vcf • 594 views