Merging several vcf files for GWAS?
3 months ago
sabrilo171 ▴ 10


I am a medical student without much background in bioinformatics, trying to run the analysis for my first GWAS, and I am tremendously overwhelmed. It is a case-control association study with samples from 50 subjects, which we had sequenced on the Novogene NGS platform.

The problem is that Novogene sent us two matched files for each patient, a VCF and an XLSX, in which the sequences have already undergone quality control and imputation using GATK and ANNOVAR. Now I don't really know where to go from here: there is no single file that groups all patients' genotypes so that they can be correlated with the phenotype file.

Is there a time-efficient way to create a single VCF file or similar so that I can perform the regression and obtain the p-values for the study? Is there another way? How do you usually handle this issue, if it's something common?

I'm learning to use PLINK and already know my way around RStudio. Whatever solution you propose, I'm willing to learn.

R beginner gwas ngs novogene • 484 views
3 months ago

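You can use bcftools merge to combine the per-sample VCFs into one. The input files must be bgzip-compressed and indexed, and -Oz is needed so that the output is actually written as compressed VCF:

bcftools merge -Oz -o merged.vcf.gz sample1.vcf.gz sample2.vcf.gz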
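
With 50 samples, typing every file name is tedious, so here is a minimal sketch that merges from a file list instead. It assumes the per-sample VCFs are already bgzip-compressed and sit in a directory called vcfs (a hypothetical name; adjust to your layout). The list file name vcf_list.txt is also just an illustration.

```shell
# Assumption: per-sample, bgzip-compressed VCFs live in ./vcfs (hypothetical path).
vcf_dir="${vcf_dir:-vcfs}"

# Collect one VCF path per line; the list stays empty if nothing is found.
find "$vcf_dir" -name '*.vcf.gz' 2>/dev/null | sort > vcf_list.txt
echo "Found $(wc -l < vcf_list.txt) VCF files"

# Run the merge only if bcftools is installed and the list is not empty.
if command -v bcftools >/dev/null 2>&1 && [ -s vcf_list.txt ]; then
    # bcftools merge needs an index next to every input file.
    while read -r f; do
        bcftools index -f -t "$f"
    done < vcf_list.txt

    # -l reads the input files from the list, -Oz writes compressed VCF.
    bcftools merge -l vcf_list.txt -Oz -o merged.vcf.gz
    bcftools index -t merged.vcf.gz
fi
```

One caveat: when single-sample VCFs are merged, a site that is absent from a sample's file comes out as a missing genotype (./.) for that sample, not as homozygous reference. bcftools merge has a --missing-to-ref option that assumes 0/0 instead, but whether that assumption is safe depends on how Novogene produced the files, so it is worth checking before using it.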

Then convert the merged VCF to plink2 format with:

plink2 --vcf merged.vcf.gz --make-pgen --out name

You will need to prepare a phenotype file that looks something like this:

#IID  qt1    bmi    site
1110  2.3    22.22  site2
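
For a case-control study the phenotype column is binary, and PLINK 2 expects 1 for controls and 2 for cases, with NA for missing values. As a rough sketch, assuming you have a two-column CSV of sample IDs and status labels, something like the following would build the file. The file names status.csv and pheno.txt, the column name status, and the sample IDs are all made up for illustration.

```shell
# Toy input with made-up IDs, for illustration only.
cat > status.csv <<'EOF'
sample_id,status
S01,case
S02,control
S03,case
EOF

# Recode labels to PLINK 2 case/control coding: 1 = control, 2 = case.
# Anything else becomes NA (missing).
awk -F',' 'BEGIN { OFS = "\t"; print "#IID", "status" }
NR > 1 {
    code = "NA"
    if ($2 == "control") code = 1
    else if ($2 == "case") code = 2
    print $1, code
}' status.csv > pheno.txt

cat pheno.txt
```

The IDs in the first column have to match the sample names in the merged VCF exactly; bcftools query -l merged.vcf.gz prints those names so you can compare.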

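Finally, run the regression to get the p-values for your study, passing the phenotype file with --pheno (pheno.txt is a placeholder name). Recent plink2 builds refuse to run --glm without covariates, so either supply them with --covar or add the allow-no-covars modifier:

plink2 --pfile name --pheno pheno.txt --glm allow-no-covars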


Thank you so much Dr. Guimaraes!

