Question

Merging several vcf files for GWAS?

11 months ago

sabrilo171 ▴ 10

Hello!

I am a medical student without much background in bioinformatics, trying to run the analysis for my first GWAS, and I am tremendously overwhelmed. It's a case-control association study with samples from 50 subjects, which we sequenced using Novogene's NGS platform.

The problem is that Novogene sent us two matched files for each patient, a VCF and an XLSX, in which the sequences have already undergone quality control and imputation using GATK and ANNOVAR. Now I don't really know where to go from here: there is no single file that groups all patients' genotypes so that they can be correlated with the phenotype file.

Is there a time-efficient way to create a single VCF file or similar so that I can run the regression and obtain the p-values for the study? Is there another way? How do you usually handle this issue, if it's a common one?

I'm learning to use PLINK and already know my way around RStudio. Whatever solution you propose, I'm willing to learn.

R beginner gwas ngs novogene • 895 views

score 1 · Answer 1 · 2023-11-23

11 months ago

Raony Guimarães ★ 1.4k

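You can use bcftools merge to merge the multiple VCFs (the inputs must be bgzip-compressed and indexed; -Oz writes compressed output to match the .vcf.gz name):

bcftools merge -Oz -o merged.vcf.gz sample1.vcf.gz sample2.vcf.gz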
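
With 50 per-patient files, typing every name on the command line gets tedious. A minimal sketch of scripting the merge, assuming all files sit in one directory as bgzip-compressed *.vcf.gz. The directory name vcfs, the list file name, and the function names are placeholders for illustration, not anything Novogene or bcftools provides:

```shell
#!/usr/bin/env bash

# Write the paths of all per-sample VCFs in a directory to a list file,
# one path per line, in sorted order.
make_vcf_list() {
    local vcf_dir=$1 list_file=$2
    find "$vcf_dir" -maxdepth 1 -name '*.vcf.gz' | sort > "$list_file"
}

# Index every input, then merge them into one multi-sample VCF.
merge_vcfs() {
    local list_file=$1 out_vcf=$2
    while read -r vcf; do
        bcftools index -f --tbi "$vcf"    # merge requires indexed inputs
    done < "$list_file"
    # -l: read input file names from a list; -Oz: write bgzip-compressed VCF
    bcftools merge -l "$list_file" -Oz -o "$out_vcf"
    bcftools index --tbi "$out_vcf"
}

# Example (placeholder paths):
#   make_vcf_list vcfs vcf_list.txt
#   merge_vcfs vcf_list.txt merged.vcf.gz
```

One thing to watch: a site that is absent from a patient's single-sample VCF comes out as a missing genotype in the merged file. bcftools merge has a --missing-to-ref option for this, but whether treating those sites as reference is appropriate depends on how the per-sample VCFs were called.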

Then convert to plink2 format (--make-pgen tells plink2 to write the converted fileset):

plink2 --vcf merged.vcf.gz --make-pgen --out name
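
After the conversion it is worth checking that every patient made it into the fileset. A small sketch, assuming the output prefix name used in the command above; count_samples is a made-up helper name:

```shell
#!/usr/bin/env bash

# Count samples in a plink2 .psam file: one line per sample,
# header lines start with '#'.
count_samples() {
    grep -vc '^#' "$1"
}

# Example, once plink2 has written name.psam:
#   count_samples name.psam    # should match your number of subjects
```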

You will need to prepare a phenotype file that looks something like this:

#IID  qt1    bmi    site
1110  2.3    22.22  site2
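
Since the study in the question is case-control, the phenotype column would be binary rather than quantitative. plink2 reads case/control phenotypes coded as 1 = control and 2 = case, with NA for missing. A sketch of building such a file, assuming a hypothetical phenotypes.csv exported from a spreadsheet with a header row and the columns sample_id,status (all file, column, and function names here are illustrative):

```shell
#!/usr/bin/env bash

# Convert a CSV of sample_id,status (status is "case" or "control")
# into a tab-separated plink2 phenotype file.
make_pheno() {
    local csv=$1 out=$2
    awk -F',' '
        BEGIN { OFS = "\t"; print "#IID", "case_status" }
        { sub(/\r$/, "") }            # strip Windows line endings from Excel exports
        NR > 1 {
            code = "NA"               # anything unrecognised is treated as missing
            if ($2 == "control") code = 1
            else if ($2 == "case") code = 2
            print $1, code
        }' "$csv" > "$out"
}

# Example (placeholder file names):
#   make_pheno phenotypes.csv pheno.txt
```

The IDs in the first column have to match the sample names in the merged VCF exactly; bcftools query -l merged.vcf.gz lists them.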

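Finally, run the regression against your phenotype file (pheno.txt is a placeholder name) to get the p-values for your study. Recent plink2 builds require the allow-no-covars modifier when no covariate file is given:

plink2 --pfile name --pheno pheno.txt --glm allow-no-covars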
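
The association results land in a tab-separated file with a P column (for a binary phenotype the file name typically ends in .glm.logistic.hybrid, for a quantitative one in .glm.linear). A rough sketch for pulling out the smallest p-values; top_hits is a made-up helper name, and the result file name in the example is an assumption based on plink2's default output prefix and a phenotype column called case_status:

```shell
#!/usr/bin/env bash

# Print the header plus the n rows with the smallest p-values.
top_hits() {
    local results=$1 n=${2:-10} col
    # Locate the column named exactly "P" in the header line.
    col=$(head -n 1 "$results" | tr '\t' '\n' | grep -n -x 'P' | cut -d: -f1)
    head -n 1 "$results"
    tail -n +2 "$results" |
        awk -F'\t' -v c="$col" '$c != "NA"' |   # drop variants that were not tested
        sort -t$'\t' -g -k"$col","$col" |       # -g understands 1e-06 notation
        head -n "$n"
}

# Example (assumed file name):
#   top_hits plink2.case_status.glm.logistic.hybrid 20
```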

https://www.cog-genomics.org/plink/2.0/input#pheno

https://www.cog-genomics.org/plink/2.0/assoc

Thank you so much Dr. Guimaraes!

ADD REPLY • link 11 months ago by sabrilo171 ▴ 10