Hello!
I am a medical student without much background in bioinformatics, trying to perform the analysis for my first GWAS, and I am tremendously overwhelmed. It's a case-control association study with samples from 50 subjects, which we sequenced using the Novogene NGS platform.
The problem is that Novogene sent us two matched files for each patient, a VCF and an XLSX, in which the sequences have already undergone quality control and imputation using GATK and ANNOVAR. Now I don't really know where to go from here... there is no single file that groups all patients' genotypes so that they can be correlated with the phenotype file.
Is there a time-efficient way to create a single VCF file or similar so that I can perform regression and obtain the p-values for the study? Is there another way? How do you usually handle this issue, if it's something common?
I'm learning to use PLINK and already know my way around RStudio. Whatever solution you propose, I'm willing to learn.
Thank you so much Dr. Guimaraes!