Fast genotyping for PRS calculation
7 months ago

Dear community members,

I have a large number of variants to genotype (>6 million) and many WGS samples (available as BAM and VCF files).

Previously, my strategy was to read the list of variants and then iterate through the VCF files with a custom Python script. However, I expect this to be very slow for such a large number of samples.

Is there a way to quickly genotype a huge WGS cohort? Should I use BAM or VCF files for that?

Another issue is that the VCFs were called against GRCh38, while the variants to genotype are in hg19 coordinates. For variants whose reference allele changed in GRCh38, the VCFs alone might not be enough, but this is a minor problem...
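To make the reference-allele issue concrete, here is a minimal sketch of how a lifted-over SNV could be classified once the GRCh38 reference base at its new position is known. The function name and the three labels are hypothetical, and the sketch assumes single-nucleotide variants on the same strand; it does not handle indels or strand flips.

```python
def classify_after_liftover(hg19_ref, hg19_alt, grch38_ref_base):
    """Compare hg19 alleles of an SNV with the GRCh38 reference base.

    Returns one of three illustrative labels:
      "same"     - GRCh38 reference matches the hg19 reference allele
      "swapped"  - GRCh38 reference matches the hg19 alternate allele
      "mismatch" - GRCh38 reference matches neither allele
    """
    ref = hg19_ref.upper()
    alt = hg19_alt.upper()
    base = grch38_ref_base.upper()
    if base == ref:
        return "same"
    if base == alt:
        # A sample absent from a GRCh38 VCF at this site may actually be
        # homozygous for the hg19 alternate allele.
        return "swapped"
    return "mismatch"


print(classify_after_liftover("A", "G", "G"))
```

Variants labelled "swapped" are the ones where a missing VCF record is ambiguous, so those are the sites where going back to the BAMs matters most.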

PRS • 248 views

Lift over your list of variants to GRCh38, then split it into regions of XXX variants each and call the BAMs in GVCF mode with GATK, restricted to those regions. Combine and genotype the GVCFs per region, then concatenate the regions. Use a workflow manager to run everything in parallel.
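The splitting step could be sketched roughly like this in Python. It is only an illustration: `chunk_variants`, `to_interval_string`, and the chunk size are hypothetical names and values, and the input is assumed to be the already lifted-over list as `(chrom, pos)` pairs with 1-based positions. Each resulting `chrom:start-end` string can be passed to GATK via `-L`, one job per interval.

```python
from itertools import groupby


def chunk_variants(variants, chunk_size):
    """Group variants into intervals holding at most chunk_size variants.

    variants: iterable of (chrom, pos) tuples, 1-based positions.
    Returns a list of (chrom, start, end) tuples; an interval never
    spans two chromosomes.
    """
    intervals = []
    # Deduplicate, then sort by chromosome name and position.
    ordered = sorted(set(variants))
    for chrom, group in groupby(ordered, key=lambda v: v[0]):
        positions = [pos for _, pos in group]
        for i in range(0, len(positions), chunk_size):
            block = positions[i:i + chunk_size]
            intervals.append((chrom, block[0], block[-1]))
    return intervals


def to_interval_string(interval):
    """Format an interval as chrom:start-end."""
    chrom, start, end = interval
    return f"{chrom}:{start}-{end}"


example = [("chr1", 900), ("chr1", 100), ("chr2", 50), ("chr1", 250)]
for interval in chunk_variants(example, chunk_size=2):
    print(to_interval_string(interval))
```

Chunking by variant count rather than by fixed genomic width keeps the jobs roughly balanced when the variants are unevenly distributed along the genome.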


Thanks a lot! I am not very familiar with the GATK infrastructure, but I guess it is time to learn =)
