Question: Merging and converting multiple VCF files into a SNP array
12 months ago, Arko (20), US/Boston/Boston University, wrote:

I have 50 VCF files, each corresponding to one patient sample. I want to merge these files, extract the genotype information by chromosome position / SNP ID, and convert it into a 012 matrix as quickly and efficiently as possible. VCFtools and BCFtools can do this, but I want to automate it, so I'm trying to script it in Python or possibly R.

I don't want SNPs duplicated across samples (files) either, so the idea is to get a SNP array with sample IDs (extracted from the file names) as column names and chromosome positions / SNP IDs as row names.
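To make the target layout concrete, here is a minimal pure-Python sketch of the matrix described above. It assumes single-sample, uncompressed, tab-delimited VCFs with a GT field; the function names (`gt_to_012`, `sample_id_from_path`, `build_matrix`) are hypothetical, and coding a site that is absent from a sample's file as -1 is an assumption, since a plain VCF does not say whether an absent site is homozygous reference or uncalled.

```python
import os


def gt_to_012(gt):
    """Map a diploid GT string to the count of non-reference alleles.

    Returns 0, 1 or 2, or -1 when the genotype is missing or not diploid.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return -1
    return sum(1 for allele in alleles if allele != "0")


def sample_id_from_path(path):
    """Hypothetical helper: 'dir/patient01.vcf.gz' -> 'patient01'."""
    return os.path.basename(path).split(".")[0]


def read_single_sample_vcf(lines):
    """Yield ((chrom, pos), code) for every record of a single-sample VCF."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        gt_index = fields[8].split(":").index("GT")  # FORMAT column
        yield (chrom, pos), gt_to_012(fields[9].split(":")[gt_index])


def build_matrix(vcfs):
    """vcfs maps sample ID -> iterable of VCF lines.

    Returns (samples, rows) where rows maps (chrom, pos) to one code per
    sample, so each site appears once. Sites absent from a sample stay -1.
    """
    samples = list(vcfs)
    rows = {}
    for col, sample in enumerate(samples):
        for key, code in read_single_sample_vcf(vcfs[sample]):
            rows.setdefault(key, [-1] * len(samples))[col] = code
    return samples, dict(sorted(rows.items()))
```

In practice `vcfs` could be built as `{sample_id_from_path(p): open(p) for p in paths}`. Note that chromosome names sort as strings here, so "10" comes before "2".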

R bcf python vcf • 796 views
modified 12 months ago • written 12 months ago by Arko (20)

What you want just sounds like a multisample VCF file without the metadata headers. Why not just call the necessary vcftools command from within Python or R?
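A hedged sketch of that suggestion in Python: build the command lines and run them with `subprocess`. It assumes `bcftools` and `vcftools` are on the PATH and that the inputs are bgzip-compressed and indexed, which `bcftools merge` requires; the function names and file names are hypothetical.

```python
import subprocess


def merge_command(vcf_paths, merged_path):
    """Command line for merging per-sample VCFs into one multi-sample VCF."""
    # -O z writes bgzip-compressed VCF, -o names the output file
    return ["bcftools", "merge", "-O", "z", "-o", merged_path, *vcf_paths]


def matrix_command(merged_path, out_prefix):
    """Command line for writing the 012 genotype matrix with vcftools."""
    # --012 produces <prefix>.012, <prefix>.012.indv and <prefix>.012.pos
    return ["vcftools", "--gzvcf", merged_path, "--012", "--out", out_prefix]


def run_pipeline(vcf_paths, merged_path="merged.vcf.gz", out_prefix="genotypes"):
    """Run both steps, raising CalledProcessError if either tool fails."""
    for cmd in (
        merge_command(vcf_paths, merged_path),
        matrix_command(merged_path, out_prefix),
    ):
        subprocess.run(cmd, check=True)
```

The `.012` file has one row per individual and one column per site, with -1 for missing genotypes, so it would need transposing to get samples as columns.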

written 12 months ago by jared.andrews07 (2.7k)

A "SNP array" is usually an oligonucleotide microarray for calling millions of SNPs. Probably not the same as what you have in mind, but confusing nonetheless.

written 12 months ago by WouterDeCoster (40k)

All things considered, what would be the fastest way to merge gVCF files and VCF files? BCFtools is faster than VCFtools, but it doesn't work with gVCF files.

modified 12 months ago • written 12 months ago by Arko (20)

Please use ADD COMMENT or ADD REPLY to respond to previous comments, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see, the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

written 12 months ago by WouterDeCoster (40k)

The CombineGVCFs walker from GATK (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php) can combine gVCFs.

PS: please move this post to a comment on the original post, or make it a new post.

modified 12 months ago • written 12 months ago by cpad0112 (11k)

Unfortunately, GATK doesn't allow merging VCF and gVCF files together. My aim is to obtain a single VCF file from the entire set.

written 12 months ago by Arko (20)

Did you try bcftools merge with the -g option?

written 12 months ago by cpad0112 (11k)

Tried it, but on merge BCFtools treats <NON_REF> as a literal allele call instead of ignoring it, so <NON_REF> contributes to the genotype call.
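For illustration only, one possible reading of "ignoring" the symbolic allele when scripting the 012 conversion yourself: count alternate alleles in GT but skip any allele index that points at `<NON_REF>` in the ALT column. This is a sketch under that assumption, not what BCFtools or GATK do, and the function names are hypothetical.

```python
NON_REF = "<NON_REF>"


def is_reference_block(alt_field):
    """True when the only ALT allele is <NON_REF>, as in gVCF reference blocks."""
    return alt_field == NON_REF


def alt_count_ignoring_non_ref(gt, alt_field):
    """Count non-reference alleles in GT, not counting <NON_REF>.

    Returns -1 when any allele in the genotype is missing.
    """
    alts = alt_field.split(",")
    # GT indexes alleles from 1 for the first ALT, so shift by one
    non_ref_indexes = {str(i + 1) for i, alt in enumerate(alts) if alt == NON_REF}
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return -1
    return sum(
        1 for allele in alleles if allele != "0" and allele not in non_ref_indexes
    )
```

With ALT `G,<NON_REF>`, a GT of `0/1` still counts one alternate allele, while `0/2` (pointing at `<NON_REF>`) counts none.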

written 12 months ago by Arko (20)