Question: Merging and converting multiple vcf files into a SNP array
2.1 years ago by Arko, US/Boston/Boston University
Arko wrote:

I have 50 VCF files, each corresponding to one patient sample. I want to merge them, extract the genotype information by chromosome position / SNP ID, and convert it into a 012 matrix as quickly and efficiently as possible. VCFtools and BCFtools can do this, but I want to automate the process, so I'm trying to script it in Python or possibly R.

I also don't want SNPs duplicated across samples (files), so the idea is to get a SNP array with sample IDs (extracted from the file names) as column names and chromosome positions / SNP IDs as row names.
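The conversion described above can be sketched in plain Python. This is only an illustration, assuming the files have already been merged into one multi-sample VCF; the function names `gt_to_dosage` and `vcf_to_012` are made up for this example. It counts non-reference alleles per genotype (0, 1 or 2 for diploid calls), codes missing genotypes as -1, and keeps only the first occurrence of a repeated SNP.

```python
def gt_to_dosage(gt):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'.

    Returns -1 when any allele is missing (for example './.').
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return -1
    return sum(1 for a in alleles if a != "0")


def vcf_to_012(lines):
    """Turn the lines of a multi-sample VCF into (samples, matrix).

    matrix maps a row name (SNP ID, or 'chrom:pos' when the ID is '.')
    to one dosage per sample, in the order of the #CHROM header line.
    """
    samples = []
    matrix = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos, snp_id, fmt = fields[0], fields[1], fields[2], fields[8]
        keys = fmt.split(":")
        if "GT" not in keys:
            continue  # record carries no genotype information
        row = snp_id if snp_id != "." else f"{chrom}:{pos}"
        if row in matrix:
            continue  # assumption: keep the first occurrence of a duplicate
        gt_index = keys.index("GT")
        matrix[row] = [gt_to_dosage(f.split(":")[gt_index]) for f in fields[9:]]
    return samples, matrix
```

For 50 whole-genome files a dedicated tool will be much faster; the sketch is mainly useful for checking what the 012 coding means on a small region.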

R bcf python vcf • 1.3k views
modified 2.1 years ago • written 2.1 years ago by Arko

What you want just sounds like a multisample VCF file without the metadata headers. Why not just call the necessary vcftools command from within Python or R?

written 2.1 years ago by jared.andrews07
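Calling VCFtools from Python, as suggested in the comment above, could look roughly like this. It is a sketch that assumes `vcftools` is on the PATH and that a merged multi-sample VCF already exists; the helper names and the file names in the usage comment are hypothetical.

```python
import subprocess


def vcftools_012_command(vcf_path, out_prefix):
    """Build the argument list for `vcftools --012` on a multi-sample VCF."""
    # --gzvcf reads a gzip-compressed VCF, --vcf a plain-text one.
    input_flag = "--gzvcf" if str(vcf_path).endswith(".gz") else "--vcf"
    return ["vcftools", input_flag, str(vcf_path), "--012", "--out", str(out_prefix)]


def run_vcftools_012(vcf_path, out_prefix):
    """Run vcftools and return the names of the files it is expected to write."""
    # check=True raises CalledProcessError if vcftools exits with an error.
    subprocess.run(vcftools_012_command(vcf_path, out_prefix), check=True)
    # --012 writes the genotype matrix plus the matching sample and site lists.
    return [f"{out_prefix}.012", f"{out_prefix}.012.indv", f"{out_prefix}.012.pos"]


# Example call (hypothetical file names):
# run_vcftools_012("merged.vcf.gz", "cohort")
```

Note that the `.012` file has one row per sample, so it needs transposing if SNPs are wanted as rows.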

A "SNP array" is usually an oligonucleotide microarray for calling millions of SNPs. Probably not the same as what you have in mind, but confusing nonetheless.

written 2.1 years ago by WouterDeCoster

All things considered, what would be the fastest way to merge gVCF files and VCF files? BCFtools is faster than VCFtools but doesn't work with gVCF files.

modified 2.1 years ago • written 2.1 years ago by Arko

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that the thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used to provide a solution to the question asked.

written 2.1 years ago by WouterDeCoster

The CombineGVCFs walker from GATK (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php) allows combining gVCFs.

PS: Please move this post to a comment on the OP or make it a new post.

modified 2.1 years ago • written 2.1 years ago by cpad0112

Unfortunately, GATK doesn't allow merging VCF and gVCF files. My aim is to obtain a single VCF file from the entire set.

written 2.1 years ago by Arko

Did you try bcftools merge with the -g option?

written 2.1 years ago by cpad0112
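For scripting purposes, the `bcftools merge -g` call suggested above might be wrapped as follows. This is a sketch, not a tested pipeline: it assumes `bcftools` is on the PATH and that every input is bgzip-compressed and indexed, which `bcftools merge` requires. The helper names are hypothetical. The `--gvcf` (`-g`) option takes a reference FASTA, or `-` to fill unknown reference bases at block splits with N.

```python
import subprocess


def bcftools_merge_gvcf_command(gvcf_paths, out_path, reference="-"):
    """Build the argument list for `bcftools merge --gvcf`."""
    if len(gvcf_paths) < 2:
        raise ValueError("bcftools merge expects at least two input files")
    return [
        "bcftools", "merge",
        "--gvcf", str(reference),  # reference FASTA, or "-" for N bases
        "-O", "z",                 # write bgzip-compressed VCF
        "-o", str(out_path),
        *[str(p) for p in gvcf_paths],
    ]


def run_bcftools_merge_gvcf(gvcf_paths, out_path, reference="-"):
    """Run the merge; raises CalledProcessError if bcftools fails."""
    subprocess.run(
        bcftools_merge_gvcf_command(gvcf_paths, out_path, reference), check=True
    )
    return out_path
```

This only automates the merge itself; it does not change how the merged records represent gVCF-specific alleles.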

Tried it, but on merge BCFtools treats <NON_REF> as a literal allele call instead of ignoring it, so <NON_REF> contributes to the genotype call.

written 2.1 years ago by Arko