Question: Combining Data of Multiple VCFs into One
Asked by Sheila (United States), 5.4 years ago:

I have a number of VCF files, each containing variant data for a single patient (this is how Illumina delivers the data). Is it possible to combine the data for all patients into one VCF file? If so, how? Can I use PLINK/SEQ to do this?

Any suggestions and leads would be extremely helpful.

Tags: vcf, variant-calling
Answer by Pierre Lindenbaum (Institut du Thorax - INSERM UMR1087, Nantes, France), 5.4 years ago:

This is a duplicate of "How can I merge a large amount of VCF files?"

Use vcftools (vcf-merge).
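What a merge tool has to do here can be shown with a minimal pure-Python sketch. It is not vcftools' implementation. It combines single-sample VCFs into one genotype table by taking the union of sites keyed on (CHROM, POS, REF, ALT) and filling "./." for a patient with no record at a site. The function names are hypothetical. The sketch also skips header merging, INFO merging and multi-allelic normalization, all of which real tools handle.

```python
def parse_single_sample_vcf(lines):
    """Return (sample_name, {site_key: genotype}) for one single-sample VCF.

    Assumes the 8 fixed columns + FORMAT + exactly one sample column, and
    that GT is the first FORMAT key (the VCF spec requires this when present).
    """
    sample = None
    calls = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information header lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample = fields[9]  # the single sample name
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        calls[(chrom, int(pos), ref, alt)] = fields[9].split(":")[0]
    return sample, calls


def merge_genotypes(vcfs):
    """Merge single-sample VCFs (each given as a list of lines) into one table.

    Returns (sample_names, rows), where each row is (site_key, genotypes) and a
    sample with no record at a site gets "./." (genotype unknown).
    Sites are sorted as (CHROM, POS, REF, ALT) tuples, so chromosome names sort
    as strings; real tools follow the contig order in the header instead.
    """
    samples, per_sample = [], []
    for lines in vcfs:
        name, calls = parse_single_sample_vcf(lines)
        samples.append(name)
        per_sample.append(calls)
    sites = sorted(set().union(*per_sample))
    rows = [(site, [calls.get(site, "./.") for calls in per_sample]) for site in sites]
    return samples, rows
```

The "./." fill is a deliberate choice. With per-patient VCFs that list only variant sites, a missing record may mean homozygous reference or no coverage, and a plain variant-only VCF cannot tell the two apart. That ambiguity is one reason gVCF-style files, which also report reference positions, are used for joint analysis.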


Reply by Sheila, 5.4 years ago:

Thanks. Is it possible to do this with PLINK/SEQ too?
Answer by William (Europe), 5.4 years ago:

GATK CombineVariants

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_CombineVariants.html


Reply by Yahan, 2.7 years ago:

The link is dead.
Answer by Malachi Griffith (Washington University School of Medicine, St. Louis, USA), 5.4 years ago:

Another option is joinx:

joinx vcf-merge [OPTIONS] file1.vcf file2.vcf [file3.vcf ...]


Reply by Yahan, 2.7 years ago:

Hi Malachi,

How will joinx behave when the score annotations are absent for reference calls? I have a rather large set of VCF files with calls at all positions (gVCF?), but the FORMAT fields differ between positions with and without a variant call: GT:DP versus GT:AD:DP:GQ:PL.

I tried the latest version of bcftools, and it seems to merge or report multiple lines at random.

Will joinx use the SNP DP as the GT DP for reference calls?

Thanks!

Jack
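This sketch does not describe what joinx or bcftools actually do. It illustrates one reasonable policy for the problem raised above. A hypothetical helper aligns per-sample FORMAT columns that use different key sets (GT:DP versus GT:AD:DP:GQ:PL) onto one merged FORMAT string. It keeps GT first, as the VCF specification requires when GT is present. For keys a sample lacks it writes ".", the VCF missing-value marker, rather than borrowing values such as DP from other records.

```python
def merge_format(sample_columns):
    """Align per-sample (FORMAT, sample) string pairs onto one FORMAT key list.

    sample_columns: list of (format_string, sample_string), e.g. ("GT:DP", "0/0:25").
    Returns (merged_format, [merged_sample_strings]).
    Keys keep first-seen order, then GT is moved to the front if present.
    Any key a sample does not carry is written as "." (missing).
    """
    keys = []
    parsed = []
    for fmt, sample in sample_columns:
        fmt_keys = fmt.split(":")
        # zip stops at the shorter list, so dropped trailing values become missing
        parsed.append(dict(zip(fmt_keys, sample.split(":"))))
        for key in fmt_keys:
            if key not in keys:
                keys.append(key)
    if "GT" in keys:
        keys.remove("GT")
        keys.insert(0, "GT")
    merged = [":".join(values.get(key, ".") for key in keys) for values in parsed]
    return ":".join(keys), merged
```

For example, `merge_format([("GT:DP", "0/0:25"), ("GT:AD:DP:GQ:PL", "0/1:10,8:18:99:200,0,250")])` returns `("GT:DP:AD:GQ:PL", ["0/0:25:.:.:.", "0/1:18:10,8:99:200,0,250"])`. The reference call keeps its own DP and gets "." for AD, GQ and PL.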
Answer by ewre (United States), 5.4 years ago:

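Since you are working with VCF files, vcftools is a good choice. Try:

vcf-merge a.vcf.gz b.vcf.gz ... | bgzip -c > combined.vcf.gz

vcf-merge expects bgzip-compressed, tabix-indexed input files and writes uncompressed VCF to standard output, so pipe it through bgzip if you want a compressed result.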
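Per-patient plain .vcf files need a preparation step before vcf-merge can read them. The following minimal Python sketch builds and optionally runs the bgzip, tabix and vcf-merge commands. It assumes those three tools are on the PATH. The function names and the default output name combined.vcf.gz are illustrative assumptions, not part of vcftools.

```python
import shlex
import subprocess


def merge_commands(vcf_paths, out_path="combined.vcf.gz"):
    """Build shell commands that prepare plain VCFs for vcf-merge and merge them.

    vcf-merge expects bgzip-compressed, tabix-indexed inputs and writes
    uncompressed VCF to stdout, so its output is piped through bgzip.
    """
    cmds = []
    gz_paths = []
    for path in vcf_paths:
        gz = path + ".gz"
        gz_paths.append(gz)
        cmds.append(f"bgzip -c {shlex.quote(path)} > {shlex.quote(gz)}")
        cmds.append(f"tabix -p vcf {shlex.quote(gz)}")
    inputs = " ".join(shlex.quote(g) for g in gz_paths)
    cmds.append(f"vcf-merge {inputs} | bgzip -c > {shlex.quote(out_path)}")
    cmds.append(f"tabix -p vcf {shlex.quote(out_path)}")  # index the merged file too
    return cmds


def run(cmds, dry_run=True):
    """Print the commands (default) or execute them in order."""
    for cmd in cmds:
        if dry_run:
            print(cmd)
        else:
            # bash handles the pipe and redirections; pipefail makes a vcf-merge
            # failure raise CalledProcessError instead of being masked by bgzip.
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)
```

`run(merge_commands(["p1.vcf", "p2.vcf"]))` only prints the commands, so you can check them first. Pass `dry_run=False` to execute them. Printing by default is a safety choice, because bgzip overwrites existing .gz files of the same name.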
Answer by zx8754 (London), 5.4 years ago:

You can load multiple VCF files into one PLINK/SEQ project and then write the project out as a single VCF.

pseq /path/to/project load-vcf

Given that a project file has been created (/path/to/project) and contains one or more VCF files, this command loads those VCF files into the variant database.


Reply by Sheila, 5.4 years ago:

Thanks, this is helpful! I'm having trouble loading the VCFs into a project. These are my commands and output; can you help?

My commands:

pseq testproject new-project --resources hg18
pseq /path/to/project/testproject load-vcf --vcf /path/to/TestVCFs/*.vcf

Output:

pseq error : database (/ifs/adni/pbhatt/ADNI/testproject_out/vardb) error (5) database is locked
plinkseq warning: database is locked (repeated 6 times)
plinkseq warning: preparing query database is locked

Reply by zx8754, 5.4 years ago:

The PLINK/SEQ documentation is not well maintained; it took me several hours of trial and error to load the data. Try creating a new project with the resources and scratch folders defined, and make sure you have read/write access to those folders:

pseq proj1 new-project --resources /share/data/hg19 --scratch /tmp/myfolder

Try loading one VCF file first; if that works, build on it. There is also a Google Group for pseq users.

Reply by Sheila, 5.4 years ago:

Thanks! Yes, I've tried posting in the Google Group but have received more responses here. I agree about the PLINK/SEQ documentation; it's very difficult to understand when you're new to the software.

I loaded one VCF and it works fine; the problem seems to occur when I try to load more than one VCF at once. I will also try creating a new project with a scratch folder. Just so I know, what is the purpose of a scratch folder? I couldn't find it on the PLINK/SEQ website.

Reply by zx8754, 5.4 years ago:

My guess is that the scratch folder is where PLINK/SEQ writes temporary files before committing them to the database.