Question: Combining Data Of Multiple Vcfs Into One.
3
6.1 years ago by
Sheila300
United States
Sheila300 wrote:

I have a number of VCF files, each containing variant data for a single patient (this is how Illumina provides the data). Is it possible to combine the data for all patients into one VCF file? If so, how? Can I use plink/seq to do this?

Any suggestions and leads would be extremely helpful.

vcf variant-calling • 19k views
modified 6 months ago by Shicheng Guo7.6k • written 6.1 years ago by Sheila300
5
6.1 years ago by
William4.4k
Europe
William4.4k wrote:

Use GATK CombineVariants.

Usage examples from its documentation:

Merge two separate callsets

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant input1.vcf \
   --variant input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions UNIQUIFY

Get the union of calls made on the same samples

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant:foo input1.vcf \
   --variant:bar input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions PRIORITIZE \
   -priority foo,bar
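
With one VCF per patient there will usually be more than two inputs, and the first example extends by repeating `--variant` once per file. A minimal sketch that builds and prints that command so it can be checked before running; the helper name `build_combine_cmd` and the `patient*.vcf` file names are hypothetical, while `reference.fasta` and `output.vcf` are the placeholders from the examples above:

```shell
#!/bin/sh
# Hypothetical helper: print a CombineVariants command with one
# --variant argument per input VCF (same flags as the first example).
# Assumes file names contain no spaces.
build_combine_cmd() {
    variant_args=""
    for vcf in "$@"; do
        variant_args="$variant_args --variant $vcf"
    done
    printf '%s %s %s\n' \
        "java -jar GenomeAnalysisTK.jar -T CombineVariants" \
        "-R reference.fasta$variant_args" \
        "-o output.vcf -genotypeMergeOptions UNIQUIFY"
}

# Placeholder inputs; replace with the real per-patient files.
build_combine_cmd patient1.vcf patient2.vcf patient3.vcf
```

The printed line can be pasted into a terminal or piped to `sh` once it looks right.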
modified 6 months ago by zx87547.8k • written 6.1 years ago by William4.4k
4
6.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

Related duplicate post

Use vcf-merge

modified 6 months ago by zx87547.8k • written 6.1 years ago by Pierre Lindenbaum121k

Thanks. Is it possible to do this with plink/seq too?

modified 6 months ago by RamRS22k • written 6.1 years ago by Sheila300
2
6.1 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

Another option is joinx:

joinx vcf-merge [OPTIONS] file1.vcf file2.vcf [file3.vcf ...]
modified 6 months ago by RamRS22k • written 6.1 years ago by Malachi Griffith17k

Hi Malachi,

How will joinx behave when score annotation is absent for reference calls? I have a rather large set of VCF files with calls for all positions (gVCF?), but the annotation differs between positions with and without a call (GT:DP versus GT:AD:DP:GQ:PL).

I tried the latest version of bcftools, and it seems to merge or report multiple lines randomly.

Will joinx use the snp DP as GT DP for ref calls?

thanks!

Jack

modified 6 months ago by RamRS22k • written 3.5 years ago by Yahan370
2
6.1 years ago by
ewre220
United States
ewre220 wrote:

Since you are working with VCF files, vcftools would be a good choice. Try:

vcf-merge a.vcf.gz b.vcf.gz ... | bgzip -c > combined.vcf.gz
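
vcf-merge reads bgzip-compressed, tabix-indexed inputs, so plain `.vcf` files need to be prepared first. A minimal sketch of that step, assuming `bgzip` and `tabix` are on the PATH; the function name `compress_and_index` and the `BGZIP`/`TABIX` override variables are hypothetical conveniences, not part of vcftools:

```shell
#!/bin/sh
# Tool names can be overridden through the environment
# (hypothetical convenience, e.g. to point at a specific install).
BGZIP=${BGZIP:-bgzip}
TABIX=${TABIX:-tabix}

# Compress each plain VCF next to the original and index the result.
# Stops at the first file that fails.
compress_and_index() {
    for vcf in "$@"; do
        "$BGZIP" -c "$vcf" > "$vcf.gz" || return 1   # keeps the original .vcf
        "$TABIX" -p vcf "$vcf.gz" || return 1        # writes $vcf.gz.tbi
    done
}

# Usage (not run here):
#   compress_and_index a.vcf b.vcf
```

After this, the `.vcf.gz` files can be passed to vcf-merge.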
modified 6 months ago by RamRS22k • written 6.1 years ago by ewre220
2
6.1 years ago by
zx87547.8k
London
zx87547.8k wrote:

You can load multiple VCFs into one plink/seq project, then output the project as one VCF.

pseq /path/to/project load-vcf

Given that a project file has been created (/path/to/project) and contains one or more VCF files, this command loads those VCF files into the variant database.

modified 6 months ago by RamRS22k • written 6.1 years ago by zx87547.8k

Thanks! This is helpful! I'm having trouble loading the VCFs into a project. These are my commands and output. Can you provide any help?

MY COMMANDS:

pseq testproject new-project --resources hg18
pseq /path/to/project/testproject load-vcf --vcf /path/to/TestVCFs/*.vcf

OUTPUT:

pseq error : database (/ifs/adni/pbhatt/ADNI/testproject_out/vardb) error (5) database is locked
plinkseq warning: database is locked (repeated 6 times)
plinkseq warning: preparing query database is locked
modified 6 months ago by RamRS22k • written 6.1 years ago by Sheila300

PLINK/SEQ documentation is not well maintained; it took me several hours of trial and error to load the data. Try creating a new project with resources and scratch folders defined, and ensure you have read/write access to those folders.

pseq proj1 new-project --resources /share/data/hg19 --scratch /tmp/myfolder

Try loading one VCF file first; if that works, expand from there. There is also a Google Group for pseq users.
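
Loading one file at a time can be scripted so that the first failing VCF is easy to spot. A minimal sketch, assuming the project already exists; `load_one_by_one` and the `PSEQ` override variable are hypothetical helpers, and `--vcf` is the option used in the commands earlier in this thread:

```shell
#!/bin/sh
# Path to the pseq binary; override through the environment if needed
# (hypothetical convenience).
PSEQ=${PSEQ:-pseq}

# Load VCFs into an existing project one file per call and stop at the
# first failure. First argument: project; remaining arguments: VCF files.
load_one_by_one() {
    project=$1
    shift
    for vcf in "$@"; do
        echo "loading $vcf" >&2
        "$PSEQ" "$project" load-vcf --vcf "$vcf" || {
            echo "load failed at $vcf" >&2
            return 1
        }
    done
}

# Usage (not run here):
#   load_one_by_one proj1 /path/to/TestVCFs/*.vcf
```

Whether sequential loading avoids the "database is locked" error is not confirmed here; the loop only narrows down which file or step triggers it.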

modified 6 months ago by RamRS22k • written 6.1 years ago by zx87547.8k

Thanks! Yes, I've tried posting in the Google Group but have received more responses here. I agree about the PLINK/SEQ documentation: it's very difficult to understand when you're new to the software.

I loaded one VCF and it works fine. The problem seems to arise when I try to load more than one VCF together. I will also try creating a new project with a scratch folder. Just so I know, what is the purpose of a scratch folder? I couldn't find it on the PLINK/SEQ website.

modified 6 months ago by RamRS22k • written 6.1 years ago by Sheila300

I am guessing the scratch folder is where PLINK/SEQ creates temporary files before committing to the database.

modified 6 months ago by RamRS22k • written 6.1 years ago by zx87547.8k