Question: How To Get Phasing Status From Vcf Files
5
7.3 years ago by
Rubal7770
Rubal7770 wrote:

Hello All,

We are designing a pipeline that will take phased data as input. We will ultimately be using a phased dataset provided by another group, but until it arrives we would like to practice with some phased data. We have unphased data in VCF files that we would like to phase, and the output should also be in VCF format. Can anyone recommend a fast way to take VCF input and produce phased VCF output? The emphasis here is on speed: we want phased data as quickly as possible to use as dummy data, and we are not concerned with error rate (this once). Thank you in advance for your comments.

Best,

Rubal

vcf genome haplotype • 8.4k views
ADD COMMENTlink modified 4.3 years ago by Biostar ♦♦ 20 • written 7.3 years ago by Rubal7770
3

When your VCF is generated by GATK, phasing is encoded in the 1|0, 0|1 format.

See: http://www.broadinstitute.org/gsa/wiki/index.php/Read-backed_phasing_algorithm

ADD REPLYlink written 7.3 years ago by Alex Paciorkowski3.3k

In what format will they supply the phased data? Are you sure it is VCF? I was under the impression that VCF does not maintain phase (alleles are swappable, with no assurance that their order is preserved).

ADD REPLYlink written 7.3 years ago by tiagoantao660
1

No, VCF maintains the phase. If the two alleles of a genotype are separated by a pipe (e.g. 0|1), the genotype is phased; if they are separated by a slash (e.g. 0/1), it is unphased. http://www.1000genomes.org/node/101

ADD REPLYlink written 7.3 years ago by Giovanni M Dall'Olio26k

I changed the title of your question because I understood that you are asking about how to get phasing data from vcf files. Please correct it if I am wrong.

ADD REPLYlink written 7.3 years ago by Giovanni M Dall'Olio26k

I actually meant: how do I phase unphased data that is in VCF format? Sorry, I was away from this post for a while, but I am still interested in an answer.

ADD REPLYlink modified 7.3 years ago • written 7.3 years ago by Rubal7770

I found this description to be the most helpful for understanding how phasing information is represented in a VCF file: http://gatkforums.broadinstitute.org/gatk/discussion/45/purpose-and-operation-of-read-backed-phasing

It has nice intuitive examples of what the file actually looks like for phased and unphased variants.

ADD REPLYlink written 3.2 years ago by Malachi Griffith17k
7
7.3 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

In VCF files, if the two alleles of a genotype are separated by a pipe (e.g. 0|1), the genotype is phased; if they are separated by a slash (e.g. 0/1), it is unphased. http://www.1000genomes.org/node/101

For example:

#CHROM POS ID  REF ALT QUAL FILTER INFO FORMAT      NA00001        NA00002
20     14  rs1 G   A   9    PASS   ...  GT:GQ:DP:HQ 0|0:48:1:51,51 1/0:48:8:51,51
20     17  rs2 T   A   3    q10    ...  GT:GQ:DP:HQ 0|0:49:3:58,50 0/1:3:5:65,3
20     20  rs3 A   G   67   PASS   ...  GT:GQ:DP:HQ 1|0:21:6:23,27 0/1:2:0:18,2

The first individual (column NA00001) has phased data, because the alleles in each genotype are separated by a "|"; the second (NA00002) is unphased, because they are separated by a "/".
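To check this on a whole file rather than by eye, here is a minimal sketch in plain Python that counts phased and unphased genotypes per sample. The function name `phasing_by_sample` and the `EXAMPLE` string are my own illustration, not part of any VCF library; it assumes an uncompressed VCF and splits on any whitespace so that it also works on the space-aligned example above.

```python
EXAMPLE = """\
#CHROM POS ID  REF ALT QUAL FILTER INFO FORMAT      NA00001        NA00002
20     14  rs1 G   A   9    PASS   ...  GT:GQ:DP:HQ 0|0:48:1:51,51 1/0:48:8:51,51
20     17  rs2 T   A   3    q10    ...  GT:GQ:DP:HQ 0|0:49:3:58,50 0/1:3:5:65,3
20     20  rs3 A   G   67   PASS   ...  GT:GQ:DP:HQ 1|0:21:6:23,27 0/1:2:0:18,2
"""


def phasing_by_sample(vcf_lines):
    """Count phased ('|') and unphased ('/') genotypes for each sample."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split()
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            samples = fields[9:]
            counts = {s: {"phased": 0, "unphased": 0} for s in samples}
            continue
        if len(fields) < 10 or "GT" not in fields[8].split(":"):
            continue  # no genotype columns on this record
        gt_idx = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_idx]
            if "|" in gt:
                counts[sample]["phased"] += 1
            elif "/" in gt:
                counts[sample]["unphased"] += 1
            # Haploid calls have no separator and are not counted.
    return counts


if __name__ == "__main__":
    for sample, c in phasing_by_sample(EXAMPLE.splitlines()).items():
        print(sample, c)
```

For a real file you would pass an open file handle instead of `EXAMPLE.splitlines()`; a gzipped VCF would need `gzip.open(path, "rt")`.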

You can also use the --phased option in vcftools, which excludes all sites that contain unphased genotypes, so only fully phased sites are kept (see http://vcftools.sourceforge.net/options.html )

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by Giovanni M Dall'Olio26k

I have mapped WGS data on hand and am looking for a method that turns unphased VCF files into phased ones. Could you please suggest a way to phase an unphased VCF?

ADD REPLYlink written 4.9 years ago by Jie Ping30

Was this ever answered?

ADD REPLYlink written 2.4 years ago by olneykimberly0

If the question is "How to phase a vcf file", there are lots of tools to do this - SHAPEIT, Eagle, Beagle, etc.
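If all that is needed is dummy phased input for testing a pipeline, as in the original question where error rate does not matter, a much cruder shortcut is to rewrite the genotype separators. The sketch below is not real phasing: it simply declares the existing allele order to be the phase, so the resulting haplotypes are meaningless biologically. The function name `fake_phase_line` is hypothetical, and the code assumes a tab-delimited, uncompressed VCF.

```python
import sys


def fake_phase_line(line):
    """Replace '/' with '|' in the GT subfield of every sample column.

    WARNING: this only changes the notation. It does NOT infer haplotypes.
    """
    if line.startswith("#"):
        return line  # header and meta-information lines pass through
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return line
    gt_idx = fmt.index("GT")
    for i in range(9, len(fields)):
        parts = fields[i].split(":")
        if gt_idx < len(parts):
            parts[gt_idx] = parts[gt_idx].replace("/", "|")
        fields[i] = ":".join(parts)
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Usage (assumed file names): python fake_phase.py < in.vcf > dummy_phased.vcf
    for vcf_line in sys.stdin:
        sys.stdout.write(fake_phase_line(vcf_line))
```

For anything beyond pipeline testing, use one of the statistical phasing tools named above.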

ADD REPLYlink written 19 months ago by raewynhui0