Question: Perl/Python script: phased vcf to phased tped
0
4 months ago by
Shicheng Guo7.4k
Shicheng Guo7.4k wrote:

Hi All,

Can anyone share a Perl/Python script to convert a phased VCF to a phased tped?

Thanks.


Update: plink re-orders the alleles, so the 'phase' status is lost if plink is used anywhere in the data processing. Thanks for the explanation: the order in which the alleles appear in heterozygous genotype calls is usually determined by which allele is major/minor in the immediate dataset; this ordering will not vary between samples.
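One way to check this on a given tped file is to look at the heterozygous calls at each variant. If the allele order was fixed by the exporting tool, every heterozygous sample at a site shows the same ordering; a file that still mirrors a phased VCF will usually show both orderings at many sites. The sketch below is only an illustration: the function name `het_orders` is hypothetical, and it assumes a whitespace-delimited tped with four leading columns and `0` as the missing-allele code.

```python
def het_orders(tped_line):
    """Return the set of (allele1, allele2) orderings seen among heterozygous calls."""
    fields = tped_line.split()
    # Assumed layout: chromosome, variant ID, genetic distance, bp position, then allele pairs.
    alleles = fields[4:]
    orders = set()
    for i in range(0, len(alleles) - 1, 2):
        a, b = alleles[i], alleles[i + 1]
        # Skip homozygous calls and calls with a missing allele (assumed to be "0").
        if a != b and "0" not in (a, b):
            orders.add((a, b))
    return orders


# Both orderings present: consistent with allele order carried over from phased input.
print(het_orders("22 rs1 0 16050075 A G G A A A"))
# A single ordering: consistent with the tool having fixed the order itself.
print(het_orders("22 rs2 0 16050115 G A G A G G 0 0"))
```

A single variant with one ordering proves nothing on its own; the pattern only becomes informative across many variants.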

phased ped vcf • 616 views
ADD COMMENTlink modified 4 months ago • written 4 months ago by Shicheng Guo7.4k
3

Not answering your question but you have perhaps seen this.

ADD REPLYlink written 4 months ago by genomax63k
2

This is the correct 'answer'. If you care about representing genotype phase in text, use VCF.

ADD REPLYlink modified 4 months ago • written 4 months ago by chrchang5234.8k

Yes. I think there should be some wheels outside.

ADD REPLYlink written 4 months ago by Shicheng Guo7.4k

Wheels? That word does not make sense in this context. Could you explain using different words, maybe?

ADD REPLYlink written 4 months ago by RamRS20k

Quote from PLINK docs:

The PED file is a white-space (space or tab) delimited file: the first six columns are mandatory:
     Family ID
     Individual ID
     Paternal ID
     Maternal ID
     Sex (1=male; 2=female; other=unknown)
     Phenotype

PED files do not hold genotype information, phased or not. Are you sure you're asking the right question?

ADD REPLYlink modified 4 months ago by zx87546.8k • written 4 months ago by Vitis2.0k

PED files do hold genotypes, see https://www.cog-genomics.org/plink2/formats#ped

ADD REPLYlink modified 4 months ago • written 4 months ago by zx87546.8k

There are two prevalent PED formats - the one used/generated by plink has genotype information after the first six columns. The subset of this file with the first six columns alone is used in other tools, such as GATK's PhaseByTransmission, etc, and is the more prevalent one for clinical genetics usage. PLINK calls this format the .fam file.
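To make the distinction concrete, here is a minimal sketch that splits one line of a plink-style PED file into the six leading columns (the part that matches a .fam record) and the genotype columns that follow. The function name `split_ped_line` and the dictionary keys are hypothetical, and the sketch assumes two whitespace-separated allele columns per variant.

```python
def split_ped_line(line):
    """Split a plink-style PED line into the six leading fields and allele pairs."""
    fields = line.split()
    keys = ("fid", "iid", "pat", "mat", "sex", "pheno")
    fam = dict(zip(keys, fields[:6]))
    # Everything after column six is genotype data, two allele columns per variant.
    alleles = fields[6:]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return fam, genotypes


fam, genotypes = split_ped_line("FAM1 IND1 0 0 1 2 A G G G")
print(fam)
print(genotypes)
```

A line with only the six leading fields yields an empty genotype list, which corresponds to the pedigree-only usage described above.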

ADD REPLYlink written 4 months ago by RamRS20k

Oops! Got confused with the .fam files. Thanks for the info!

ADD REPLYlink written 4 months ago by Vitis2.0k
5
4 months ago by
chrchang5234.8k
United States
chrchang5234.8k wrote:

plink 1.9's core only handles .bed files. So --vcf causes a temporary .bed file to be generated, which does not contain any phase information. When a ped/tped is then exported from the .bed, the order in which the alleles appear in heterozygous genotype calls is usually determined by which allele is major/minor in the immediate dataset; this ordering will not vary between samples, and has nothing to do with the original phase status.
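The loss of phase can be illustrated with the 2-bit genotype codes of the PLINK 1 .bed format (00 = homozygous for the first allele, 01 = missing, 10 = heterozygous, 11 = homozygous for the second allele). The sketch below is not plink's code; `encode_bed` and `decode_to_tped` are hypothetical helpers that assume diploid, biallelic genotypes, and treating VCF allele index 0 as the first allele is an arbitrary choice made only for this illustration.

```python
# 2-bit genotype codes used by the PLINK 1 .bed format.
HOM_A1, MISSING, HET, HOM_A2 = 0b00, 0b01, 0b10, 0b11


def encode_bed(gt):
    """Map a diploid, biallelic VCF GT string such as '0|1' to a 2-bit code."""
    a, b = gt.replace("|", "/").split("/")
    if "." in (a, b):
        return MISSING
    if a != b:
        return HET  # 0|1 and 1|0 collapse to the same code: phase is gone here
    # Illustration only: allele index 0 is treated as the first allele.
    return HOM_A1 if a == "0" else HOM_A2


def decode_to_tped(code, a1, a2):
    """Turn a 2-bit code back into an allele pair; the het order is fixed by the decoder."""
    table = {
        HOM_A1: (a1, a1),
        MISSING: ("0", "0"),
        HET: (a1, a2),
        HOM_A2: (a2, a2),
    }
    return table[code]


for gt in ("0|1", "1|0"):
    print(gt, "->", decode_to_tped(encode_bed(gt), "A", "G"))
```

Both phased heterozygotes come back as the same allele pair, which is why any allele order seen in a tped exported from a .bed cannot reflect the original phase.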

ADD COMMENTlink written 4 months ago by chrchang5234.8k
1

Moved this comment to an answer to make it clearer that there is incorrect advice in other answers.

ADD REPLYlink written 4 months ago by jrj.healey11k

Great, thanks Chris. Now I see. That means plink re-orders the alleles so that the coding is the same for every individual with respect to the minor/major allele. Are you sure that vcftools --tped behaves the same way as you described? Thanks.

ADD REPLYlink written 4 months ago by Shicheng Guo7.4k
1

It's basically irrelevant what vcftools --tped does, because phase is undefined in the tped format. You're effectively inventing your own file format and can't count on any software support from anyone else; much better to just write software that understands VCF, if you have to deal with text.
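As a rough illustration of working from the VCF text directly, the sketch below pulls the two haplotype alleles per sample out of one VCF data line. It is a simplified reading of the format, not a full parser: `parse_phased_record` is a hypothetical name, the line is assumed to be tab-delimited with a FORMAT column containing GT, and any call that is not a phased diploid genotype is returned as `None`.

```python
def parse_phased_record(line):
    """Return (chrom, pos, alleles, haplotypes) for one tab-delimited VCF data line."""
    f = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, fmt = f[0], int(f[1]), f[3], f[4], f[8]
    alleles = [ref] + alt.split(",")
    gt_idx = fmt.split(":").index("GT")
    haplotypes = []
    for sample in f[9:]:
        gt = sample.split(":")[gt_idx]
        parts = gt.split("|")
        if len(parts) != 2:
            # Unphased ('/'), haploid, or polyploid call: no phased diploid pair to report.
            haplotypes.append(None)
            continue
        # '.' marks a missing allele; otherwise the value indexes into REF + ALT.
        haplotypes.append(tuple(None if p == "." else alleles[int(p)] for p in parts))
    return chrom, pos, alleles, haplotypes


line = "22\t16050075\trs1\tA\tG\t100\tPASS\t.\tGT\t0|1\t1|0\t0/1\t.|."
print(parse_phased_record(line))
```

Keeping the left and right alleles as an ordered pair per sample preserves the phase exactly as the VCF states it, without going through an intermediate format.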

Meanwhile, please edit your top level answer to make it absolutely clear that it was incorrect.

ADD REPLYlink modified 4 months ago • written 4 months ago by chrchang5234.8k

Hi Chris, I think I will keep my post, because I think it is correct. I hope you can give further suggestions. I tested it using 1000 Genomes data and ran diff chr22.vcf.vcf.tped chr22.vcf.pl.tped to check the whole of chr22, and the two files are completely identical.

ADD REPLYlink modified 4 months ago • written 4 months ago by Shicheng Guo7.4k

Could it be that just your example works by coincidence, but that the implementation (which chrchang523 obviously knows better than anyone else) does not guarantee phase information is preserved?

ADD REPLYlink written 4 months ago by WouterDeCoster37k

It should not be a coincidence: across the whole of chr22, the allele order (phase status) in the tped is identical to that in the vcf. Let's wait for chrchang523's further comments. We should reach a conclusion soon.

ADD REPLYlink written 4 months ago by Shicheng Guo7.4k

Part 2 is the one that matters, and I have already explained why that can't possibly work and your test must be flawed. plink is open source, and it is straightforward to verify that (i) .bed does not store phase info and (ii) the implementation of --recode only uses (temporary) .bed as input.

If you do not edit your answer within 24 hours, I will delete it.

ADD REPLYlink written 4 months ago by chrchang5234.8k

Okay. I respect your suggestion and have removed the plink part, keeping only the 'vcftools --tped' part.

ADD REPLYlink written 4 months ago by Shicheng Guo7.4k

Okay, maybe we can delete the whole post.

ADD REPLYlink written 4 months ago by Shicheng Guo7.4k

If you aren't going to delete the post, you need to explicitly mention that the plink test failed, after debugging your test if need be. It's the vcftools result that is meaningless, and can be deleted with no loss to anyone, since .tped is a plink file format; that's why the vcftools flag is called --plink-tped.

ADD REPLYlink written 4 months ago by chrchang5234.8k
0
4 months ago by
Shicheng Guo7.4k
Shicheng Guo7.4k wrote:

Done. Just sharing this with you all: I ran a test on 1000 Genomes chr22.

  1. Convert the phased vcf to tped with vcftools: does the tped keep the phase status? Yes, it keeps the phase status.

    vcftools --vcf test.vcf --plink-tped --out out

  2. Use plink to create the tped: this failed. Yes, plink will re-order the alleles.

    plink --vcf test.vcf --recode transpose --out out
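For reference, a direct text conversion that writes the alleles in the order they appear in each GT field could look roughly like the sketch below. This is not how vcftools or plink implement their export; `vcf_line_to_tped` and `vcf_to_tped` are hypothetical names, the input is assumed to be an uncompressed, tab-delimited VCF with diploid GT calls, the genetic distance column is set to 0, and the companion .tfam file is not written. As pointed out elsewhere in this thread, the tped format itself does not define phase, so the allele order in the output is only a private convention that other software is free to ignore.

```python
def vcf_line_to_tped(line):
    """Convert one tab-delimited VCF data line to a tped line, keeping GT allele order."""
    f = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = f[0], f[1], f[2], f[3], f[4]
    alleles = [ref] + alt.split(",")
    gt_idx = f[8].split(":").index("GT")
    if vid == ".":
        vid = f"{chrom}:{pos}"  # assumed fallback ID when the VCF has none
    out = [chrom, vid, "0", pos]  # genetic distance is unknown here, so written as 0
    for sample in f[9:]:
        gt = sample.split(":")[gt_idx]
        parts = gt.replace("/", "|").split("|")
        if len(parts) != 2:
            parts = [".", "."]  # non-diploid calls are written as missing
        out.extend("0" if p == "." else alleles[int(p)] for p in parts)
    return " ".join(out)


def vcf_to_tped(vcf_path, tped_path):
    """Stream a plain-text VCF and write one tped line per variant."""
    with open(vcf_path) as vcf, open(tped_path, "w") as tped:
        for line in vcf:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            tped.write(vcf_line_to_tped(line) + "\n")


print(vcf_line_to_tped("22\t16050075\trs1\tA\tG\t100\tPASS\t.\tGT\t0|1\t1|0\t.|."))
```

Unphased calls written with `/` are converted in the same way, so the output alone cannot tell a phased call from an unphased one.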

ADD COMMENTlink modified 4 months ago • written 4 months ago by Shicheng Guo7.4k
2

This is incorrect, and you should mark it as such.

ADD REPLYlink written 4 months ago by chrchang5234.8k

Hi Chris, can you show us some details of how plink is coded to convert vcf to tped? Thanks. At least in my small test dataset, I found that the phase status is kept. However, it would be great if you could tell us some details about how plink handles this internally. Thanks.

Let's take tped as the example, since it will be easier to show there than in the ped.

ADD REPLYlink modified 4 months ago • written 4 months ago by Shicheng Guo7.4k
1

Did it work?

I thought plink could take VCF as input via --vcf or --bcf?

ADD REPLYlink modified 4 months ago • written 4 months ago by zx87546.8k
1

This doesn't work; Shicheng's test was faulty.

ADD REPLYlink written 4 months ago by chrchang5234.8k

Yes, I tested it and it works: plink can take --vcf and --bcf as input. But I just want to keep the phase status and do some further analysis in R, which I hope can take a 'phased ped' as input. As Chris said, any file created by plink will lose the phase status.

ADD REPLYlink modified 4 months ago • written 4 months ago by Shicheng Guo7.4k

I have cleaned up this thread. It is good that everyone can share their opinion here, but I hope we can start fresh from now.

ADD REPLYlink written 4 months ago by WouterDeCoster37k