Plink: .vcf to .ped issue (problem with polyploidy?)
3 months ago
Alice • 0

Hello,

I have a .vcf file produced with GATK, containing SNP variants for 32 samples of tetraploid Triticum species. I would like to convert it with plink to get the .ped file for some downstream analyses (PCA, Admixture).

Here is the command I used:

plink --vcf all_triticum_SNPs_250kb.vcf --allow-extra-chr --recode --make-bed --out all_triticum_SNPs_250kb_plink


I got no error and the files are created, but the genotypes in the .ped file are all "0". Might it be a problem with ploidy? Here is what one site looks like in my vcf, with the GT field having 4 values:

Chr1A   51532915        .       T       C       8409.13 PASS    AC=84;AF=1;AN=116;BaseQRankSum=0.579;DP=1899;FS=0;MLEAC=4;MLEAF=1;MQ=60;MQRankSum=0;QD=26.62;ReadPosRankSum=0.012;SOR=1.112     GT:AD:DP:GQ     1/1/1/1:0,49:49:61      0/1/1/1:9,42:51:47      0/0/0/1:129,50:179:76   0/0/0/1:42,13:55:35     0/0/1/1:47,39:86:35     0/1/1/1:19,33:52:1      0/0/1/1:19,23:42:17     0/0/1/1:18,20:38:18     0/0/1/1:27,22:49:19     1/1/1/1:0,128:128:99    1/1/1/1:0,123:123:99    1/1/1/1:0,74:74:92
1/1/1/1:0,40:40:50      1/1/1/1:0,55:55:69      1/1/1/1:0,82:82:99      0/0/1/1:49,35:84:19     1/1/1/1:0,50:50:62
1/1/1/1:0,45:45:56      1/1/1/1:0,66:66:82      1/1/1/1:0,61:61:76      1/1/1/1:0,45:45:56      1/1/1/1:0,198:198:99
0/0/0/1:9,3:12:7        0/1/1/1:10,25:35:14     0/0/0/1:18,3:21:23      0/0/1/1:22,18:40:15     ./././.:.:.:.   0/0/0/1:26,2:28:40      ./././.:.:.:.   0/0/1/1:21,30:51:10     1/1/1/1:0,42:42:52      ./././.:.:.:.


I looked for an option to set the ploidy but could not find one.

Many thanks for any suggestion you might give me.

Alice

polypoidy Plink .ped • 372 views

Your suspicion is correct, plink does not directly support tetraploid data. You will probably need to use other software packages to help analyze it.

6 weeks ago
yzliu01 ▴ 10

Hi Alice, did you find a solution to this issue? I have a similar problem converting a vcf file of mixed haploid and diploid samples to a .ped or .bed file in order to perform PCA and admixture analysis. Also, the reference genome has hundreds of contigs. It seems that plink is not able to handle this. Does anyone have an alternative way to get this done? I would appreciate any help!


Hi, I could not find any software that can handle a vcf with different ploidy levels. In my case, just to perform PCA, I extracted the GT field and converted it to numbers from 0 to 4 by "scoring" the number of 1s, e.g. 0/0/0/1 becomes 1, 0/1/1/0 becomes 2 and so on. Then I computed the distance matrix on this "0-4" dataset and something reasonable came out. I have no suggestion for Admixture, though. Hope this helps!
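
For anyone who wants to try the same thing, here is a rough Python sketch of the GT-to-dosage scoring described above. It is not the exact script that was used: the function names are made up for illustration, it assumes biallelic sites (only "1" alleles are counted), and the distance (mean absolute dosage difference over sites where both samples are called) is just one possible choice, since the metric is not specified above.

```python
from itertools import combinations


def gt_to_dosage(gt):
    """Count ALT ('1') alleles in a GT string such as '0/0/1/1'.

    Returns None if any allele is missing ('.').
    Assumption: biallelic sites, so only '0' and '1' occur.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return alleles.count("1")


def parse_vcf_dosages(lines):
    """Return (sample_names, matrix); matrix[site][sample] is a dosage or None."""
    samples, matrix = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        gt_index = fields[8].split(":").index("GT")
        matrix.append(
            [gt_to_dosage(s.split(":")[gt_index]) for s in fields[9:]]
        )
    return samples, matrix


def distance_matrix(matrix, n_samples):
    """Mean absolute dosage difference per sample pair, skipping missing calls.

    Assumption: this metric is an illustrative choice, not the one used above.
    """
    dist = [[0.0] * n_samples for _ in range(n_samples)]
    for i, j in combinations(range(n_samples), 2):
        diffs = [
            abs(row[i] - row[j])
            for row in matrix
            if row[i] is not None and row[j] is not None
        ]
        d = sum(diffs) / len(diffs) if diffs else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist
```

You would call `parse_vcf_dosages` on an open file handle of the (uncompressed, tab-separated) vcf, then pass the resulting matrix and `len(samples)` to `distance_matrix`. The distance matrix can then go into whatever PCoA/PCA routine you prefer.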


plink can handle "hundreds of contigs"; see the --allow-extra-chr flag. Please explain what you mean by "mixed haploid and diploid samples"; plink does have appropriate logic for handling chrX/chrY.