Question: Many 0 in ped file converted from vcf genomic data
5 weeks ago by
shawn20
shawn20 wrote:

Hi everyone,

I am learning to do GWAS analysis. I converted the genomic data "1001genomes_snp-short-indel_only_ACGTN.vcf.gz", downloaded from here, to plink ped format:

plink --vcf 1001genomes_snp-short-indel_only_ACGTN.vcf.gz --make-bed --out 1001genomes_snp-short-indel_only_ACGTN.vcf.gz

I find there are many 0s in the ped file, like this:

88 88 0 0 0 -9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 T T 0 0 G G C C T T 0 0 0 0 T T G G 0 0 T T T T A A A A T T 0 0 T T 0 0 G G C C A A T T A A C C C C C C A A T T C C T T G G G G T T 0 0 C C G G G G T T T T T T A A T T C C G G G G G G C C C C G G G G G G G G C C C C G G C C T T T T G G A A C C T T A A G G 0 0 G G A A T T A A 0 0 0 0 C C C C T T G G G G G G A A T T 0 0 0 0 A A G G T T T T G G 0 0 C C T T C C 0 0 A A C C C C G G G G A A G G C C C C G G C C G G C C C C C C G G G G G G G G A A 0 0 C C C C A A C C C C C C G G C C C C C C C C C C C C C C G G T T C C C C C C C C A A 0 0 A A T T 0 0 T T T T T T A A G G T T G G G G T T C C G G G G C C G G C C 0 0 C C C C T T T T T T A A T T T T G G A A G G C C C C G G 0 0 G G C C G G T T T T C C 0 0 G G A A 0 0 C C G G C C T T 0 0 T T C C A A G G 0 0 C C A A A A G G C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

And when I run the quality control:

 plink --bfile 1001genomes_snp-short-indel_only_ACGTN --maf 0.05 --geno 0.02 --mind 0.02 --hwe 1e-6 --make-bed --out snp

it showed "Error: All people removed due to missing genotype data (--mind)". Does anyone know the reason? Did I choose the wrong dataset, or did I make a mistake? Thanks a lot.

snp plink gwas vcf • 171 views
modified 5 weeks ago by zx87546.8k • written 5 weeks ago by shawn20

Please use the formatting bar (10101) to highlight code and data examples.

written 5 weeks ago by ATpoint (14k)

I agree with ATpoint: this would make your example more readable. Also, it would be helpful if you posted the corresponding line of the vcf so that we can see whether there was a problem in the conversion.

written 5 weeks ago by Fabio Marroni (2.1k)
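
One way to check a VCF line against the ped output is to list which samples have a missing genotype call at that site. The sketch below is only illustrative: the function name `missing_samples`, the example record, and the sample IDs other than 88 are made up, and it assumes an uncompressed, tab-separated VCF data line with a FORMAT column that contains GT.

```python
def missing_samples(vcf_line, sample_names):
    """Return the names of samples whose GT call is missing in one VCF data line.

    Assumes the standard layout: 9 fixed columns (CHROM ... FORMAT),
    followed by one column per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        raise ValueError("record has no GT field")
    gt_index = format_keys.index("GT")

    missing = []
    for name, entry in zip(sample_names, fields[9:]):
        parts = entry.split(":")
        # Trailing sub-fields may be dropped; treat an absent GT as missing.
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # Phased (|) and unphased (/) calls are handled the same way here.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing.append(name)
    return missing


# Hypothetical record and sample IDs, for illustration only.
names = ["88", "108", "139"]
record = "1\t55\t.\tC\tT\t40\tPASS\t.\tGT:GQ\t./.:.\t1|1:30\t0|0:25"
print(missing_samples(record, names))  # ['88']
```

A call reported as missing here would be expected to appear as 0 0 for that sample in the ped file.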

Hi Fabio, I have adjusted the format. Thanks for your suggestion. Do you know the reason for my problem? Thank you very much.

Shawn

written 5 weeks ago by shawn20

The reason is that you have too many missing genotypes (presumably all the zeros). How much missing data is there in the vcf? How much missing data does your plink command tolerate? It could be a problem in the conversion, or the vcf itself may contain a lot of missing data.

written 5 weeks ago by Fabio Marroni (2.1k)
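
To put a number on this: in a ped file a genotype written as 0 0 is a missing call, and --mind 0.02 removes every sample whose missing rate exceeds 2%. The sketch below computes that rate for a single ped line. The helper names `ped_missing_rate` and `fails_mind` and the shortened example line are illustrative assumptions, not part of plink.

```python
def ped_missing_rate(ped_line):
    """Fraction of genotypes coded as missing (allele 0) in one .ped line."""
    fields = ped_line.split()
    # The first six columns are FID, IID, father, mother, sex, phenotype.
    alleles = fields[6:]
    if not alleles or len(alleles) % 2 != 0:
        raise ValueError("expected a non-empty, even number of allele columns")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    missing = sum(1 for a, b in genotypes if a == "0" or b == "0")
    return missing / len(genotypes)


def fails_mind(ped_line, threshold=0.02):
    """True if the sample's missing rate exceeds the --mind style threshold."""
    return ped_missing_rate(ped_line) > threshold


# Shortened, made-up line in the same layout as the one posted above.
example = "88 88 0 0 0 -9 0 0 C C 0 0 T T G G"
print(f"{ped_missing_rate(example):.2f}", fails_mind(example))  # 0.40 True
```

If every sample comes out above the threshold, as the posted line suggests, the --mind filter removes everyone and plink stops with the error quoted in the question. PLINK 1.9 can also report per-sample missing rates itself with --missing, which writes them to a .imiss file.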