Question: Many 0 in ped file converted from vcf genomic data
shawn wrote:

Hi everyone,

I am learning to do GWAS analysis. I converted the genomic data "1001genomes_snp-short-indel_only_ACGTN.vcf.gz" (downloaded from here) to PLINK ped format:

plink --vcf 1001genomes_snp-short-indel_only_ACGTN.vcf.gz --make-bed --out 1001genomes_snp-short-indel_only_ACGTN.vcf.gz

I find there are many 0 in the ped file like this:

88 88 0 0 0 -9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 T T 0 0 G G C C T T 0 0 0 0 T T G G 0 0 T T T T A A A A T T 0 0 T T 0 0 G G C C A A T T A A C C C C C C A A T T C C T T G G G G T T 0 0 C C G G G G T T T T T T A A T T C C G G G G G G C C C C G G G G G G G G C C C C G G C C T T T T G G A A C C T T A A G G 0 0 G G A A T T A A 0 0 0 0 C C C C T T G G G G G G A A T T 0 0 0 0 A A G G T T T T G G 0 0 C C T T C C 0 0 A A C C C C G G G G A A G G C C C C G G C C G G C C C C C C G G G G G G G G A A 0 0 C C C C A A C C C C C C G G C C C C C C C C C C C C C C G G T T C C C C C C C C A A 0 0 A A T T 0 0 T T T T T T A A G G T T G G G G T T C C G G G G C C G G C C 0 0 C C C C T T T T T T A A T T T T G G A A G G C C C C G G 0 0 G G C C G G T T T T C C 0 0 G G A A 0 0 C C G G C C T T 0 0 T T C C A A G G 0 0 C C A A A A G G C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

And when I do the quality control

 plink --bfile 1001genomes_snp-short-indel_only_ACGTN --maf 0.05 --geno 0.02 --mind 0.02 --hwe 1e-6 --make-bed --out snp

it showed "Error: All people removed due to missing genotype data (--mind)". Does anyone know the reason? Did I choose the wrong dataset, or did I make a mistake somewhere? Thanks a lot.

snp plink gwas vcf • 303 views
modified 6 months ago by zx8754 (8.0k) • written 6 months ago by shawn (20)

Please use the formatting bar (10101) to highlight code and data examples.

written 6 months ago by ATpoint (21k)

I agree with ATpoint: this would make your example more readable. Also, it would be helpful if you posted the corresponding line of the VCF so that we can see whether there was a problem in the conversion.

written 6 months ago by Fabio Marroni (2.3k)

Hi Fabio, I have adjusted the format. Thanks for your suggestion. Do you know the reason for my problem? Thank you very much.

Shawn

written 6 months ago by shawn (20)

The reason is that you have too many missing genotypes (presumably all the zeros). How much missing data is there in the VCF? How much missing data does your plink command tolerate? It could be a problem in the conversion, or the VCF itself may contain a lot of missing data.

written 6 months ago by Fabio Marroni (2.3k)
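
One way to put a number on the questions above is to count the zeros per sample directly in the ped file. The following is only a rough sketch: `ped_missing_rate` is a hypothetical helper name, and it assumes a whitespace-delimited ped file with the six standard leading columns and "0" as the missing-allele code. It prints the family ID, sample ID, number of missing genotypes, total genotypes, and the missing fraction for each sample.

```shell
# Hypothetical helper (not part of PLINK): per-sample missing-genotype rate
# from a .ped file. Assumes columns 1-6 are FID, IID, father, mother, sex,
# phenotype, and that alleles start at column 7 with "0" meaning missing.
ped_missing_rate() {
    awk '{
        total = (NF - 6) / 2            # genotypes on this line (two alleles each)
        missing = 0
        for (i = 7; i < NF; i += 2)     # look at the first allele of each pair
            if ($i == "0") missing++
        rate = (total > 0) ? missing / total : 0
        printf "%s\t%s\t%d\t%d\t%.4f\n", $1, $2, missing, total, rate
    }' "$1"
}
```

Calling `ped_missing_rate yourfile.ped` (file name is a placeholder) gives one line per sample. The last column can be compared against the threshold in the QC command: `--mind 0.02` drops every sample whose missing rate exceeds 0.02, so if all samples are above that value, all of them are removed. PLINK's own `--missing` option should report a comparable per-sample figure in its `.imiss` output.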