Question: 1001 Arabidopsis SNP
4 weeks ago, shawn wrote:

Hi everyone,

I am learning to do GWAS analysis in Arabidopsis. I used some accessions from the list of 1135 accessions (1001 Genomes Project) for a GWAS experiment, and I have some questions about the genotype data. Several genotype datasets are available, in both VCF and HDF5 formats. I selected the one named "1001_SNP_MATRIX.tar.gz". Is this the right genotype data for GWAS analysis? I am also having trouble converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your reply.


snp plink vcf • 144 views
modified 4 weeks ago by zx8754 • written 4 weeks ago by shawn

You need to figure out which dataset you want to work with. If it is a VCF file, for example this file: , then you can use plink directly without any conversion, because plink can read the VCF format.
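If the HDF5 matrix turns out to be the dataset you use, the conversion has to be done by hand. Below is a minimal sketch of the second half of that job: turning a 0/1 SNP matrix that is already in memory into PLINK text files (.ped/.map). Several things here are assumptions, not facts about the 1001 Genomes files: that the matrix is laid out as SNPs by accessions, that 0 means reference and 1 means alternate allele, that the inbred accessions can be treated as homozygous diploids, and the function name `matrix_to_ped_map` itself, which is made up for illustration. Reading the arrays out of the HDF5 file (for example with h5py) is left out because the dataset names inside the archive are not given in this thread.

```python
def matrix_to_ped_map(snps, accessions, chroms, positions):
    """Build PLINK .ped and .map lines from a binary SNP matrix.

    snps[i][j] is the allele of SNP i in accession j
    (assumed coding: 0 = reference, 1 = alternate, anything else = missing).
    """
    if not (len(snps) == len(chroms) == len(positions)):
        raise ValueError("snps, chroms and positions must have one entry per SNP")
    if any(len(row) != len(accessions) for row in snps):
        raise ValueError("each SNP row must have one call per accession")

    # .map columns: chromosome, variant ID, genetic distance (0 = unknown), bp position
    map_lines = [f"{c}\tchr{c}_{p}\t0\t{p}" for c, p in zip(chroms, positions)]

    # Assumption: inbred lines are written as homozygous genotypes.
    # Alleles are coded 1/2; "0 0" is PLINK's missing genotype.
    code = {0: "1 1", 1: "2 2"}

    ped_lines = []
    for j, acc in enumerate(accessions):
        genotypes = [code.get(row[j], "0 0") for row in snps]
        # .ped columns: FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
        fields = [str(acc), str(acc), "0", "0", "0", "-9"] + genotypes
        ped_lines.append("\t".join(fields))
    return ped_lines, map_lines


if __name__ == "__main__":
    # Toy data: 2 SNPs x 3 accessions; IDs and positions are placeholders.
    ped, mp = matrix_to_ped_map(
        snps=[[0, 1, 0], [1, 1, -1]],
        accessions=["acc1", "acc2", "acc3"],
        chroms=[1, 1],
        positions=[55, 94],
    )
    print("\n".join(ped))
    print("\n".join(mp))
```

Writing the two lists to `mydata.ped` and `mydata.map` (the prefix is a placeholder) gives a pair that `plink --file mydata --make-bed --out mydata` should convert to binary format. For the full SNP matrix the text files will be large, so starting from the VCF is usually the simpler route.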

modified 4 weeks ago • written 4 weeks ago by zx8754

Thanks for your reply. I am not sure which dataset is the right one for the 1001 project. I tried to use this VCF dataset "". When I use plink to do the quality control with "plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2", it shows "error, all the individual removed as -maf -- maf max ". So maybe it is not this dataset.

written 4 weeks ago by shawn