Question: 1001 Arabidopsis SNP
shawn20 wrote, 6 months ago:

Hi everyone,

I am learning to do GWAS analysis in Arabidopsis. I used some accessions from the list of 1135 (1001 Genomes Project) for a GWAS experiment, and I have some questions about the genotype data. There are several different genotype datasets available, in both VCF and HDF5 format. I selected the one named “1001_SNP_MATRIX.tar.gz”. Is this the right genotype data for GWAS analysis? I also have a problem converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your reply.

Thanks.

https://1001genomes.org/data/GMI-MPI/releases/v3.1/

snp plink vcf
modified 6 months ago by zx8754 • written 6 months ago by shawn20

You first need to decide which dataset you want to work with. If it is a VCF file, you can use plink directly without any conversion, because plink can read VCF input with its --vcf option. For example, this file: https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz
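If you would rather stay with the HDF5 matrix than switch to the VCF, a conversion to PLINK's transposed text format (.tped/.tfam) could look roughly like the sketch below. It is not an official converter. The dataset names (`accessions`, `positions`, `snps`) and the `chr_regions` / `chrs` attributes are an assumption about how the file is laid out, so inspect the file first and adjust. `load_snp_matrix` and `write_tped` are hypothetical helper names. Calls are treated as 0 = reference, 1 = alternate, anything else = missing, and every accession is written as homozygous, which is an assumption that suits inbred lines.

```python
def load_snp_matrix(path):
    """Read the SNP matrix from HDF5.

    ASSUMPTION: datasets 'accessions', 'positions', 'snps' and the
    attributes 'chr_regions' / 'chrs' on 'positions' exist in the file.
    Check the real layout before relying on this.
    """
    import h5py  # third-party; imported here so write_tped works without it

    with h5py.File(path, "r") as f:
        accessions = [
            a.decode() if isinstance(a, bytes) else str(a)
            for a in f["accessions"][:]
        ]
        positions = f["positions"][:]
        snps = f["snps"][:]  # loads everything: SNPs x accessions
        regions = f["positions"].attrs["chr_regions"]
        names = f["positions"].attrs["chrs"]

    # Expand (start, end) index ranges into one chromosome label per SNP.
    chroms = []
    for (start, end), name in zip(regions, names):
        label = name.decode() if isinstance(name, bytes) else str(name)
        chroms.extend([label] * int(end - start))
    return chroms, positions, snps, accessions


def write_tped(prefix, chroms, positions, snps, accessions):
    """Write prefix.tped and prefix.tfam.

    snps[i][j] is the call for SNP i in accession j:
    0 = reference, 1 = alternate, anything else = missing.
    """
    # Arbitrary allele labels; PLINK treats "0" as a missing allele.
    code = {0: "1", 1: "2"}

    with open(prefix + ".tfam", "w") as fam:
        for acc in accessions:
            # FID IID father mother sex phenotype (-9 = missing phenotype)
            fam.write(f"{acc} {acc} 0 0 0 -9\n")

    with open(prefix + ".tped", "w") as tped:
        for chrom, pos, row in zip(chroms, positions, snps):
            pairs = []
            for call in row:
                allele = code.get(int(call), "0")
                pairs.append(f"{allele} {allele}")  # homozygous by assumption
            # chromosome, SNP id, genetic distance (unknown, so 0), bp position
            tped.write(f"{chrom} {chrom}_{pos} 0 {pos} " + " ".join(pairs) + "\n")
```

The resulting pair of files can then be loaded with something like "plink --tfile myprefix --make-bed --out myprefix". Note that `load_snp_matrix` reads the whole matrix into memory, which may be too much for the full dataset; reading it in row blocks would be the obvious change.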

modified 6 months ago • written 6 months ago by zx8754

Thanks for your reply. I am not sure which dataset is the right one for the 1001 Genomes Project. I tried this VCF dataset: "1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz". When I use plink for quality control with "plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2", it shows "error, all the individual removed as -maf -- maf max ". So maybe this is not the right dataset.
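To see what thresholds such as --mind 0.05, --geno 0.05 and --maf 0.01 are measuring, the small sketch below computes the same kind of statistics on a toy genotype matrix. It only illustrates the definitions and is not how plink works internally. The function names and the toy encoding (alternate-allele count 0/1/2, `None` for a missing call) are assumptions made for the example. plink's own --missing and --freq reports are the better way to check the real data.

```python
def sample_missing_rates(genotypes):
    """Fraction of missing calls per sample.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) for
    SNP i in sample j, or None when the call is missing.
    """
    n_snps = len(genotypes)
    n_samples = len(genotypes[0])
    return [
        sum(row[j] is None for row in genotypes) / n_snps
        for j in range(n_samples)
    ]


def snp_stats(row):
    """Return (missing rate, minor allele frequency) for one SNP."""
    called = [g for g in row if g is not None]
    missing = 1 - len(called) / len(row)
    if not called:
        return missing, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing, min(alt_freq, 1 - alt_freq)


def failing_samples(genotypes, mind=0.05):
    """Indices of samples whose missing rate exceeds the threshold."""
    rates = sample_missing_rates(genotypes)
    return [j for j, rate in enumerate(rates) if rate > mind]


if __name__ == "__main__":
    toy = [
        [0, 2, None],
        [0, 0, None],
        [2, None, None],
        [0, 2, 2],
    ]
    print("per-sample missing rate:", sample_missing_rates(toy))
    print("samples failing mind=0.05:", failing_samples(toy))
    for i, row in enumerate(toy):
        print("SNP", i, "(missing, maf) =", snp_stats(row))
```

If nearly every sample shows a high missing rate in the real data, a strict per-sample filter would remove all of them, which is one possible reason for an "all individuals removed" style of error.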

written 6 months ago by shawn20