Question: 1001 Arabidopsis SNP
shawn20 wrote:

Hi everyone,

I am learning to do GWAS analysis in Arabidopsis. I used some accessions from the list of 1135 (1001 Genomes Project) for a GWAS experiment, and I have some questions about the genotype data. Several versions of the genome data are available, including VCF and HDF5 formats. I selected the one named “1001_SNP_MATRIX.tar.gz”. Is this the right genotype data for GWAS analysis? I also have trouble converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your reply.

Thanks.

https://1001genomes.org/data/GMI-MPI/releases/v3.1/

written 4 weeks ago by shawn20

First decide which dataset you need to work with. If it is the VCF file, for example https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz , then no conversion is needed: PLINK can read VCF directly.
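
To make the VCF route concrete, here is a minimal sketch of an import with PLINK 1.9. It assumes the VCF has already been downloaded into the working directory; the output prefix `at1135` is a made-up name, and the script only prints the command unless both `plink` and the file are present.

```shell
#!/bin/sh
# Sketch: import the 1001 Genomes VCF into PLINK binary files (.bed/.bim/.fam).
# Assumptions: PLINK 1.9 is on PATH and the VCF is in the current directory.

VCF="1001genomes_snp-short-indel_only_ACGTN.vcf.gz"
OUT="at1135"   # hypothetical output prefix

# --double-id            use each VCF sample ID as both family ID and individual ID
# --set-missing-var-ids  name variants without an ID as chrom:pos
# --make-bed             write the binary fileset used by later --bfile calls
IMPORT_CMD="plink --vcf $VCF --double-id --set-missing-var-ids @:# --make-bed --out $OUT"

# Show the command so it can be checked before anything is run.
echo "$IMPORT_CMD"

# Run it only when PLINK and the input file are actually available.
if command -v plink >/dev/null 2>&1 && [ -f "$VCF" ]; then
    $IMPORT_CMD
fi
```

The resulting fileset can then be passed to later steps with `--bfile at1135`. A subset of accessions could be selected at import time with `--keep` and a two-column ID file.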

written 4 weeks ago by zx8754

Thanks for your reply. I am not sure which dataset is the right one for the 1001 Genomes Project. I tried this VCF dataset: "1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz". When I ran quality control in PLINK with "plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2", it failed with "error, all the individual removed as -maf -- maf max". So maybe this is not the right dataset.
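
One way to narrow down an error like this is to apply each filter from the command above on its own and see which one empties the dataset. The sketch below does that with the same thresholds; the `qc_` output prefixes are made-up names, and nothing runs unless `plink` and `387snp.bed` are present. As a hedged guess, Arabidopsis accessions are mostly homozygous inbred lines, so `--hwe` may be worth checking first, and a strict `--mind` can also drop many samples.

```shell
#!/bin/sh
# Sketch: run each QC filter separately to find the one that removes everything.
# Assumptions: PLINK 1.9 is on PATH and the fileset 387snp.{bed,bim,fam} exists.

BFILE="387snp"

for FILTER in "--maf 0.01" "--geno 0.05" "--mind 0.05" "--hwe 1e-5"; do
    # Derive a short tag from the flag name, e.g. "--maf 0.01" becomes "maf".
    TAG=$(printf '%s\n' "$FILTER" | sed 's/^--//; s/ .*//')

    # One filter per run; qc_<tag> is a hypothetical output prefix.
    CMD="plink --bfile $BFILE $FILTER --make-bed --out qc_$TAG"
    echo "$CMD"

    # Execute only when PLINK and the input fileset are available.
    if command -v plink >/dev/null 2>&1 && [ -f "$BFILE.bed" ]; then
        $CMD
    fi
done
```

Each run writes its own `.log` file, which reports how many variants and samples were removed by that single filter.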

written 4 weeks ago by shawn20