1001 Arabidopsis SNP
5.2 years ago
shawn ▴ 20

Hi everyone,

I am learning to run GWAS in Arabidopsis. For a GWAS experiment I used some accessions from the list of 1135 accessions in the 1001 Genomes Project, and I have some questions about the genotype data. The release offers several genotype datasets, in both VCF and HDF5 format. I selected the one named “1001_SNP_MATRIX.tar.gz”. Is this the right genotype data for GWAS? I also have trouble converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your reply.

Thanks.

https://1001genomes.org/data/GMI-MPI/releases/v3.1/
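
For the HDF5-to-PLINK step asked about above, one possible route is to write PLINK's transposed text format (.tped/.tfam) and let plink convert that. The following is only a sketch under stated assumptions: the dataset names `snps`, `positions` and `accessions`, the `chr_regions` attribute, the 0/1 coding of calls, and the chromosome numbering 1..n are assumptions about the file layout, not something confirmed in this thread, so inspect the file before relying on it. The function names and paths are hypothetical.

```python
import io


def write_tped(chroms, positions, snps, out: io.TextIOBase) -> None:
    """Write one .tped line per SNP; snps[i][j] is the call for SNP i in accession j."""
    for chrom, pos, row in zip(chroms, positions, snps):
        calls = []
        for g in row:
            # PLINK reads allele code "0" as missing, so 0/1 calls become alleles 1/2.
            # Any other value (assumed to mean "missing") is written as 0.
            allele = {0: "1", 1: "2"}.get(int(g), "0")
            # Assumption: one allele is stored per accession, so write it as homozygous.
            calls.append(f"{allele} {allele}")
        # Columns: chromosome, variant ID, genetic distance (0 = unknown), bp position, genotypes.
        out.write(f"{chrom} {chrom}_{pos} 0 {pos} {' '.join(calls)}\n")


def write_tfam(accessions, out: io.TextIOBase) -> None:
    """Write one .tfam line per accession: FID IID father mother sex phenotype."""
    for acc in accessions:
        acc = acc.decode() if isinstance(acc, bytes) else str(acc)
        out.write(f"{acc} {acc} 0 0 0 -9\n")


def hdf5_to_tped(h5_path: str, out_prefix: str) -> None:
    """Hypothetical driver; every dataset and attribute name below is an assumption."""
    import h5py  # third-party package, imported lazily so the writers work without it

    with h5py.File(h5_path, "r") as f:
        positions = f["positions"][:]
        accessions = f["accessions"][:]
        # Assumed: one [start, end) row range per chromosome, in chromosome order.
        regions = f["positions"].attrs["chr_regions"]
        chroms = [0] * len(positions)
        for chrom, (start, end) in enumerate(regions, start=1):
            chroms[start:end] = [chrom] * (end - start)
        with open(out_prefix + ".tfam", "w") as fam:
            write_tfam(accessions, fam)
        with open(out_prefix + ".tped", "w") as tped:
            # Iterating the dataset reads one SNP row at a time (slow but memory-friendly).
            write_tped(chroms, positions, f["snps"], tped)
```

If the layout matches, the resulting pair of files could then be turned into binary PLINK files with something like "plink --tfile PREFIX --make-bed --out PREFIX".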

SNP plink vcf • 2.6k views

First decide which dataset you want to work with. If it is a VCF file, for example https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz , then you can use plink directly without any conversion, because plink can read VCF files.
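
As a rough illustration of reading the VCF directly, the sketch below builds a plink 1.9 command line that imports a VCF and writes binary PLINK files. The helper names and the output prefix are hypothetical, and it assumes a `plink` executable is on the PATH; only the `--vcf`, `--make-bed` and `--out` flags are used.

```python
import shlex
import subprocess


def plink_vcf_to_bed_cmd(vcf_path: str, out_prefix: str) -> list[str]:
    """Build the argument list for importing a VCF and writing .bed/.bim/.fam files."""
    return ["plink", "--vcf", vcf_path, "--make-bed", "--out", out_prefix]


def run_plink(cmd: list[str]) -> None:
    """Print the command, then run it; raises CalledProcessError if plink fails."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Example call (not executed here; the output prefix is a made-up name):
# run_plink(plink_vcf_to_bed_cmd(
#     "1001genomes_snp-short-indel_only_ACGTN.vcf.gz", "arabidopsis_1001"))
```

The resulting prefix can then be passed to later plink steps with --bfile.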


Thanks for your reply. I am not sure which dataset is the right one for the 1001 Genomes project. I tried this VCF dataset: "1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz". When I run quality control with plink, " plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2", it reports "error, all the individual removed as -maf -- maf max ". So maybe this is not the right dataset.
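
One way to find out which threshold empties the dataset is to look at the statistics behind each filter before applying it. The sketch below is a toy model of three of the filters in that command, not plink's implementation: it assumes genotypes are given as alternate-allele counts (0, 1, 2) with None for a missing call, evaluates every filter on the unfiltered data, and does not model --hwe. All names are hypothetical.

```python
def missing_rate(calls):
    """Fraction of missing (None) calls in a list of genotype calls."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_freq(calls):
    """Minor allele frequency from alt-allele counts (0/1/2), ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def qc_summary(genotypes, mind=0.05, geno=0.05, maf=0.01):
    """genotypes[i][j] is the call for variant i in sample j.

    Returns the indices each threshold would flag, computed independently of
    one another on the unfiltered matrix.
    """
    n_samples = len(genotypes[0])
    return {
        # --mind: samples whose missing-call rate exceeds the threshold
        "samples_failing_mind": [
            j for j in range(n_samples)
            if missing_rate([row[j] for row in genotypes]) > mind
        ],
        # --geno: variants whose missing-call rate exceeds the threshold
        "variants_failing_geno": [
            i for i, row in enumerate(genotypes) if missing_rate(row) > geno
        ],
        # --maf: variants whose minor allele frequency is below the threshold
        "variants_failing_maf": [
            i for i, row in enumerate(genotypes) if minor_allele_freq(row) < maf
        ],
    }
```

On real data, plink's own --missing and --freq reports give the same per-sample and per-variant statistics, which may be easier than recomputing them.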


Hi, did you find out which dataset from the 1001 Genomes project is suitable for GWAS? I have the same problem. Please help.
