Which 1000 genomes 30x files should I use for imputation?
15 months ago
Apprentice ▴ 160

I would like to run imputation using the 1000 Genomes phase 3 30x data (GRCh38) as the reference panel. The 30x data seem to be available in several versions. Which files should I use for the imputation? For example, I am considering the files available at the following URL: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/

1000genomes imputation SNP • 829 views
15 months ago
4galaxy77 2.8k

Yes, the VCF files are what you want for an imputation reference panel. Those files plus their indexes are correct. You may need or want to change the format depending on the software you are using (e.g. convert to bref3 format for Beagle 5), but the software's documentation should explain this. The VCFs are fine to use as-is with SHAPEIT4.
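As a rough illustration of the format conversion mentioned above, here is a small Python sketch that only builds the shell command for converting one VCF to bref3. It assumes the `bref3.jar` utility distributed alongside Beagle 5, which reads a VCF and writes bref3 to standard output. The jar location and the file name `panel.chr20.vcf.gz` are hypothetical placeholders, not the real names in the FTP directory, so check the Beagle documentation for the exact jar name of your release.

```python
import shlex
from pathlib import Path


def bref3_command(vcf_path, jar_path="bref3.jar"):
    """Build a shell command that converts a VCF to bref3.

    Assumes the bref3 jar shipped with Beagle 5, invoked as
    `java -jar bref3.jar in.vcf.gz > out.bref3`.
    jar_path and vcf_path are placeholders chosen by the caller.
    """
    vcf = Path(vcf_path)
    name = vcf.name
    # Strip the VCF suffix so the output sits next to the input.
    for suffix in (".vcf.gz", ".vcf"):
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            break
    else:
        raise ValueError(f"not a VCF file name: {name}")
    out = vcf.with_name(stem + ".bref3")
    # Quote every path so unusual characters survive the shell.
    return "java -jar {} {} > {}".format(
        shlex.quote(str(jar_path)),
        shlex.quote(str(vcf)),
        shlex.quote(str(out)),
    )


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(bref3_command("panel.chr20.vcf.gz"))
```

The function returns the command as a string rather than running it, so you can inspect it first or loop over the per-chromosome files before executing anything.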


Thank you for your comment.

There are several types of genotype data for 1000 genomes phase 3. For example, I found the following two types of data.

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/

Which of these should I use for imputation? Or are there other data more suitable for imputation? If you know, I would appreciate it if you could let me know.
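One practical way to compare the two directories is to check whether the genotypes are phased. The directory names suggest that the first holds a phased panel and the second holds raw genotype calls, and imputation tools such as Beagle expect a phased reference panel. The sketch below is a minimal check under that assumption: it counts genotypes written with `/` (unphased) rather than `|` (phased) in the first records of a VCF. The function name and the `max_records` limit are my own choices for illustration.

```python
import gzip


def count_unphased(vcf_path, max_records=1000):
    """Count unphased genotype calls in the first max_records records.

    In VCF, a GT value such as 0|1 is phased and 0/1 is unphased.
    Haploid calls (no separator) are not counted as unphased.
    """
    # bgzip-compressed VCFs can be read with the gzip module.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    unphased = 0
    seen = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            # Need FORMAT plus at least one sample, with GT listed first.
            if len(fields) < 10 or fields[8].split(":")[0] != "GT":
                continue
            for sample in fields[9:]:
                genotype = sample.split(":", 1)[0]
                if "/" in genotype:
                    unphased += 1
            seen += 1
            if seen >= max_records:
                break
    return unphased
```

A result of 0 on a per-chromosome file suggests, at least for the sampled records, that the file is phased and therefore a candidate reference panel. A large count suggests the file would need phasing first.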
