Question

Linkage disequilibrium analysis of EUR populations from 1000 Genomes phase3 data: Considerations of singletons vs family trios

0

9.6 years ago

Scott ▴ 110

I am interested in calculating LD and r^2 values for a region of interest using Haploview. I want to use data from phase 3 of the 1000 Genomes Project, so I have been loading the data in linkage format from the .ped and .info files downloaded from 1000 Genomes. My question is how the structure of the sample (singletons versus family trios) influences LD analysis. When I load files containing family trios, Haploview treats the individuals as singletons by default. Is it valid to conduct LD analysis on family trios, or should this be avoided?

1000genomes SNP haploview linkage-disequilibrium • 4.4k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Scott ▴ 110

2

9.5 years ago

Adam ★ 1.0k

The final Phase 3 release has had 31 known related individuals removed. Details of the removed individuals are provided in the release folder on the FTP site (/vol1/ftp/release/20130502/). The genotypes for these individuals can be found in the supporting subfolders.

updated 2.3 years ago by Ram 43k • written 9.5 years ago by Adam ★ 1.0k

0

9.5 years ago

Scott ▴ 110

After checking this manually, it looks like all of the VCF files from Phase 3 of the 1000 Genomes Project obtained using the "data slicer" tool contain only unrelated individuals.

updated 2.3 years ago by Ram 43k • written 9.5 years ago by Scott ▴ 110

Ram · Accepted Answer · 2014-09-17

3

9.6 years ago

chrchang523 10k

The usual approach is to exclude all "nonfounders"--everyone who has a recorded ancestor also in the dataset, or who has e.g. a shared parent with another person in the dataset (even if said parent wasn't genotyped). (This is plink's default when calculating LD/r^2.)

updated 2.3 years ago by Ram 43k • written 9.6 years ago by chrchang523 10k

0

Thanks a lot. That was my suspicion. Do you know of an easy way to exclude certain individuals, either when generating the .ped file or within Haploview once the data are loaded? I know you can view the individuals used in Haploview, but I am afraid I will have to edit the .ped file manually. I have all of the ancestry information from the 1000 Genomes site, so that's not a problem. I am just wondering whether I have to filter the individuals myself.

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Scott ▴ 110

1

If you can get your data into plink .ped/.map format,

plink --file [...] --filter-founders --recode --out [...]
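
If plink is not an option, a similar filter can probably be approximated directly on the linkage-format .ped file. The following is only a minimal sketch, assuming whitespace-delimited lines where columns 3 and 4 hold the paternal and maternal IDs and "0" means "no recorded parent". The function names and the example family are hypothetical, and this is not plink's exact logic.

```python
def is_founder(ped_line: str) -> bool:
    """Return True if both parent ID columns (3 and 4) are '0'.

    Assumes the standard six leading linkage-format columns:
    family ID, individual ID, paternal ID, maternal ID, sex, phenotype.
    """
    fields = ped_line.split()
    return len(fields) >= 6 and fields[2] == "0" and fields[3] == "0"


def filter_founders(ped_lines):
    """Keep only the lines that describe individuals with no recorded parents."""
    return [line for line in ped_lines if is_founder(line)]


# Hypothetical trio: two parents (founders) and one child (nonfounder).
example = [
    "FAM1 DAD 0 0 1 0 A A G G",
    "FAM1 MOM 0 0 2 0 A C G T",
    "FAM1 KID DAD MOM 1 0 A C G T",
]
founders = filter_founders(example)
for line in founders:
    print(line)
```

To apply this to a real file, read its lines, pass them through filter_founders, and write the result to a new .ped file rather than overwriting the original.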

updated 2.3 years ago by Ram 43k • written 9.6 years ago by chrchang523 10k

0

Thanks again. I'll check out plink.

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Scott ▴ 110