Question: Retrieve a subset of SNPs in Plink
Asked 4.0 years ago by Sally (Malaysia):

I have genotype data for two populations, both in binary format (.bed, .bim, .fam). The first population consists of 1 parent and 107 progenies. The second population consists of 50 progenies only.

Since the second population has no genotype data for the parent, I would like to extract the parent's SNP data from the first population, since the two are closely related, and then merge it into the second population.

Plink provides the --exclude/--keep and --merge/--bmerge options. To retrieve the parent data, I used: **plink --bfile file --keep parent.txt --make-bed --out parent** where parent.txt contains the family ID and individual ID.

To merge the parent data into the second population, I used: **plink --bfile file2 --bmerge parent.bed parent.bim parent.fam --make-bed --out merge**

However, I noticed that after the extraction step, the number of lines in the .bim file is still the same as before. Am I using the correct commands?

Original file for population 1:

    wc file.*
       3277    6844 4613223 file.bed
     170860 1025160 5146571 file.bim
        108     648    2194 file.fam

Parent file after extracting:

    wc parent.*
          0       1  170863 parent.bed
     170860 1025160 5146571 parent.bim
          1       6      19 parent.fam

Please help me. Thank you.

Tags: linux, plink, bioinformatics
Answered 4.0 years ago by alesssia (London, UK):

I am not sure I have understood your issue, but the .bim file holds the extended variant information, one variant per line. You have not filtered on this dimension, so its number of lines should not change. What should change is the .fam file, which indeed now contains only one person (1 line in parent.fam, versus 108 lines in file.fam).
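
The .bed sizes in the question can serve as an extra sanity check. The sketch below assumes the standard PLINK 1 variant-major .bed layout (3 header bytes, then 2 bits per genotype, padded to whole bytes for each variant); `bed_size` is a hypothetical helper name, not a PLINK command.

```shell
#!/bin/sh
# Expected size in bytes of a PLINK 1 binary .bed file (variant-major layout):
# 3 header bytes, then ceil(samples / 4) bytes per variant (2 bits per genotype).
# bed_size is a hypothetical helper, not part of PLINK.
bed_size() {
    variants=$1
    samples=$2
    echo $(( 3 + variants * ((samples + 3) / 4) ))
}

bed_size 170860 108   # original file: prints 4613223
bed_size 170860 1     # parent only:   prints 170863
```

Both values match the byte counts that `wc` reported for file.bed and parent.bed, which suggests the parent file holds all 170860 variants for exactly one sample.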


Hi Alessia,

Thank you for the comment. Maybe I should restate my problem. I want to extract all variants for the parent only (I only know its family ID and individual ID) from population 1, and later merge this parent data into population 2.

I need help to solve this problem. Kindly advise me what to do. Thanks!

Reply written 4.0 years ago by Sally

then do:

plink --bfile file --keep parent.txt --make-bed --out parent

plink --bfile parent --write-snplist --out parent_snps

plink --bfile file2 --keep parent_snps --make-bed --out file2_only_parental_snps

Reply written 4.0 years ago by Floris Brenk

Hi Floris,

Thank you for the suggestion. I understand the first and second commands, but I am a little confused by the third. From the second command, I should get the list of SNPs for the parent. Then in the third, how does one keep parent_snps in file2 (population 2)? As I understand it, --keep retains only the samples whose IDs are listed in the parent_snps file.

Thanks!

Reply written 4.0 years ago by Sally

Oh sorry, that needs to be --extract:

plink --bfile file2 --extract parent_snps.snplist --make-bed --out file2_only_parental_snps
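
Since --extract matches on variant IDs, it may help to check how many IDs the two datasets share before filtering or merging. This is a minimal sketch, assuming the variant ID sits in column 2 of each .bim file; `shared_variants` is a hypothetical helper name, not a PLINK command.

```shell
#!/bin/sh
# shared_variants is a hypothetical helper, not part of PLINK.
# It counts lines of the second .bim file whose variant ID (column 2)
# also appears in the first .bim file.
shared_variants() {
    awk 'NR == FNR { ids[$2] = 1; next }
         ($2 in ids) { n++ }
         END { print n + 0 }' "$1" "$2"
}

# Example with the file names used in this thread:
# shared_variants parent.bim file2.bim
```

A count well below the number of lines in either .bim file would suggest that the two populations were genotyped on different variant sets or use different ID schemes.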

Reply written 4.0 years ago by Floris Brenk

Or maybe you can try this workflow:

plink --bfile file2 --bmerge file.bed file.bim file.fam --make-bed --out merge

Then make an ID file containing only the 50 progenies and the parent (let's call it include):

plink --bfile merge --keep include --make-bed --out final
Reply written 4.0 years ago by Sam
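
One possible way to build the include file for the workflow above is sketched here. It assumes that --keep reads the family ID and individual ID from the first two columns of a text file, and that parent.txt from the original question already has that layout; `make_include` is a hypothetical helper name, not a PLINK command.

```shell
#!/bin/sh
# make_include is a hypothetical helper, not part of PLINK.
# It prints the family ID and individual ID (columns 1 and 2) of every
# line in the given files, one sample per line.
make_include() {
    awk '{ print $1, $2 }' "$@"
}

# Example with the file names used in this thread:
# make_include file2.fam parent.txt > include
```

With the files from the question, the resulting list should have 51 lines: the 50 progenies of population 2 plus the parent.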

Hi Sam, 

Thank you for the suggestion. Will give it a try and update it later. Thanks!

Reply written 4.0 years ago by Sally