Plink Error: --recode does not yet support multipass recoding of very large files
2.2 years ago
xuanzhui • 0

I want to convert a bfile (binary PLINK fileset) to .ped format. The command I used is:
plink --bfile base --recode tab --out paddy
The PLINK version is 1.9.
I then got this error: --recode does not yet support multipass recoding of very large files.
The full log is:
PLINK v1.90b6.25 64-bit (5 Mar 2022)
Options in effect:
--bfile base
--out paddy
--recode tab

Hostname: MacBook-Pro.local
Working directory: paddy
Start time: Tue Mar 15 17:52:51 2022

Random number seed: 1647337971
16384 MB RAM detected; reserving 8192 MB for main workspace.
18128777 variants loaded from .bim file.
3024 people (0 males, 0 females, 3024 ambiguous) loaded from .fam.
Ambiguous sex IDs written to paddy.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 3024 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.826143.
18128777 variants and 3024 people pass filters and QC.
Note: No phenotypes present.
Error: --recode does not yet support multipass recoding of very large files;
contact the PLINK developers if you need this.
For now, you can try using a machine with more memory, and/or split the file
into smaller pieces and recode them separately.

End time: Tue Mar 15 17:53:16 2022

How can I split the file into smaller pieces and recode them separately?

2.2 years ago

Why do you want to do this? The .ped file format is EXTREMELY bad for large datasets.

2.2 years ago
xuanzhui • 0

It seems plink2 works:

plink2 --bfile base --export ped --out paddy
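
For anyone who has to stay on PLINK 1.9, the "split the file into smaller pieces and recode them separately" route from the error message could be sketched as below. This is a minimal sketch, not a tested pipeline: the helper names (`chromosome_codes`, `recode_commands`, `run_all`) and the `paddy_chr<code>` output naming are made up for illustration, and it assumes the chromosome code is the first whitespace-separated column of base.bim. It only relies on the PLINK 1.9 flags `--chr` and `--recode tab`. Whether per-chromosome pieces are small enough depends on the available memory; if they are not, narrower ranges (`--from-bp`/`--to-bp`) or sample subsets (`--keep`) would be the next thing to try.

```python
import subprocess


def chromosome_codes(bim_path):
    """Return the chromosome codes in a .bim file, in order of first appearance."""
    seen = {}
    with open(bim_path) as bim:
        for line in bim:
            fields = line.split()
            if fields:
                # Column 1 of a .bim line is the chromosome code.
                seen.setdefault(fields[0], None)
    return list(seen)


def recode_commands(bfile, out_prefix, chroms):
    """Build one 'plink --recode tab' command per chromosome code."""
    return [
        ["plink", "--bfile", bfile, "--chr", chrom,
         "--recode", "tab", "--out", f"{out_prefix}_chr{chrom}"]
        for chrom in chroms
    ]


def run_all(bfile, out_prefix, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in recode_commands(bfile, out_prefix, chromosome_codes(bfile + ".bim")):
        print(" ".join(cmd))
        if not dry_run:
            # Assumes a 'plink' 1.9 binary is on PATH.
            subprocess.run(cmd, check=True)
```

Calling `run_all("base", "paddy")` only prints the commands; `run_all("base", "paddy", dry_run=False)` would also run them. Note that each resulting .ped covers a different set of variants, so the pieces are per-chromosome files rather than parts that can simply be concatenated.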


Yes, I added that functionality to plink2 three days ago. But I didn't bring it up in my previous answer, because the more important question is, why are you exporting a >100 GB .ped file? I have never heard of a program that (i) can do something useful with that much data in a reasonable amount of time, yet (ii) the programmer is unable to make the small extension needed to read a ~15x smaller .bed file, far more quickly, instead.


I have been asked to calculate the percentage of (differing pairs) / (total pairs).
For example, given the gene sequence
A C A A T T G G
the first pair, A and C, differs, so the number of differing pairs is 1, the total number of pairs is 4, and the percentage is 1 / 4 = 25%.
Sorry, I am not familiar with the biology terminology...
So my initial thought was to convert the binary file to a readable format.
Is there a better way to do that?
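
To make the arithmetic in the example concrete, here is a minimal sketch of the calculation as described. It assumes the alleles are grouped into consecutive, non-overlapping pairs (A C, A A, T T, G G), which is how the example counts 4 pairs; the function name `diff_pair_fraction` is made up for illustration. If each pair is the two alleles of one genotype, as in a row of a .ped file, this would be the fraction of heterozygous calls, but that reading is an assumption.

```python
def diff_pair_fraction(alleles):
    """Fraction of consecutive, non-overlapping allele pairs whose members differ."""
    if not alleles or len(alleles) % 2 != 0:
        raise ValueError("expected a non-empty, even number of alleles")
    # Pair element 0 with 1, 2 with 3, and so on.
    pairs = list(zip(alleles[0::2], alleles[1::2]))
    differing = sum(1 for first, second in pairs if first != second)
    return differing / len(pairs)


example = "A C A A T T G G".split()
print(diff_pair_fraction(example))  # 0.25, i.e. 1 differing pair out of 4
```

The sketch does not handle missing calls (coded as 0 in .ped files by default); those would need to be excluded from both counts.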

  1. The standard text format for large genomic datasets is VCF, not .ped. VCF has much better software support: see bcftools in particular. (plink 1.9 and 2.0 are also capable of importing and exporting VCF more efficiently than they can import/export .ped.)

  2. There are many ways to perform this calculation, but this sounds like a homework problem so you're probably best off doing the recommended reading and maybe getting help from a course teaching assistant or office hours.
