Question: How to handle whole-genome sequencing data with PLINK?
line1438 wrote (3.1 years ago):

I have whole-genome sequencing data for 421 samples that I want to analyze, so I converted the data from VCF format to PED format. However, at the step of converting the TPED file to a PED file, PLINK reported "Exhausted system memory".

I think the reason is that the sequencing data contain at least 500 million markers, even though the server already has 128 GB of RAM. If I cannot add memory to the server, how can I handle whole-genome sequencing data with PLINK or other trustworthy software?

Tags: sequencing, plink, vcftools, wgs
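To put the numbers in the question into perspective, here is a rough size estimate for 421 samples and 500 million markers. It assumes the PLINK 1 binary .bed layout (a 3-byte header, then 2 bits per genotype, with each variant padded to whole bytes) and, for text PED, single-character alleles written as two characters plus two separators, about 4 bytes per genotype; indels with longer alleles would make the PED figure larger. The function names are illustrative.

```python
import math

GIB = 1024 ** 3


def bed_size_bytes(n_samples: int, n_variants: int) -> int:
    """PLINK 1 .bed size: 3-byte header, then ceil(n_samples / 4) bytes per variant (2 bits per genotype)."""
    return 3 + n_variants * math.ceil(n_samples / 4)


def ped_genotype_bytes(n_samples: int, n_variants: int, bytes_per_genotype: int = 4) -> int:
    """Rough size of the genotype columns of a text .ped file.

    Assumes single-character alleles written as e.g. 'A C' plus a separator (about 4 bytes).
    """
    return n_samples * n_variants * bytes_per_genotype


if __name__ == "__main__":
    n_samples, n_variants = 421, 500_000_000  # figures from the question
    print(f".bed file:          {bed_size_bytes(n_samples, n_variants) / GIB:.1f} GiB")
    print(f".ped genotype text: {ped_genotype_bytes(n_samples, n_variants) / GIB:.1f} GiB")
```

With these assumptions the binary file comes to roughly 49 GiB while the PED genotype text alone is roughly 784 GiB, which is one reason to convert straight to the binary format instead of going through TPED/PED text files.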

Try the --memory and/or --parallel flags

(reply by Medhat, 3.1 years ago)

Sorry, I can't find the options you mentioned in PLINK 1.07 or 1.9. Could you tell me more about them? Thanks a lot.

(reply by line1438, 3.1 years ago)

https://www.cog-genomics.org/plink2/parallel

https://www.cog-genomics.org/plink2/other#memory

Good luck :)

(reply by Medhat, 3.1 years ago)

I tried the --memory option in PLINK 1.9, but I have a question.

My server has 128 GB of memory in total, but only about 112 GB is actually available. How should I set the memory size (in MB) for the --memory option in PLINK 1.9?

I tried both --memory 120000 and --memory 100000, and both ran normally, but I would still like to know which value is best.

(reply by line1438, 3.0 years ago)
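There is probably no single best value, but one common-sense approach is to give PLINK most of the memory the kernel reports as available and keep the rest as headroom for the operating system and other jobs. The sketch below is Linux-specific (it reads MemAvailable from /proc/meminfo, which the kernel reports in KiB), treats the --memory value as MiB, and uses an 80% fraction that is an arbitrary assumption, not a PLINK recommendation. The function names are illustrative.

```python
import os


def mem_available_mib(meminfo_text: str) -> int:
    """Parse MemAvailable from /proc/meminfo-style text; the kernel reports it in kB (KiB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemAvailable not found")


def plink_memory_arg(available_mib: int, fraction: float = 0.8) -> int:
    """Value for --memory (MiB): a fraction of available memory, leaving the rest as headroom."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(available_mib * fraction)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as fh:
        print("--memory", plink_memory_arg(mem_available_mib(fh.read())))
```

If the 112 GB above is really 112 GiB (114,688 MiB), this rule would suggest --memory 91750, a little below the 100000 that already ran normally.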
stu111538 wrote (3.1 years ago):

Do you want all 421 datasets in one PED file? I handled large data with PLINK recently. I merged 60 VCF files into a multi-sample VCF with 900 million markers, and converting it to PLINK format with PLINK 1.9 worked with 100 GB of memory and 4 CPUs; with 70 GB it did not work. And you have many more samples. I think the problem is that PLINK loads all markers of all samples into memory. You could try converting each sample individually, one after another, and then merging the resulting files with PLINK's --merge-list.

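The merge step of this suggestion could look like the sketch below. It assumes PLINK 1.9, that each sample has already been converted to its own binary fileset (prefix.bed/.bim/.fam) rather than a PED text file (--merge-list accepts either), and a list file with one fileset prefix per line. The helper names and the suffix handling are illustrative.

```python
from __future__ import annotations

from pathlib import Path


def fileset_prefix(vcf: str, workdir: str) -> str:
    """Derive a per-sample prefix such as workdir/sample1 from sample1.vcf.gz (illustrative naming)."""
    name = Path(vcf).name
    for suffix in (".vcf.gz", ".vcf"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return str(Path(workdir) / name)


def merge_list_text(prefixes: list[str]) -> str:
    """Contents of a --merge-list file: one binary fileset prefix per line."""
    return "".join(p + "\n" for p in prefixes)


def merge_cmd(merge_list: str, out_prefix: str, plink: str = "plink") -> list[str]:
    """PLINK 1.9 command that merges every fileset named in the list into one binary fileset."""
    return [plink, "--merge-list", merge_list, "--make-bed", "--out", out_prefix]
```

Write the list with Path("merge_list.txt").write_text(merge_list_text(prefixes)) and run the merge with subprocess.run(merge_cmd("merge_list.txt", "all_samples"), check=True). If variants conflict between filesets, PLINK 1.9 stops and writes them to a file ending in -merge.missnp. Note also that a site missing from one sample's VCF becomes a missing genotype for that sample after merging, not a homozygous-reference call.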

PLINK 1.9 does not keep all input data in memory simultaneously

(reply by Medhat, 3.1 years ago)

Yes, I want all 421 VCF files in one PED file.

I have some questions:

  1. How did you merge 60 VCF files into a multi-sample VCF? Which method and software did you use?

  2. How did you convert the data to PLINK format?

Actually, I converted only one VCF file to PLINK format, not all samples, but I am using PLINK 1.07.

  3. So is the "memory is not enough" error caused by PLINK 1.07?

(reply by line1438, 3.1 years ago)

You could try PLINK 1.9. The command is probably the same: ./plink --vcf your_vcf --make-bed --out ... And try the --memory option, as suggested above.
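A minimal sketch of that command, built as an argument list for Python's subprocess module with --memory and, optionally, --threads added. It assumes PLINK 1.9 is on the PATH or passed explicitly (for example as ./plink); the function names and file names are illustrative. Because PLINK 1.9 reads the VCF directly and --make-bed writes the compact binary format, no intermediate TPED/PED text files are produced.

```python
from __future__ import annotations

import subprocess


def plink_vcf_to_bed(vcf: str, out_prefix: str, memory_mib: int | None = None,
                     threads: int | None = None, plink: str = "plink") -> list[str]:
    """Build a PLINK 1.9 command that converts a VCF straight to .bed/.bim/.fam."""
    cmd = [plink, "--vcf", vcf, "--make-bed", "--out", out_prefix]
    if memory_mib is not None:
        cmd += ["--memory", str(memory_mib)]  # workspace size, in MiB
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def run(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if PLINK exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, run(plink_vcf_to_bed("all_samples.vcf.gz", "all_samples", memory_mib=100000, plink="./plink")).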

  1. You can merge VCF files with bcftools merge. Before that, you have to compress them with bgzip and index them with tabix. Keep in mind that your multi-sample VCF will be extremely large and possibly hard to handle.

(reply by stu111538, 3.1 years ago)
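The compress, index, and merge steps could be scripted as below. It assumes plain-text .vcf inputs, htslib's bgzip and tabix, and bcftools on the PATH; with 421 inputs, the sketch passes the bgzipped file names through a list file (bcftools merge -l, one file name per line) instead of putting them all on the command line. The function names are illustrative.

```python
from __future__ import annotations

import subprocess


def compress_and_index_cmds(vcf: str) -> list[list[str]]:
    """bgzip replaces vcf with vcf.gz; tabix then writes the vcf.gz.tbi index."""
    return [["bgzip", vcf], ["tabix", "-p", "vcf", vcf + ".gz"]]


def bcftools_merge_cmd(file_list: str, out_vcf_gz: str, threads: int = 1) -> list[str]:
    """Merge all bgzipped, indexed VCFs named in file_list into one bgzipped multi-sample VCF."""
    return ["bcftools", "merge", "--threads", str(threads),
            "-l", file_list, "-O", "z", "-o", out_vcf_gz]


def run_all(cmds: list[list[str]]) -> None:
    """Run commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

As with the PLINK merge, samples that have no record at a site receive a missing genotype there by default in the merged VCF.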
