Error in converting vcf file to plink format
8 weeks ago
Madhawa

Hi,

I am trying to convert a VCF file to PLINK format. I am using this dataset (ALL.2of4intersection.20100804.genotypes.vcf.gz). This is the command I ran:

plink2 --vcf ALL.2of4intersection.20100804.genotypes.vcf.gz --make-bed --out ALL.2of4intersection.20100804.genotypes

However, I got the following error:

Error: --vcf file decompression failure: Malformed BGZF block

Then, to check whether the file could be decompressed at all, I tried to unzip it:

gzip -d ALL.2of4intersection.20100804.genotypes.vcf.gz

That failed too:

gzip: ALL.2of4intersection.20100804.genotypes.vcf.gz: invalid compressed data--crc error
gzip: ALL.2of4intersection.20100804.genotypes.vcf.gz: invalid compressed data--length error

Any suggestions about these error messages would be very helpful. Thank you!


Hello,

I am trying to use the following command to download the 1000 Genomes file with the population information:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/20100804.ALL.panel

However, I get the following error:

--2022-09-29 11:06:10-- ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/20100804.ALL.panel
  => ‘20100804.ALL.panel’
Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘ftp.1000genomes.ebi.ac.uk’

Could you please check whether I used the correct command to download this file? Thank you!


Try:

wget -O 20100804.ALL.panel http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/20100804.ALL.panel

If you keep getting the error

Temporary failure in name resolution

then it is a local problem on your end: your DNS server cannot resolve the host name. Wait a while and see whether the problem resolves itself.
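If the name-resolution error keeps coming back, one way to separate a local DNS problem from a server problem is to ask the system resolver directly before retrying the download. The sketch below assumes a Linux machine where getent is available; host_resolves and wait_for_host are hypothetical helper names, and the 60-second delay and 10 attempts are arbitrary defaults.

```shell
# Sketch: check whether a host name resolves before (re)trying a download.
# "Temporary failure in name resolution" means the local resolver could not
# look the name up, so wget never reached the server at all.

# Succeeds (exit 0) if the system resolver can look up host $1.
host_resolves() {
  getent hosts "$1" > /dev/null
}

# Hypothetical helper: poll until host $1 resolves, waiting $2 seconds
# between attempts (default 60) and giving up after $3 attempts (default 10).
wait_for_host() {
  local host="$1" delay="${2:-60}" attempts="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if host_resolves "$host"; then
      return 0
    fi
    # Only sleep if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      echo "Attempt $i: $host does not resolve yet; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage, once the helpers are loaded in the current shell: `wait_for_host ftp.1000genomes.ebi.ac.uk && wget -O 20100804.ALL.panel http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/20100804.ALL.panel`. If the host never resolves while other sites do, the DNS settings of the local network are the likely place to look.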


Thank you for the information. I just tried your command and again got the error "Temporary failure in name resolution". By "wait and see if the problem resolves", do you mean that I should re-run the command after waiting for some time? Kindly let me know. Thank you!


Correct. This looks like a problem with your local network. I was able to access the link without problems.


Thank you so much! I just checked my working directory with "ls" and found the file "20100804.All.panel". Does this mean it was downloaded successfully? Please let me know how I can make sure. Thank you!


As long as the file is not empty it should be good.
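The "not empty" check can be made explicit with the shell test -s, which is true only for a file that exists and has a size greater than zero. The helper below (check_download, a hypothetical name) also prints the line count; it only checks size, not content, so a quick look with head is still useful to see that the file holds a sample panel rather than, for example, an HTML error page.

```shell
# Sketch: confirm a downloaded file exists and is non-empty.
# check_download is a hypothetical helper name, not part of any tool.
check_download() {
  if [ -s "$1" ]; then
    # tr strips the padding some wc implementations add.
    echo "OK: $1 ($(wc -l < "$1" | tr -d ' ') lines)"
  else
    echo "MISSING OR EMPTY: $1"
    return 1
  fi
}
```

Usage: `check_download 20100804.ALL.panel && head -n 5 20100804.ALL.panel`. File names are case-sensitive on Linux, so use exactly the name that ls reports.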


Thank you so much! Will try to run the analysis with the file and check it! Thanks again!!

8 weeks ago

There was a problem when you downloaded ALL.2of4intersection.20100804.genotypes.vcf.gz: the CRC and length errors from gzip mean your copy is corrupted or incomplete. Try downloading it again:

 wget -O ALL.2of4intersection.20100804.genotypes.vcf.gz "https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz" 
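Before running plink2 on a fresh download, the archive can be tested first. gzip -t decompresses in memory and discards the output, so it reports CRC or length errors like the ones above without writing a large file; since BGZF files are a series of ordinary gzip members, gzip can test them as well. verify_gz is a hypothetical helper name.

```shell
# Sketch: test a .gz / .vcf.gz (BGZF) file for corruption without writing
# the decompressed output. verify_gz is a hypothetical helper name.
verify_gz() {
  if gzip -t "$1" 2> /dev/null; then
    echo "intact: $1"
  else
    # Covers CRC errors, length errors and truncated downloads.
    echo "corrupted or truncated: $1"
    return 1
  fi
}
```

Usage: `verify_gz ALL.2of4intersection.20100804.genotypes.vcf.gz`. The test still has to read and decompress the whole stream, so on a multi-gigabyte file it takes a while.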

Thank you so much for the information. I actually downloaded it three times, which took several hours, but I still got that error. I will use the command you gave me to re-download the file. Thank you!


Thank you again; I was able to download the file successfully! However, when I try to convert the VCF file to PLINK format, I only get the FAM file.

I used this command to convert it:

plink2 --vcf ALL.2of4intersection.20100804.genotypes.vcf.gz --make-bed --out ALL.2of4intersection.20100804.genotypes

I get this error related to the BIM file:

--vcf: 25488488 variants scanned.
--vcf: ALL.2of4intersection.20100804.genotypes-temporary.pgen + ALL.2of4intersection.20100804.genotypes-temporary.pvar.zst + ALL.2of4intersection.20100804.genotypes-temporary.psam written.
629 samples (0 females, 0 males, 629 ambiguous; 629 founders) loaded from ALL.2of4intersection.20100804.genotypes-temporary.psam.
25488488 variants loaded from ALL.2of4intersection.20100804.genotypes-temporary.pvar.zst.
Note: No phenotype data present.
Writing ALL.2of4intersection.20100804.genotypes.fam ... done.
Writing ALL.2of4intersection.20100804.genotypes.bim ... Error: ALL.2of4intersection.20100804.genotypes.bim cannot contain multiallelic variants.

Could you kindly let me know whether there is a way to convert this VCF to BED, BIM and FAM files? Thank you once again!


Remove the multiallelic variants with bcftools view -m2 -M2, or normalise them with bcftools norm (norm -m -any splits each multiallelic record into biallelic records).
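A minimal sketch of the two bcftools routes, assuming bcftools is installed and on the PATH. The helper names and output file names are hypothetical; -m2 -M2 keeps sites with at least and at most two alleles, -Oz writes bgzip-compressed VCF, and norm -m -any splits every multiallelic record.

```shell
# Sketch of the two bcftools options for multiallelic sites.
# Helper names are hypothetical: $1 = input VCF, $2 = output .vcf.gz

# Option 1: keep only biallelic sites (min 2 and max 2 alleles).
keep_biallelic() {
  bcftools view -m2 -M2 -Oz -o "$2" "$1"
}

# Option 2: split each multiallelic record into several biallelic records.
# Note: the split records keep the original ID, so several rows may end up
# sharing one variant ID.
split_multiallelic() {
  bcftools norm -m -any -Oz -o "$2" "$1"
}
```

Usage: `keep_biallelic ALL.2of4intersection.20100804.genotypes.vcf.gz biallelic.vcf.gz`, then `plink2 --vcf biallelic.vcf.gz --make-bed --out ALL.biallelic`. Dropping multiallelic sites loses those variants entirely, while splitting keeps each alternate allele as its own biallelic record; which is preferable depends on the analysis.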


Thank you for the very quick answer and I will try it as you mentioned! Thank you!


I read about bcftools, but I could not figure it out since I am not familiar with it. However, while searching, I found that the plink2 option "--max-alleles 2" filters out multiallelic variants. I just tried it, it worked, and I got the BED, BIM and FAM files! I hope this approach is okay, and I really appreciate your kind help once again. Thank you!
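The plink2-only route fits in a single command. The sketch wraps it in a hypothetical helper and adds an awk-based count of records whose ALT column lists more than one allele, which gives a rough idea of how many sites --max-alleles 2 will drop. The count assumes one record per line and treats a comma in ALT as marking a multiallelic site.

```shell
# Sketch: plink2-only route. --max-alleles 2 excludes variants with more
# than two alleles before the .bed/.bim/.fam files are written.
# vcf_to_bed_biallelic is a hypothetical helper: $1 = input VCF, $2 = output prefix.
vcf_to_bed_biallelic() {
  plink2 --vcf "$1" --max-alleles 2 --make-bed --out "$2"
}

# Rough count of multiallelic records: data lines whose ALT column (5th field)
# contains a comma. Works on plain or gzip/BGZF-compressed VCF, because
# "gzip -dcf" passes uncompressed input through unchanged.
count_multiallelic() {
  gzip -dcf "$1" | awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}
```

Usage: `count_multiallelic ALL.2of4intersection.20100804.genotypes.vcf.gz` followed by `vcf_to_bed_biallelic ALL.2of4intersection.20100804.genotypes.vcf.gz ALL.2of4intersection.20100804.genotypes`. Scanning about 25 million records with awk takes a while, so the count is optional.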


Hi,

I have another issue with this dataset. After obtaining the BED, BIM and FAM files, I tried to run some QC steps on my computer, but I got the following memory error:

    * FATAL ERROR Exhausted system memory *
    * *
    * You need a smaller dataset or a bigger computer...*
    * *
    * Forced exit now... *

Is there a way to take a smaller subset of this big dataset for practice? If so, could you kindly let me know how? Thank you!


This error is a consequence of using plink 1.07, which has to load the entire dataset into memory, instead of plink 1.9, which is capable of processing the data in a streaming manner.
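A sketch of how the QC step and the "smaller dataset" question could look with plink 1.9, whose executable is usually called plink. The helper names are hypothetical, and the thresholds (--geno 0.05, --maf 0.01, --hwe 1e-6) and --memory 4000 are illustrative values, not recommendations from this thread. --chr restricts the data to one chromosome, and --thin keeps each variant with the given probability; both give a smaller practice dataset.

```shell
# Sketch for plink 1.9 (binary usually installed as "plink").
# Helper names are hypothetical; thresholds and the memory setting are
# illustrative values only.

# Smaller practice dataset, option 1: keep a single chromosome.
# $1 = input prefix, $2 = chromosome code (e.g. 22), $3 = output prefix
make_chr_subset() {
  plink --bfile "$1" --chr "$2" --make-bed --out "$3"
}

# Smaller practice dataset, option 2: keep each variant with probability $2
# (e.g. 0.05); --seed makes the random selection reproducible.
# $1 = input prefix, $2 = retention probability, $3 = output prefix
make_thinned_subset() {
  plink --bfile "$1" --thin "$2" --seed 42 --make-bed --out "$3"
}

# Basic QC: drop variants with >5% missing calls, MAF < 1%, or HWE p < 1e-6.
# --memory sets the size (in MB) of plink's main workspace allocation.
# $1 = input prefix, $2 = output prefix
basic_qc() {
  plink --bfile "$1" --memory 4000 --geno 0.05 --maf 0.01 --hwe 1e-6 \
    --make-bed --out "$2"
}
```

Usage: `make_chr_subset ALL.2of4intersection.20100804.genotypes 22 practice_chr22`, then `basic_qc practice_chr22 practice_chr22_qc`.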


Thank you so much for the clarification. I will try plink 1.9!


I just tried plink 1.9 and it worked well! Thank you once again!
