Question: loadling bcf2 file in plink
gravatar for
11 months ago by
janhuang.cn90 wrote:

I am using bcftools to save a bcf2 file (chr22.bcf2) for a later analysis using plink.

bcftools view --samples-file sample.txt --force-samples --types snps --output-type b chr22_genotypes.vcf.gz | bcftools norm -Ob -m+any  | bcftools norm -Ob --fasta-ref human_g1k_v37.fasta | bcftools annotate --output-type b --output chr22.bcf2  -I '%CHROM:%POS:%ID'

But when I load this file into plink, invalid chromosome was found.

plink --bcf chr22.bcf2 --make-bed --out chr22

Error: Invalid chromosome code 'hs37d5' in .bcf file.

This error was not returned when I save the file as vcf and load it with plink. And when I checked which line has a chromosome code of "hs37d5", the below message was returned.

bcftools view  chr22.bcf2  | grep -e "hs37d5" | cut -f1-7

##reference=                 _reference_assembly_sequence/hs37d5.fa.gz

Are these the header lines? Can exclude them when loading with plink? or exclude them when saving as bcf2?

plink bcf • 507 views
ADD COMMENTlink modified 11 months ago by Kevin Blighe28k • written 11 months ago by janhuang.cn90
gravatar for Kevin Blighe
11 months ago by
Kevin Blighe28k
USA / Europe / Brazil
Kevin Blighe28k wrote:

I was able to replicate your error and also prove that it works when it's a VCF (but not BCF).

Error: Invalid chromosome code 'hs37d5' in .bcf file.

(Use --allow-extra-chr to force it to be accepted.)

The contig parameter in the VCF header just lists all of the contigs/chromosomes that should be present in your file. You can safely remove it manually.

Afer removing it, if you convert your file back to vcf with bcftools view -Ov chr22.bcf2 > chr22.vcf and then index with bgzip chr22.vcf followed by tabix -p chr22.vcf.gz, VCFtools will automatically add new contig tags for whatever contigs are present. In my example, ##contig= < ID=22 > is added.

I then convert this back to BCF and index it:

bcftools view -Ob chr22.vcf.gz > chr22.bcf
bcftools index chr22.bcf

Once you do that, it reads into PLINK:

/Programs/plink1.90/plink --bcf chr22.bcf2 --make-bed --out chr22
PLINK v1.90b3.38 64-bit (7 Jun 2016)
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to chr22.log.
Options in effect:
  --bcf chr22.bcf2
  --out chr22

15037 MB RAM detected; reserving 7518 MB for main workspace.
--bcf: chr22-temporary.bed + chr22-temporary.bim + chr22-temporary.fam written.
12 variants loaded from .bim file.
2504 people (0 males, 0 females, 2504 ambiguous) loaded from .fam.
Ambiguous sex IDs written to chr22.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2504 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999434.
12 variants and 2504 people pass filters and QC.
Note: No phenotypes present.
--make-bed to chr22.bed + chr22.bim + chr22.fam ... done.
ADD COMMENTlink modified 11 months ago • written 11 months ago by Kevin Blighe28k

I am not sure about the steps you mentioned.

My original data is chr22_genotypes.vcf.gz. Do you mean I can remove the contig parameter in chr22_genotypes.vcf.gz and save as chr22.vcf And then apply bgzip chr22.vcf | tabix -p chr22.vcf.gz to chr22.vcf?

And then apply this command to convert to bcf?

bcftools view -Ob chr22.vcf > chr22.bcf
bcftools index chr22.bcf

Also, do you mean remove the contig parameter using bcftools? Could you suggest the command I should be using?

ADD REPLYlink written 11 months ago by janhuang.cn90

Hey, no, I only convert the BCF (binary compressed) back to VCF (plain text) so that I can manually remove the line using the vi editor (or any other text editor). This is a perfectly reasonable thing to do, provided that you understand.

ADD REPLYlink written 11 months ago by Kevin Blighe28k

I see. Thank you very much.

ADD REPLYlink written 11 months ago by janhuang.cn90
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1494 users visited in the last hour