Question: bcftools merge, error "Could not parse the region(s)"
3.3 years ago, agathejouet wrote:

Hi all,

I am trying to merge multiple VCF files using bcftools version 1.6. Unfortunately, I get the following error for all of my "chromosomes":

 [E::_regions_init_string] Could not parse the region(s): chr1

For example:

[E::_regions_init_string] Could not parse the region(s): JA218_chr6:29940123..29944556_UTR-0_lenght=4434
[E::_regions_init_string] Could not parse the region(s): JA455_chr11:20047638..20052236_UTR-0_lenght=4599
[E::_regions_init_string] Could not parse the region(s): JA327_chr9:10804606..10807219_UTR-0_lenght=2614
[E::_regions_init_string] Could not parse the region(s): JA205_chr6:24945965..24949792_UTR-0_lenght=3828

I am using the following command:

bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz -o outfile -O v -0

My VCF files were compressed with bgzip and indexed with tabix (also version 1.6) using:

bgzip file1.vcf; tabix -p vcf file1.vcf.gz

I'm not sure what is happening here. Any help would be appreciated; please let me know if you need any additional information.

Thanks very much,

Agathe


Could you please post the output of:

tabix --list-chroms file1.vcf.gz | head -n 50

(reply by Pierre Lindenbaum, 3.3 years ago)

bgzip file1.vcf; tabix -p vcf file1.vcf.gz

You should always force bgzip/tabix to overwrite existing files (-f) and chain the two commands with a logical AND (&&):

bgzip -f file1.vcf &&  tabix -f -p vcf file1.vcf.gz

(reply by Pierre Lindenbaum, 3.3 years ago)
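Applied to several input files, the same force-and-chain pattern might look like the minimal sketch below. It assumes bgzip and tabix from htslib are on PATH; the function name `compress_and_index` is hypothetical. The `|| return 1` stops the loop at the first file that fails instead of silently moving on to the next one.

```shell
#!/usr/bin/env bash
# Sketch: bgzip-compress and tabix-index each VCF given as an argument,
# stopping at the first failure. Assumes bgzip and tabix (htslib) are on PATH.
set -euo pipefail

compress_and_index() {
    local vcf
    for vcf in "$@"; do
        # -f: overwrite any stale .vcf.gz / .tbi left by an earlier run.
        # &&: only index if compression succeeded; || return 1: abort on error.
        bgzip -f "$vcf" && tabix -f -p vcf "$vcf.gz" || return 1
    done
}

# Hypothetical usage with the file names from the question:
# compress_and_index file1.vcf file2.vcf file3.vcf
```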

The output is:

LOC_Os01g16370_chr1:9292171..9298764_UTR-0
LOC_Os01g16400_chr1:9313135..9317036_UTR-0
LOC_Os01g21240_chr1:11854526..11857948_UTR-0
LOC_Os01g25630_chr1:14525576..14530578_UTR-0
LOC_Os01g25710_chr1:14570811..14580124_UTR-0
LOC_Os01g25810_chr1:14611521..14616096_UTR-0
LOC_Os01g33684_chr1:18530856..18539774_UTR-0
LOC_Os01g35254_chr1:19515711..19521445_UTR-0
LOC_Os01g36640_chr1:20326945..20333788_UTR-0
LOC_Os01g41890_chr1:23745796..23750669_UTR-0
LOC_Os01g42330_chr1:24021015..24026491_UTR-0
LOC_Os01g52270_chr1:30047246..30048941_UTR-0
LOC_Os01g52280_chr1:30054522..30055994_UTR-0
LOC_Os01g52304_chr1:30061671..30065301_UTR-0
LOC_Os01g52320_chr1:30067560..30069086_UTR-0
LOC_Os01g57280_chr1:33097028..33103506_UTR-0
LOC_Os01g57870_chr1:33459210..33462869_UTR-0
LOC_Os01g58520_chr1:33815574..33820089_UTR-0
LOC_Os01g59340_chr1:34294608..34306006_UTR-0
LOC_Os02g04530_chr2:2011707..2018122_UTR-0
LOC_Os02g06030_chr2:3005600..3007976_UTR-0
LOC_Os02g06180_chr2:3084913..3087072_UTR-0
LOC_Os02g10900_chr2:5785295..5788769_UTR-0
LOC_Os02g16060_chr2:9136746..9140131_UTR-0
LOC_Os02g16250_chr2:9238437..9239234_UTR-0
LOC_Os02g16270_chr2:9257996..9262612_UTR-0
LOC_Os02g16330_chr2:9284786..9290815_UTR-0
LOC_Os02g17304_chr2:9922463..9928347_UTR-0
LOC_Os02g18000_chr2:10444558..10450974_UTR-0
LOC_Os02g18070_chr2:10490444..10496411_UTR-0
LOC_Os02g18140_chr2:10535633..10539697_UTR-0
LOC_Os02g18510_chr2:10776282..10780756_UTR-0
LOC_Os02g19750_chr2:11550971..11554437_UTR-0
LOC_Os02g19890_chr2:11701933..11708391_UTR-0
LOC_Os02g20420_chr2:12039379..12052089_UTR-0
LOC_Os02g26500_chr2:15556364..15560053_UTR-0
LOC_Os02g27500_chr2:16264148..16266679_UTR-0
LOC_Os02g27540_chr2:16306055..16309173_UTR-0
LOC_Os02g27680_chr2:16398585..16399403_UTR-0
LOC_Os02g41760_chr2:25097465..25099747_UTR-0
LOC_Os03g26260_chr3:15014453..15021952_UTR-0
LOC_Os03g37720_chr3:20912120..20915920_UTR-0
LOC_Os03g38250_chr3:21232110..21236446_UTR-0
LOC_Os03g48370_chr3:27529787..27536134_UTR-0
LOC_Os03g63150_chr3:35686155..35691879_UTR-0
LOC_Os04g02030_chr4:638803..642201_UTR-0

I have 759 "chromosomes" like this, all in a similar format. I'll also keep in mind to force overwriting and to use && (which I actually already do in my Rakefile).

(reply by agathejouet, 3.3 years ago)

But is there a contig named exactly 'chr1' (the whole word), as in the first error message you posted?

(reply by Pierre Lindenbaum, 3.3 years ago)

3.3 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

From the VCF specification (https://samtools.github.io/hts-specs/VCFv4.3.pdf):

CHROM - chromosome:(...) . The colon symbol (:) must be absent from all chromosome names to avoid parsing errors when dealing with breakends.

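Your VCF is not valid: the CHROM names contain colons, so they get misread as region strings of the form chrom:start-end.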
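To see which names break this rule before merging, a scan of the CHROM column is enough. This is a minimal sketch: the helper name `find_colon_contigs` is hypothetical, and it assumes the inputs are plain or bgzip-compressed VCFs (bgzip output is gzip-compatible, so `gzip -cd` can read it). On an indexed file, `tabix --list-chroms file1.vcf.gz | grep ':'` should give the same list.

```shell
#!/usr/bin/env bash
# Sketch: print each distinct CHROM value that contains ':' (not allowed in
# CHROM by the VCF specification). Accepts plain .vcf or bgzip'd .vcf.gz.
set -euo pipefail

find_colon_contigs() {
    local vcf=$1
    case "$vcf" in
        *.gz) gzip -cd -- "$vcf" ;;   # bgzip output is valid gzip
        *)    cat -- "$vcf" ;;
    esac | awk -F '\t' '!/^#/ && $1 ~ /:/ && !seen[$1]++ { print $1 }'
}

# Hypothetical usage: count offending names in each input file.
# for f in file1.vcf.gz file2.vcf.gz file3.vcf.gz; do
#     echo "$f: $(find_colon_contigs "$f" | wc -l)"
# done
```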


Thanks very much for your help, I will try to modify this!

Agathe

(reply by agathejouet, 3.3 years ago)

I would go for something like:

awk -F '\t' 'BEGIN {OFS="\t"} /^#/ {print; next} {gsub(/[.:-]/, "_", $1); print}' input.vcf

(reply by Pierre Lindenbaum, 3.3 years ago)
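One caveat with the one-liner above: header lines are printed unchanged, so the `##contig=<ID=...>` lines keep the old names while the records get the new ones, and header and records no longer agree. The hedged sketch below renames both with the same substitution; the helper name `sanitize_vcf` is hypothetical, and it assumes each contig ID in the header ends at the first `,` or `>`.

```shell
#!/usr/bin/env bash
# Sketch: replace '.', ':' and '-' with '_' in contig names, in both the
# ##contig header lines and the CHROM column, so the two stay consistent.
# Reads a plain-text VCF on stdin and writes it to stdout.
set -euo pipefail

sanitize_vcf() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        /^##contig=<ID=/ {
            rest = substr($0, 14)          # text after "##contig=<ID=" (13 chars)
            stop = match(rest, /[,>]/)     # the ID ends at the first "," or ">"
            if (stop > 0) {
                id = substr(rest, 1, stop - 1)
                gsub(/[.:-]/, "_", id)
                print "##contig=<ID=" id substr(rest, stop)
                next
            }
        }
        /^#/ { print; next }               # other header lines unchanged
        { gsub(/[.:-]/, "_", $1); print }  # CHROM column
    '
}

# Hypothetical usage for one file; repeat for the others, then merge:
# gzip -cd file1.vcf.gz | sanitize_vcf | bgzip -c > file1.fixed.vcf.gz \
#     && tabix -f -p vcf file1.fixed.vcf.gz
# bcftools merge file1.fixed.vcf.gz file2.fixed.vcf.gz file3.fixed.vcf.gz -o outfile -O v -0
```

With this substitution a name such as `LOC_Os01g16370_chr1:9292171..9298764_UTR-0` becomes `LOC_Os01g16370_chr1_9292171__9298764_UTR_0` in the header and in the records alike.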