Question: Fastest way to switch out sites on one BCF for sites in another BCF?
4 weeks ago, curious320 wrote:

I want to replace sites in BCF A with the versions of those sites that appear in BCF B.

BCF A:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781512 chrX:2781512:A:G        A       G       .       PASS        GT:    0|0
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|1
chrX    2781518 chrX:2781518:A:G        A       G       .       PASS        GT:    0|1

BCF B:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|0

I want BCF C:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781512 chrX:2781512:A:G        A       G       .       PASS        GT:    0|0
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|0
chrX    2781518 chrX:2781518:A:G        A       G       .       PASS        GT:    0|1

Right now I remove chrX:2781514:C:A from BCF A, concatenate the filtered BCF A with BCF B to get BCF C, and then sort BCF C, roughly like this:

bcftools view -e ID=@{remove_snps_list} {BCF A} -Ob > {BCF A_filtered}

bcftools concat {BCF A_filtered} {BCF B} -Ob > {BCF C}

bcftools sort {BCF C} -Ob > {BCF C_sorted}

This is going to take forever given the size of my files. Is there a better way?


Pipe the bcftools commands to save on IO time.

bcftools view -e ID=@{remove_snps_list} {BCF A} -Ob | bcftools concat - {BCF B} -Ob | bcftools sort -Ob - > {BCF C_sorted}
written 4 weeks ago by RamRS27k

Other than that, though, the three-step approach seems reasonable and should have the desired effect?

Also, would the BCF be loaded completely into memory before the sort step? I think sorting can only be done on a complete BCF rather than on a stream of sites.

modified 4 weeks ago • written 4 weeks ago by curious320

Yeah, the steps seem good. Multiple self-contained steps are better than one squashed-together, vague operation or script.

I'm not sure whether the entire BCF will be loaded into memory. It doesn't seem necessary in your case: one could stream one VCF, seek to the matching locations in the other using its index, and replace the entries. But I don't know how bcftools works internally.

written 4 weeks ago by RamRS27k
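
A minimal sketch of the "stream one file and replace entries" idea, not a tested recipe for large files. It works on VCF text rather than BCF, and it assumes that the replacement file (BCF B here) is small enough to hold in memory, that records are matched on the ID column alone, and that every site in B is also present in A (a site found only in B would be silently dropped). The function name replace_sites is hypothetical and is not part of bcftools. Because the output keeps the record order of A, no sort step is needed afterwards.

```shell
#!/bin/sh
# replace_sites A.vcf B.vcf
# Hypothetical helper: print A.vcf, substituting every data line whose ID
# (column 3) also occurs in B.vcf with the line from B.vcf.
# Assumes tab-separated VCF text and that B.vcf fits in memory.
replace_sites() {
    awk -F '\t' -v bfile="$2" '
        BEGIN {
            # Load the replacement records from B, keyed by ID.
            while ((getline line < bfile) > 0) {
                if (line ~ /^#/) continue      # skip header lines of B
                split(line, f, "\t")
                repl[f[3]] = line
            }
            close(bfile)
        }
        /^#/ { print; next }                   # headers of A pass through
        ($3 in repl) { print repl[$3]; next }  # site is in B: emit the B record
        { print }                              # otherwise keep the A record
    ' "$1"
}
```

With BCF input, one would presumably convert on the way in and out, for example `bcftools view {BCF A} | replace_sites - B.vcf | bcftools view -Ob -o {BCF C} -`, where B.vcf is a text copy of BCF B and `-` is read as standard input. Whether this beats the view/concat/sort pipeline on large files has not been measured here. On the memory question: bcftools sort has `-m/--max-mem` and `-T/--temp-dir` options, which suggests it spills to temporary files rather than holding the whole BCF in memory.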