Question: Need help with bcftools annotate; not all information is carried forward
Hadi M wrote (12 weeks ago):

Hi everyone,

I have a custom tab-delimited annotation file that I used to annotate a VCF file with bcftools annotate. It works, except that when several annotation records overlap the same position, only the first one is carried into the VCF file. Here's an example:

This is an example of my annotation file:

#CHROM  FROM    TO  TRAIT
chr3    100001  100010  Disease A
chr3    100005  100005  Disease B
chr3    100005  100005  Disease C

And here's an example of my VCF file:

#CHROM  POS ID  REF ALT
chr3    100005  .   A   T

Annotating the VCF file produces:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A

As you can see, only the first annotation is carried forward. My ideal output is:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A
chr3    100005  .   A   T   Disease B
chr3    100005  .   A   T   Disease C

Or:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A | Disease B | Disease C

Is there an option in bcftools annotate that would give me such output? If there is an alternative tool, please recommend it as well. Cheers.
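To make the desired behaviour concrete, here is a minimal Python sketch (not part of bcftools) that collects every annotation whose FROM..TO interval contains the variant position and joins all hits with " | " instead of keeping only the first. It assumes coordinates are 1-based and inclusive, as in the example file; `ANNOTATIONS`, `overlapping_traits` and `collapse` are hypothetical names.

```python
# Example annotation rows from the question: (chrom, from, to, trait).
# Assumption: coordinates are 1-based and inclusive, as in the example file.
ANNOTATIONS = [
    ("chr3", 100001, 100010, "Disease A"),
    ("chr3", 100005, 100005, "Disease B"),
    ("chr3", 100005, 100005, "Disease C"),
]


def overlapping_traits(chrom, pos, annotations):
    """Return every TRAIT whose [from, to] interval contains chrom:pos, in file order."""
    return [
        trait
        for a_chrom, start, end, trait in annotations
        if a_chrom == chrom and start <= pos <= end
    ]


def collapse(traits, sep=" | "):
    """Join all hits into a single value instead of keeping only the first."""
    return sep.join(traits)


if __name__ == "__main__":
    hits = overlapping_traits("chr3", 100005, ANNOTATIONS)
    print("first hit only:", hits[0])  # what the question reports getting
    print("all hits:", collapse(hits))  # the second "ideal output" form
```

Note that chr3:100005 falls inside the Disease A range as well as matching the two point annotations, which is why all three traits are returned.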

RamRS (Houston, TX) wrote (12 weeks ago):

It is not legal for a VCF file to have multiple records for the same CHROM-POS-REF-ALT combination, so your first ideal output is not an option; you need the collapsed, single-record form.

Option 1:

Try the --merge-logic parameter of bcftools annotate. I've never tried it, but it looks like it might work if used as --merge-logic TRAIT:unique
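As I understand the bcftools documentation, --merge-logic takes TAG:TYPE, and for string tags the types include first (which matches what you are currently getting), append and unique. The Python sketch below only models those semantics to show what TRAIT:unique should yield; it is not bcftools code, and it assumes the merged values are comma-separated, as multi-valued INFO fields usually are. `merge_values` is a hypothetical name.

```python
from typing import List


def merge_values(values: List[str], logic: str) -> str:
    """Illustrative model of combining values from overlapping annotation
    records into one INFO value. The type names mirror bcftools'
    --merge-logic string types; this is NOT bcftools' implementation.
    Assumption: multiple values are joined with commas."""
    if not values:
        return "."  # VCF missing-value marker
    if logic == "first":
        return values[0]  # keep only the first overlapping record
    if logic == "append":
        return ",".join(values)  # keep every value, duplicates included
    if logic == "unique":
        # dict.fromkeys drops duplicates while preserving first-seen order
        return ",".join(dict.fromkeys(values))
    raise ValueError(f"unsupported merge logic: {logic}")


if __name__ == "__main__":
    traits = ["Disease A", "Disease B", "Disease C", "Disease B"]
    for logic in ("first", "append", "unique"):
        print(logic, "->", merge_values(traits, logic))
```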

Option 2:

You should be able to use R (dplyr) to build a new annotation file from your existing one: group by CHROM, POS, ID, REF, ALT and summarise TRAIT as paste(TRAIT, collapse = " | "). The downside is that range annotations and point annotations cannot be aggregated together: the group-by only groups identical values, not overlapping ranges, so you would have to convert all range annotations to point annotations before aggregating.
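For anyone not using R, here is a stdlib-only Python sketch of the same group-and-paste idea. Your example annotation file has CHROM, FROM, TO, TRAIT columns rather than POS/ID/REF/ALT, so this version groups on identical (CHROM, FROM, TO); `ANNOTATION_TSV` and `aggregate_traits` are hypothetical names.

```python
import csv
import io

# The example annotation file from the question, tab-delimited.
ANNOTATION_TSV = (
    "#CHROM\tFROM\tTO\tTRAIT\n"
    "chr3\t100001\t100010\tDisease A\n"
    "chr3\t100005\t100005\tDisease B\n"
    "chr3\t100005\t100005\tDisease C\n"
)


def aggregate_traits(tsv_text, sep=" | "):
    """Group rows with identical (CHROM, FROM, TO) and join their TRAIT values,
    like dplyr group_by() + summarise(paste(TRAIT, collapse = " | "))."""
    groups = {}  # dicts keep insertion order, so file order is preserved
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        chrom, start, end, trait = row
        groups.setdefault((chrom, int(start), int(end)), []).append(trait)
    return [(c, s, e, sep.join(t)) for (c, s, e), t in groups.items()]


if __name__ == "__main__":
    for record in aggregate_traits(ANNOTATION_TSV):
        print("\t".join(map(str, record)))
```

The chr3:100001-100010 range stays a separate row from the merged 100005 point row, which is exactly the range-versus-point limitation described above.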
