Question: Need help with BCFtools annotate; not all information is carried forward
Hadi M wrote (15 months ago):

Hi everyone,

I have a custom tab-delimited annotation file that I used to annotate a VCF file with bcftools annotate. It worked fine, except that when several annotation rows hit the same position, only the first one is carried into the VCF file. Here's an example:

This is an example of my annotation file:

#CHROM  FROM    TO  TRAIT
chr3    100001  100010  Disease A
chr3    100005  100005  Disease B
chr3    100005  100005  Disease C

And here's an example of my VCF file:

#CHROM  POS ID  REF ALT
chr3    100005  .   A   T

Annotating the VCF file produces:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A

As you can see, only the first annotation is carried forward. My ideal output is:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A
chr3    100005  .   A   T   Disease B
chr3    100005  .   A   T   Disease C

Or:

#CHROM  POS ID  REF ALT INFO
chr3    100005  .   A   T   Disease A | Disease B | Disease C

Is there an option in bcftools annotate that would allow me to get such output? If there is an alternative tool that can do this, please recommend it as well. Cheers.

RamRS (Baylor College of Medicine, Houston, TX) wrote (15 months ago):

It is not legal for a VCF file to have multiple entries for the same chr-pos-ref-alt combination.

Option 1:

Try the --merge-logic option of bcftools annotate. I've never tried it, but it looks like it might work when used as --merge-logic TRAIT:unique
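
For what it's worth, here is a minimal, untested sketch of how Option 1 might be put together. It is written in Python only so that the command can be assembled and inspected before it is run. The file names (input.vcf.gz, annots.tab.gz, annots.hdr, annotated.vcf) are hypothetical. It assumes the annotation file has been bgzip-compressed and tabix-indexed, and that annots.hdr holds an ##INFO header line defining TRAIT. The --merge-logic option is only present in reasonably recent bcftools releases, so check bcftools annotate --help first.

```python
import shlex


def annotate_command(vcf, annots_gz, header, out_vcf, merge="TRAIT:unique"):
    """Build (but do not run) a bcftools annotate command line.

    All file names passed in are placeholders chosen for this sketch.
    """
    return [
        "bcftools", "annotate",
        "-a", annots_gz,               # bgzipped + tabix-indexed annotation table
        "-h", header,                  # file with the ##INFO line describing TRAIT
        "-c", "CHROM,FROM,TO,TRAIT",   # column layout of the annotation table
        "--merge-logic", merge,        # how to combine values from overlapping rows
        "-o", out_vcf,
        vcf,
    ]


cmd = annotate_command("input.vcf.gz", "annots.tab.gz", "annots.hdr", "annotated.vcf")

# Print the command so it can be checked before running it, for example with
# subprocess.run(cmd, check=True) once the input files exist.
print(shlex.join(cmd))
```

If the option behaves as its help text describes, the three overlapping traits should end up in a single TRAIT value on the chr3:100005 record rather than only the first one.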

Option 2:

You should be able to use R (dplyr) to build a new annotation file from your existing one. Group by the columns that identify an annotation (CHROM, FROM and TO in your file) and aggregate TRAIT with paste(TRAIT, collapse = " | "). The downside is that range annotations and point annotations cannot be aggregated together: the group-by only groups identical values, not overlapping ranges, so you would have to convert all range annotations to point annotations before aggregating them.
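
The same grouping idea can be sketched without R. Below is a small Python version (standard library only) of the aggregation described above, using the example annotation rows from the question. The function name collapse_traits is made up for this sketch, and it assumes the four-column layout CHROM, FROM, TO, TRAIT.

```python
import csv
import io

# Example annotation table from the question (tab-delimited).
ANNOTATIONS = (
    "#CHROM\tFROM\tTO\tTRAIT\n"
    "chr3\t100001\t100010\tDisease A\n"
    "chr3\t100005\t100005\tDisease B\n"
    "chr3\t100005\t100005\tDisease C\n"
)


def collapse_traits(text, sep=" | "):
    """Join TRAIT values of rows that share identical CHROM, FROM and TO."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    grouped = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, start, end, trait = row
        traits = grouped.setdefault((chrom, int(start), int(end)), [])
        if trait not in traits:  # keep each trait once, in input order
            traits.append(trait)
    rows = [
        (chrom, start, end, sep.join(traits))
        for (chrom, start, end), traits in sorted(grouped.items())
    ]
    return header, rows


header, rows = collapse_traits(ANNOTATIONS)
print("\t".join(header))
for row in rows:
    print("\t".join(str(field) for field in row))
```

Only Disease B and Disease C are merged here, because they share the exact interval 100005-100005. Disease A stays on its own row since its range (100001-100010) is not identical to theirs, which is the limitation mentioned above.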
