Question: VCF filtering based on AB (Allele-Balance) tag
0
gravatar for AISHA
2.0 years ago by
AISHA90
Beijing
AISHA90 wrote:

Hi all,

I want to filter a multi sample vcf based on Allele Balance (AB) value of heterozygous calls only. The purpose is to keep those variants which have AB value between 0.25 and 0.75 and are heterozygous. In addition, homozygous calls will also be kept in vcf, and no filtering needs to be done for those variants based on any tag.

I don't know how can I filter VCF based on above-mentioned criteria.

For the sake of brevity, I just posted an example line from my input VCF.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1 Sample2
> chr1  87083 . G  T  69.7403 . AB=0.235294;ABP=13.3567;AC=1; GT:DP:RO:QR:AO:QA:GL 0/0:16:16:582:0:0:0,-4.81648,-52.7438 0/0:26:26:953:0:0:0,-7.82678,-86.1365

I'll be thankful for any help in this regard.

Aisha

allele balance vcf • 2.6k views
ADD COMMENTlink modified 2.0 years ago by RamRS28k • written 2.0 years ago by AISHA90

Try GATK variant filtration walker and example to filter by AB is provided in manual page (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php) You can use snpsift or bcftools for filtering format field. What is the criteria for homozygous and heterozygous calls in your vcf?

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by cpad011213k

Yeah. I've tried bcf filtering for Allele-Balance (AB). But I'm unable to get the desired output as I mentioned above. Calls having 0/0 or 1/1 are homozygous while those having GT 0/1 or 1/0 are heterozygous.

ADD REPLYlink written 2.0 years ago by AISHA90

duplicate: Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Pierre Lindenbaum129k

Hi @Pierre!

In that particular post, you mentioned, solution is provided based on AD not AB. That's why I had to open another question.

ADD REPLYlink written 2.0 years ago by AISHA90

I see. Your question is not clear to me. You have a AB value in the INFO column but I don't understand why you say: " homozygous calls will also be kept in vcf"

ADD REPLYlink written 2.0 years ago by Pierre Lindenbaum129k
1
gravatar for cpad0112
2.0 years ago by
cpad011213k
India
cpad011213k wrote:

try this @ AISHA and update if it works or not.

 bcftools view -g het -i  "AB>0.25 && AB < 0.75" input.vcf.gz

bgzip the vcf (with bgzip) and index vcf.gz (with tabix). Check also if AB is zero for homozygous calls. Try this if you want to hom all and het with AB cut off.

$ bcftools view -i  'GT="hom"| (GT="het" && (AB >0.25 && AB <0.75))' input.vcf.gz
ADD COMMENTlink modified 2.0 years ago • written 2.0 years ago by cpad011213k

Hi @cpad0112

I tried commands you suggested but they didn't provide optimal solution.

In case of 1st command, output VCF contained all the heterozygous variants having AB between 0.25 and 0.75. But it didn't output homozygous variants. As you suggested, I also checked AB value for homozygous calls, and it was zero for GT (1/1).

2nd command produced output VCF same as INPUT VCF. No filtering was made.

Still, looking for possible solution.

Thanks!

ADD REPLYlink written 24 months ago by AISHA90
1

Count hom and het with cutoff separately and then see if their sum is equal to 2nd command. A round about way is that filter hom only variants and filter het with cutoffs. You will have two vcfs at this points. Now merge the two vcfs with bcftools. It would help if you could post an example VCF.

ADD REPLYlink modified 24 months ago • written 24 months ago by cpad011213k

Hi @cpad0112

Thank you so much for your help. I've found the command to do this task. :)

ADD REPLYlink modified 23 months ago • written 23 months ago by AISHA90

could you please post the command you found to do this task.

ADD REPLYlink written 22 months ago by balathumma100
1

Hi @balathumma10

Following command helped me to accomplish the said task.

bcftools view -i 'AB>0.25 && AB < 0.75 | AB < 0.01' in.vcf > out.vcf
ADD REPLYlink written 22 months ago by AISHA90

Hi @AISHA, just want to understand the rationale behind the above command in retaining the homozygous calls. Piping AB < 0.01 makes you retain the homozygous calls? What if I want to filter by AB in homozygous calls as well as retaining heterozygous ones?

ADD REPLYlink written 22 months ago by aritra9060
1

Hi @aritra90

1) Rationale behind retaining homozygous calls? To catch loci that are fixed variants (all individuals are homozygous for one of the two variants)

2) Piping AB < 0.01 makes you retain the homozygous calls? YES

3) The way you filter depends upon your aim. AB is always equal to zero in case of homozygous calls. For further details you can visit this: SNP Filtering

ADD REPLYlink modified 22 months ago • written 22 months ago by AISHA90

Thanks a lot @AISHA! Your description and that link is really helpful!

ADD REPLYlink written 22 months ago by aritra9060
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1563 users visited in the last hour