Question: Problem with bcftools roh
gravatar for noushin.farnoud
8 months ago by
United States
noushin.farnoud100 wrote:

Dear All,

I have a problem with bcftools roh and hope you can guide me to what may be causing this:

I run Example-1 of the Github: and see the output consists of a state (0:HW, 1:AZ) , and quality for each position that was in the original vcf file (test.vcf.gz).

I ham a bam file and like to get the exact same-format output for my list of positions. Here is how I generate the original .vcf (just in case that the way I am generating this is causing the issue that I explain later).

samtools mpileup -uf $FASTA ./myfile.bam -l ./dbSNP_chr9.bed | bcftools call -m -O z > roh.test.vcf.gz

This will provide a .vcf for 309 positions that I queried, and the header/rest of the vcf looks fine:

    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callCommand=call -m -O z
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mysample
    9       421981  .       G       .       0       .       DP=470;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AN=2;DP4=208,196,0,1;MQ=60 GT      0/0
    9       422026  .       T       .       0       .       DP=405;MQSB=1;MQ0F=0;AN=2;DP4=148,216,0,0;MQ=60 GT      0/0
    9       3206113 .       A       .       0       .       DP=338;MQSB=1;MQ0F=0;AN=2;DP4=219,79,0,0;MQ=60  GT      0/0

But when I try the command in the example, I only receive two lines (positions) in the output file as also noted in the stdout!

bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz 
# This file was produced by: bcftools roh(1.3.1+htslib-1.3.1)
# The command line was: bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz
# [1]Chromosome [2]Position [3]State (0:HW, 1:AZ)   [4]Quality
9   5009541 1   3.0
9   5125250 1   3.0
Number of lines: total/processed: 309/2

My .vcf has PL but I also tried skipping the PL scores with -G30 (as is in the example), just in case it had something to do with this but the answer was exactly the same.

I greatly appreciate if you could point me to what may be causing this.

Thank You,


bcftools roh homozygosity • 468 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by noushin.farnoud100
gravatar for Kevin Blighe
8 months ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

Is there an error? The program appeared to run successfully.

ADD COMMENTlink written 8 months ago by Kevin Blighe51k

Yes, it's puzzling. No error. But as noted in the stdout (last line of the block above) it says 2 (out of 309) lines are processed with no further explanation:

Number of lines: total/processed: 309/2

ADD REPLYlink modified 8 months ago • written 8 months ago by noushin.farnoud100

The genotypes that you have posted are each 0/0, i.e., homozygous reference. How are the remaining genotypes in your input file? Why are there only 309 variants? Are they scattered genome-wide? Note that, for ROH to do its work, the variants have to be a certain distance apart. For example, one cannot detect ROH with accuracy by processing 3 variants that span a region of 1 megabase.

ADD REPLYlink written 8 months ago by Kevin Blighe51k

The distance between variants is something I did not take into consideration! The reason that there are 309 variants is that I wanted to query LOH only on a specific chromosome (chr 9) and also I am working with a targeted assay. So what I did was finding all dbSNP snps that fall within my assay's .bed file and queried them using the 1st command (samtools + bcftools), and then used the resultant .vcf file as the input to roh command. About genotypes: most of the SNPs look homozygous in chr9 if we look only at GT field, however the percentage contribution of REF/ALT alleles for homozygous-called SNPs is still quite different (DP4) and that is why I thought the roh command may model those per-allele-frequencies and generate a different call. I may have not used the right approach to answer my question.

ADD REPLYlink written 8 months ago by noushin.farnoud100

Yes, basically, you require a high density of genotypes in order to accurately make a ROH call. I have used exome-seq data in the past for ROH analysis. You may have to play around with the parameters.

Another option is to use PLINK's ROH implementation.

ADD REPLYlink written 8 months ago by Kevin Blighe51k

@kevin Thanks a lot. I came across that in searches, will give it try. Thanks again for your help and input!

ADD REPLYlink written 8 months ago by noushin.farnoud100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1775 users visited in the last hour