Question: Problem with bcftools roh
gravatar for noushin.farnoud
23 months ago by
United States
noushin.farnoud110 wrote:

Dear All,

I have a problem with bcftools roh and hope you can guide me to what may be causing this:

I run Example-1 of the Github: and see the output consists of a state (0:HW, 1:AZ) , and quality for each position that was in the original vcf file (test.vcf.gz).

I ham a bam file and like to get the exact same-format output for my list of positions. Here is how I generate the original .vcf (just in case that the way I am generating this is causing the issue that I explain later).

samtools mpileup -uf $FASTA ./myfile.bam -l ./dbSNP_chr9.bed | bcftools call -m -O z > roh.test.vcf.gz

This will provide a .vcf for 309 positions that I queried, and the header/rest of the vcf looks fine:

    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callCommand=call -m -O z
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mysample
    9       421981  .       G       .       0       .       DP=470;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AN=2;DP4=208,196,0,1;MQ=60 GT      0/0
    9       422026  .       T       .       0       .       DP=405;MQSB=1;MQ0F=0;AN=2;DP4=148,216,0,0;MQ=60 GT      0/0
    9       3206113 .       A       .       0       .       DP=338;MQSB=1;MQ0F=0;AN=2;DP4=219,79,0,0;MQ=60  GT      0/0

But when I try the command in the example, I only receive two lines (positions) in the output file as also noted in the stdout!

bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz 
# This file was produced by: bcftools roh(1.3.1+htslib-1.3.1)
# The command line was: bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz
# [1]Chromosome [2]Position [3]State (0:HW, 1:AZ)   [4]Quality
9   5009541 1   3.0
9   5125250 1   3.0
Number of lines: total/processed: 309/2

My .vcf has PL but I also tried skipping the PL scores with -G30 (as is in the example), just in case it had something to do with this but the answer was exactly the same.

I greatly appreciate if you could point me to what may be causing this.

Thank You,


bcftools roh homozygosity • 1.0k views
ADD COMMENTlink modified 23 months ago • written 23 months ago by noushin.farnoud110
gravatar for Kevin Blighe
23 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

Is there an error? The program appeared to run successfully.

ADD COMMENTlink written 23 months ago by Kevin Blighe69k

Yes, it's puzzling. No error. But as noted in the stdout (last line of the block above) it says 2 (out of 309) lines are processed with no further explanation:

Number of lines: total/processed: 309/2

ADD REPLYlink modified 23 months ago • written 23 months ago by noushin.farnoud110

The genotypes that you have posted are each 0/0, i.e., homozygous reference. How are the remaining genotypes in your input file? Why are there only 309 variants? Are they scattered genome-wide? Note that, for ROH to do its work, the variants have to be a certain distance apart. For example, one cannot detect ROH with accuracy by processing 3 variants that span a region of 1 megabase.

ADD REPLYlink written 23 months ago by Kevin Blighe69k

The distance between variants is something I did not take into consideration! The reason that there are 309 variants is that I wanted to query LOH only on a specific chromosome (chr 9) and also I am working with a targeted assay. So what I did was finding all dbSNP snps that fall within my assay's .bed file and queried them using the 1st command (samtools + bcftools), and then used the resultant .vcf file as the input to roh command. About genotypes: most of the SNPs look homozygous in chr9 if we look only at GT field, however the percentage contribution of REF/ALT alleles for homozygous-called SNPs is still quite different (DP4) and that is why I thought the roh command may model those per-allele-frequencies and generate a different call. I may have not used the right approach to answer my question.

ADD REPLYlink written 23 months ago by noushin.farnoud110

Yes, basically, you require a high density of genotypes in order to accurately make a ROH call. I have used exome-seq data in the past for ROH analysis. You may have to play around with the parameters.

Another option is to use PLINK's ROH implementation.

ADD REPLYlink written 23 months ago by Kevin Blighe69k

@kevin Thanks a lot. I came across that in searches, will give it try. Thanks again for your help and input!

ADD REPLYlink written 23 months ago by noushin.farnoud110
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2109 users visited in the last hour