Question: Problem with bcftools roh
0
gravatar for noushin.farnoud
8 months ago by
United States
noushin.farnoud100 wrote:

Dear All,

I have a problem with bcftools roh and hope you can guide me to what may be causing this:

I run Example-1 of the Github: http://samtools.github.io/bcftools/howtos/roh-calling.html and see the output consists of a state (0:HW, 1:AZ) , and quality for each position that was in the original vcf file (test.vcf.gz).

I ham a bam file and like to get the exact same-format output for my list of positions. Here is how I generate the original .vcf (just in case that the way I am generating this is causing the issue that I explain later).

samtools mpileup -uf $FASTA ./myfile.bam -l ./dbSNP_chr9.bed | bcftools call -m -O z > roh.test.vcf.gz

This will provide a .vcf for 309 positions that I queried, and the header/rest of the vcf looks fine:

    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ...
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callVersion=1.3.1+htslib-1.3.1
    ##bcftools_callCommand=call -m -O z
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mysample
    9       421981  .       G       .       0       .       DP=470;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AN=2;DP4=208,196,0,1;MQ=60 GT      0/0
    9       422026  .       T       .       0       .       DP=405;MQSB=1;MQ0F=0;AN=2;DP4=148,216,0,0;MQ=60 GT      0/0
    9       3206113 .       A       .       0       .       DP=338;MQSB=1;MQ0F=0;AN=2;DP4=219,79,0,0;MQ=60  GT      0/0
...

But when I try the command in the example, I only receive two lines (positions) in the output file as also noted in the stdout!

bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz 
# This file was produced by: bcftools roh(1.3.1+htslib-1.3.1)
# The command line was: bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz
#
# [1]Chromosome [2]Position [3]State (0:HW, 1:AZ)   [4]Quality
9   5009541 1   3.0
9   5125250 1   3.0
Number of lines: total/processed: 309/2

My .vcf has PL but I also tried skipping the PL scores with -G30 (as is in the example), just in case it had something to do with this but the answer was exactly the same.

I greatly appreciate if you could point me to what may be causing this.

Thank You,

nfarnoud

bcftools roh homozygosity • 468 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by noushin.farnoud100
0
gravatar for Kevin Blighe
8 months ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

Is there an error? The program appeared to run successfully.

ADD COMMENTlink written 8 months ago by Kevin Blighe51k

Yes, it's puzzling. No error. But as noted in the stdout (last line of the block above) it says 2 (out of 309) lines are processed with no further explanation:

Number of lines: total/processed: 309/2

ADD REPLYlink modified 8 months ago • written 8 months ago by noushin.farnoud100

The genotypes that you have posted are each 0/0, i.e., homozygous reference. How are the remaining genotypes in your input file? Why are there only 309 variants? Are they scattered genome-wide? Note that, for ROH to do its work, the variants have to be a certain distance apart. For example, one cannot detect ROH with accuracy by processing 3 variants that span a region of 1 megabase.

ADD REPLYlink written 8 months ago by Kevin Blighe51k

The distance between variants is something I did not take into consideration! The reason that there are 309 variants is that I wanted to query LOH only on a specific chromosome (chr 9) and also I am working with a targeted assay. So what I did was finding all dbSNP snps that fall within my assay's .bed file and queried them using the 1st command (samtools + bcftools), and then used the resultant .vcf file as the input to roh command. About genotypes: most of the SNPs look homozygous in chr9 if we look only at GT field, however the percentage contribution of REF/ALT alleles for homozygous-called SNPs is still quite different (DP4) and that is why I thought the roh command may model those per-allele-frequencies and generate a different call. I may have not used the right approach to answer my question.

ADD REPLYlink written 8 months ago by noushin.farnoud100
1

Yes, basically, you require a high density of genotypes in order to accurately make a ROH call. I have used exome-seq data in the past for ROH analysis. You may have to play around with the parameters.

Another option is to use PLINK's ROH implementation.

ADD REPLYlink written 8 months ago by Kevin Blighe51k
1

@kevin Thanks a lot. I came across that in searches, will give it try. Thanks again for your help and input!

ADD REPLYlink written 8 months ago by noushin.farnoud100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1775 users visited in the last hour