Question: Homozygous Ref Calls And No Calls
0
gravatar for win
3.1 years ago by
win390
United States
win390 wrote:

Hi all, I was wondering if someone could help.

I am generating a vcf file based on the following command chain.

samtools sort eg2.bam eg2.sorted.bam

samtools mpileup -uf referencegenomefile eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

bcftools view eg2.raw.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > results.vcf

What is being observed is that the VCF file only contains "1/1" or "0/1" or "1/0" calls. I don't see a single "0/0" call.

In bcftools i have tried to eliminate a the "v" parameter but then i dont get any genotype calls.

What I am really after is: is there a way that i can generate another VCF file (or include in the same file) that also has "0/0" genotypes. Also, is is possible to include no call, i.e. a way to indicate if a site was not sequenced at all?

Thanks in advance. A

• 1.6k views
ADD COMMENTlink modified 3.1 years ago by Vivek1.1k • written 3.1 years ago by win390

Can you clean up the command lines you're using? It looks there are missing pipe commands and maybe the linebreaks aren't correct between multiple commands.

ADD REPLYlink written 3.1 years ago by matted5.1k

I think you might be confusing typical microarray data formats such as 0/1/2 encoding for major/minor allele genotype calling, and VCF, which only reports things that are variant from the reference sequence.

ADD REPLYlink written 3.1 years ago by Matt Shirley5.0k
0
gravatar for Matt Shirley
3.1 years ago by
Matt Shirley5.0k
Baltimore, MD
Matt Shirley5.0k wrote:

There is not a way to incorporate a "no call" in a VCF that I am aware of, and this is not something you usually see outside of genotypes from microarray experiments. The reason you don't see a "0/0" call is probably because you have only one sample in your VCF file. If you have two samples in your file, and at a particular site sample1 is heterozygous variant and sample2 is not, then you would have sample1 = 0/1 and sample2 = 0/0. If you were to include non-variant sites in your VCF, then you would be reporting the entire reference genome, plus your variant sites. This is not what VCF is for.

ADD COMMENTlink written 3.1 years ago by Matt Shirley5.0k
1

This is partially true. As he said, you won't get a genotype call of 0/0 in a single sample analysis, since the site isn't varying with respect to the reference. However, you can (should) get a line in the VCF file for every base if you leave off the "-v" option, just without the genotype call. That is, a VCF file isn't required to have genotype information for every site. This is useful for carrying information through to later analyses, though usually people keep it in BCF.

Examples of VCF lines without and with genotype information:

chrmt   91      .       T       .       32.1    .       DP=1529;AF1=0;AC1=0;DP4=917,594,0,0;MQ=54;FQ=-32.1      PL:DP:SP        0:39:0

chrmt   92      .       C       A       999     .       DP=1537;VDB=0.0169;AF1=1;AC1=162;DP4=0,1,916,594;MQ=54;FQ=-32.1;PV4=0.39,1,1,0.4        GT:PL:DP:SP:GQ  1/1:255,117,0:39:0:99
ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by matted5.1k

Thanks, this might help but then if the file is bcf then how does one read it? Can it be piped into vcfutils and converted into a vcf?

In other words will it be possible to query a bcf file?

ADD REPLYlink written 3.1 years ago by win390

This is the plain and simple use of bcftools view, without genotype calling. bcftools view will output as vcf unless you tell it not to (-b), which you can pipe into tools that expect vcf (like vcfutils).

To complete the circle with your original question, you can pass bcftools view a VCF file with all sites (no genotypes called), telling it to call genotypes this time. Something like:

bcftools view -cg input.bcf > all_sites.vcf
bcftools view -Svcg all_sites.vcf > variant_sites.vcf
ADD REPLYlink written 3.1 years ago by matted5.1k
0
gravatar for Vivek
3.1 years ago by
Vivek1.1k
Houston, TX
Vivek1.1k wrote:

You will have to call consensus to get the homozygous reference calls. In GATK the --emitallsites option will do that but the output VCF file will be pretty huge.

ADD COMMENTlink written 3.1 years ago by Vivek1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 341 users visited in the last hour