Question: Homozygous Ref Calls And No Calls
0
gravatar for ashkot09
21 months ago by
ashkot09250
ashkot09250 wrote:

Hi all, I was wondering if someone could help.

I am generating a vcf file based on the following command chain.

samtools sort eg2.bam eg2.sorted.bam

samtools mpileup -uf referencegenomefile eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

bcftools view eg2.raw.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > results.vcf

What is being observed is that the VCF file only contains "1/1" or "0/1" or "1/0" calls. I don't see a single "0/0" call.

In bcftools i have tried to eliminate a the "v" parameter but then i dont get any genotype calls.

What I am really after is: is there a way that i can generate another VCF file (or include in the same file) that also has "0/0" genotypes. Also, is is possible to include no call, i.e. a way to indicate if a site was not sequenced at all?

Thanks in advance. A

ADD COMMENTlink modified 21 months ago by Vivek440 • written 21 months ago by ashkot09250

Can you clean up the command lines you're using? It looks there are missing pipe commands and maybe the linebreaks aren't correct between multiple commands.

ADD REPLYlink written 21 months ago by matted4.0k

I think you might be confusing typical microarray data formats such as 0/1/2 encoding for major/minor allele genotype calling, and VCF, which only reports things that are variant from the reference sequence.

ADD REPLYlink written 21 months ago by Matt Shirley3.5k
0
gravatar for Matt Shirley
21 months ago by
Matt Shirley3.5k
Baltimore, MD
Matt Shirley3.5k wrote:

There is not a way to incorporate a "no call" in a VCF that I am aware of, and this is not something you usually see outside of genotypes from microarray experiments. The reason you don't see a "0/0" call is probably because you have only one sample in your VCF file. If you have two samples in your file, and at a particular site sample1 is heterozygous variant and sample2 is not, then you would have sample1 = 0/1 and sample2 = 0/0. If you were to include non-variant sites in your VCF, then you would be reporting the entire reference genome, plus your variant sites. This is not what VCF is for.

ADD COMMENTlink written 21 months ago by Matt Shirley3.5k
1

This is partially true. As he said, you won't get a genotype call of 0/0 in a single sample analysis, since the site isn't varying with respect to the reference. However, you can (should) get a line in the VCF file for every base if you leave off the "-v" option, just without the genotype call. That is, a VCF file isn't required to have genotype information for every site. This is useful for carrying information through to later analyses, though usually people keep it in BCF.

Examples of VCF lines without and with genotype information:

chrmt   91      .       T       .       32.1    .       DP=1529;AF1=0;AC1=0;DP4=917,594,0,0;MQ=54;FQ=-32.1      PL:DP:SP        0:39:0

chrmt   92      .       C       A       999     .       DP=1537;VDB=0.0169;AF1=1;AC1=162;DP4=0,1,916,594;MQ=54;FQ=-32.1;PV4=0.39,1,1,0.4        GT:PL:DP:SP:GQ  1/1:255,117,0:39:0:99
ADD REPLYlink modified 21 months ago • written 21 months ago by matted4.0k

Thanks, this might help but then if the file is bcf then how does one read it? Can it be piped into vcfutils and converted into a vcf?

In other words will it be possible to query a bcf file?

ADD REPLYlink written 21 months ago by ashkot09250

This is the plain and simple use of bcftools view, without genotype calling. bcftools view will output as vcf unless you tell it not to (-b), which you can pipe into tools that expect vcf (like vcfutils).

To complete the circle with your original question, you can pass bcftools view a VCF file with all sites (no genotypes called), telling it to call genotypes this time. Something like:

bcftools view -cg input.bcf > all_sites.vcf
bcftools view -Svcg all_sites.vcf > variant_sites.vcf
ADD REPLYlink written 21 months ago by matted4.0k
0
gravatar for Vivek
21 months ago by
Vivek440
Houston, TX
Vivek440 wrote:

You will have to call consensus to get the homozygous reference calls. In GATK the --emitallsites option will do that but the output VCF file will be pretty huge.

ADD COMMENTlink written 21 months ago by Vivek440
Please log in to add an answer.

Help
Access
  • RSS
  • Stats
  • API

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.0.0
Traffic: 533 users visited in the last hour