Homozygous Ref Calls And No Calls
3
0
Entering edit mode
11.8 years ago
win ▴ 970

Hi all, I was wondering if someone could help.

I am generating a vcf file based on the following command chain.

samtools sort eg2.bam eg2.sorted.bam

samtools mpileup -uf referencegenomefile eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

bcftools view eg2.raw.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > results.vcf

What is being observed is that the VCF file only contains "1/1" or "0/1" or "1/0" calls. I don't see a single "0/0" call.

In bcftools i have tried to eliminate a the "v" parameter but then i dont get any genotype calls.

What I am really after is: is there a way that i can generate another VCF file (or include in the same file) that also has "0/0" genotypes. Also, is is possible to include no call, i.e. a way to indicate if a site was not sequenced at all?

Thanks in advance. A

• 7.3k views
ADD COMMENT
0
Entering edit mode

Can you clean up the command lines you're using? It looks there are missing pipe commands and maybe the linebreaks aren't correct between multiple commands.

ADD REPLY
0
Entering edit mode

I think you might be confusing typical microarray data formats such as 0/1/2 encoding for major/minor allele genotype calling, and VCF, which only reports things that are variant from the reference sequence.

ADD REPLY
0
Entering edit mode
11.8 years ago

There is not a way to incorporate a "no call" in a VCF that I am aware of, and this is not something you usually see outside of genotypes from microarray experiments. The reason you don't see a "0/0" call is probably because you have only one sample in your VCF file. If you have two samples in your file, and at a particular site sample1 is heterozygous variant and sample2 is not, then you would have sample1 = 0/1 and sample2 = 0/0. If you were to include non-variant sites in your VCF, then you would be reporting the entire reference genome, plus your variant sites. This is not what VCF is for.

ADD COMMENT
1
Entering edit mode

This is partially true. As he said, you won't get a genotype call of 0/0 in a single sample analysis, since the site isn't varying with respect to the reference. However, you can (should) get a line in the VCF file for every base if you leave off the "-v" option, just without the genotype call. That is, a VCF file isn't required to have genotype information for every site. This is useful for carrying information through to later analyses, though usually people keep it in BCF.

Examples of VCF lines without and with genotype information:

chrmt   91      .       T       .       32.1    .       DP=1529;AF1=0;AC1=0;DP4=917,594,0,0;MQ=54;FQ=-32.1      PL:DP:SP        0:39:0

chrmt   92      .       C       A       999     .       DP=1537;VDB=0.0169;AF1=1;AC1=162;DP4=0,1,916,594;MQ=54;FQ=-32.1;PV4=0.39,1,1,0.4        GT:PL:DP:SP:GQ  1/1:255,117,0:39:0:99
ADD REPLY
0
Entering edit mode

Thanks, this might help but then if the file is bcf then how does one read it? Can it be piped into vcfutils and converted into a vcf?

In other words will it be possible to query a bcf file?

ADD REPLY
0
Entering edit mode

This is the plain and simple use of bcftools view, without genotype calling. bcftools view will output as vcf unless you tell it not to (-b), which you can pipe into tools that expect vcf (like vcfutils).

To complete the circle with your original question, you can pass bcftools view a VCF file with all sites (no genotypes called), telling it to call genotypes this time. Something like:

bcftools view -cg input.bcf > all_sites.vcf
bcftools view -Svcg all_sites.vcf > variant_sites.vcf
ADD REPLY
0
Entering edit mode
11.8 years ago
Vivek ★ 2.7k

You will have to call consensus to get the homozygous reference calls. In GATK the --emitallsites option will do that but the output VCF file will be pretty huge.

ADD COMMENT
0
Entering edit mode
7.7 years ago
Korsocius ▴ 250

Try to use this:

 for i in *.bam 
 do  
 samtools mpileup -f hg38.fa -g $i | bcftools call -m  > ${i%.bam}.vcf 
 done
ADD COMMENT

Login before adding your answer.

Traffic: 1625 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6