Question: Variant calling - display nucleotides with 0 coverage
gravatar for yarmda
17 months ago by
yarmda0 wrote:

I am calling variants compared to a given genome. I'm finding that using samtools mpileup or bcftools mpileup, I only get reports after there is some nonzero depth of coverage.

Further, I'm finding that if there is a mismatch within the first 5 or so nucleotides, that they will be skipped altogether.

I am hoping that if I can set a minimum depth of coverage to be 0, I will see these skipped regions. Further, I'm hoping that I will see the variant called in those skipped regions, too.

As far as I can tell, samtools/bcftools mpileup only allows you to set a maximum depth of coverage, not a minimum. Is there another tool I could use for this or a parameter I can set for either of these?

bcf mpileup bcftools samtools vcf • 461 views
ADD COMMENTlink written 17 months ago by yarmda0

Can you clarify this a little? Variants cannot be called where there is 0 coverage, so I don't understand what you want to do. Also, I'm not sure what you mean by "if there is a mismatch within the first 5 or so nucleotides, that they will be skipped altogether". The first 5 nucleotides of what? And what will be skipped? Do you mean that variants only appearing in the first 5 bp of a read are ignored, or the first 5 bp of a contig?

ADD REPLYlink written 17 months ago by Brian Bushnell15k

Sorry, I'll clarify.

Variants can't be called when there is 0 coverage, but I do want to see those uncovered nucleotides, if possible. This is almost entirely because when one of the reads near the beginning of the genome has a mismatch in the first 5 or so nucleotides of the read - the variant calling won't begin until after the mismatch. Since, I'm interested in those mismatches, I'd like the variant caller to begin calling at the onset of the read rather than once it starts matching. Does that make more sense?

ADD REPLYlink written 17 months ago by yarmda0

Hmmm... it's generally unsafe to call variants near the ends of reads because the alignment there is poor; you can't distinguish between substitutions and indels. But, you can use BBMap's variant-caller like this: in=mapped.sam ref=ref.fa out=variants.vcf border=0 clearfilters

"border=0" will include variants all the way to the end of the reads, and "clearfilters" will display all variants, even low-quality ones with depth 1. You can also set the various filters (including minreads, which is the minimum number of reads indicating a variant) manually if you want.

ADD REPLYlink modified 17 months ago • written 17 months ago by Brian Bushnell15k

Thanks for showing me this, Brian.

I understand the risks of calling variants in these regions.

I downloaded your code and ran it as you listed. I'm only seeing variants here, not regions of the genome where my reads aligned perfectly. Is there another setting for this? I don't see such an option in the usage.

ADD REPLYlink written 17 months ago by yarmda0

No... the variants file only contains differences. But you can produce a coverage file with pileup: in=mapped.bam basecov=basecov.txt

That will produce one line per reference base, indicating its coverage. If there are no variants at a location in the VCF file, then all of the coverage would match the reference at that location. In other words, this coverage includes both matches and mismatches. Note that the positions from pileup are 0-based while VCF files are 1-based.

ADD REPLYlink written 17 months ago by Brian Bushnell15k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1127 users visited in the last hour