Question: Calling SNPs with 1X depth (samtools/bcftools)
2
gravatar for Coryza
4.1 years ago by
Coryza30
France
Coryza30 wrote:

Hi,

I am trying to call SNPs with at least 1X depth (My estimation is that most of the SNPs I want to see have indeed 1X / 2X depth). I have mapped consensus sequences to the Tomato reference genome (generating in theory 1X/2X depth on specific target positions) with Bowtie2 and perform SNP/INDEL calling with samtools mpileup and bcftools. However, I can't retrieve SNPs with 1X depth although I do mention it on the command line. Can anyone help me?

Commands:

for directory in *; do for file in $directory/*sorted*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); samtools mpileup -uf ../../Data/SolanumLycopersicum_gDNA_2.40.fa $file | bcftools view -bvcg - > $directory/$prefix.bcf & done; done
for directory in *; do for file in $directory/*bcf*; do prefix=$(echo $file | rev | cut -f1 -d '/' | rev | cut -f1 -d '.'); bcftools view $file | vcfutils.pl varFilter -d1 -D100 > $directory/$prefix.vcf & done; done

 

bcftools samtools • 2.0k views
ADD COMMENTlink modified 4.1 years ago by swbarnes25.0k • written 4.1 years ago by Coryza30

The variant calling is done by samtools mpileup, so you'll need to make changes there. I don't actually know that it's possible to call variants from 1x coverage with samtools (most NGS data isn't reliable enough to call variants from 1x coverage).

ADD REPLYlink written 4.1 years ago by Devon Ryan88k

I seem to recall someone posting a tool on seqanswers a while back (years?) that was designed for variant calling from very low coverage (maybe not 1X). If I can find that I'll post the link, but maybe someone else knows what I am talking about and can comment.

ADD REPLYlink written 4.1 years ago by SES8.1k

Someone on SEQanswers today posted this link that includes a program that might do what you want. You should completely ignore his discussion about samtools and varscan, the author doesn't actually have a clue what he's doing. However, I suspect that the C++ program he wrote will do what you need and function reasonably to do what you want.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by Devon Ryan88k
1

I think I actually found a much more easier way. Bcftools has a -p option, which shows all filtered variation (including those with 1 depth). I assume that this solves my issue? ;)

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by Coryza30

Most likely, good catch
 

ADD REPLYlink written 4.1 years ago by Devon Ryan88k
2
gravatar for swbarnes2
4.1 years ago by
swbarnes25.0k
United States
swbarnes25.0k wrote:

NGS data is too error prone to call SNPs based on a single read.  Most of what you would call would be Illumina errors, not true sequence differences.

ADD COMMENTlink written 4.1 years ago by swbarnes25.0k
1

I agree. And be aware that you won't be able to public these data in any reliable journal.

ADD REPLYlink written 4.1 years ago by marina.v.yurieva480

Thanks, we are aware of those issues ^^ But I think that consensus sequences of ~2KB, build up from ~>100 raw reads do not represent many sequencing errors anymore ;)

ADD REPLYlink written 4.1 years ago by Coryza30

Correct, but that's not what most tools are designed for. Presumably there's something for error corrected PacBio reads, which these would essentially represent then.

ADD REPLYlink written 4.1 years ago by Devon Ryan88k

You still think that a consensus of lets say 100 or more raw PacBio subreads contains sequencing errors?

ADD REPLYlink written 4.1 years ago by Coryza380

Not to any significant degree. There's likely a switch to mpileup to just have it emit even the lowest confidence calls, which wouldn't really be low confidence in this case. As I wrote, only tools designed with PacBio reads in mind are meant to handle datasets like this.

ADD REPLYlink written 4.1 years ago by Devon Ryan88k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 762 users visited in the last hour