Question: Comparing Exomes -How to get ALL variants
gravatar for bernatgel
6.2 years ago by
Barcelona, Spain
bernatgel2.6k wrote:

I have exome data from primary tumors and from derived xenografts and I need to compare them. For every variant in the primary tumor, I want to know if it's present in the xenograft and if its frequecy has changed. In addition, I also need to find out if any new variant is present in the xenograft. For this comparison, I can only take into account the regions that have good coverage in the primary tumor and in the xenograft.

So far I've mapped the data (prefiltering murine reads in the xenograft), filtered out the low coverage regions and called variants with samtools in both the primary tumor and the xenograft.

samtools mpileup -u -f genome.fa  sample.bam | bcftools view -vcg - > sample.var.raw.vcf

ith this I get the two lists of variants and I can compare them. However, a certain variant can be called, for example, in the xenograft but not in the primary tumor despite being present in 1 or 2 reads.

Is there a way to call ALL the variants in an exome, even the false ones (I want ZERO false negatives even at the cost of thousands of false positives)? I suppose I can parse the mpileup output... but I'd rather prefer not having to do it.

Any idea on a better way to compare the exomes?


samtools variant-calling exome • 1.8k views
ADD COMMENTlink written 6.2 years ago by bernatgel2.6k

Why don't you just `samtools mpileup -u -f genome.fa xenograft.bam tumour.bam | bcftools view -vcg - > combined.var.raw.vcf`? I suspect that'd solve most of your problems.

ADD REPLYlink written 6.2 years ago by Devon Ryan95k

I'm trying it right now, but could you explain me how this should solve my problems? How could I use it to identify the variants present in the tumor and not in the xenograft and the variant in the xenograft with not even a single read supporting them in the tumor?

ADD REPLYlink written 6.2 years ago by bernatgel2.6k

It won't do "not even a single read", but it'll utilize both samples in determining where there are variants, so a variant present in the tumour sample will increase the likelihood of a variant being called in the same position in the xenograft sample even with little evidence. Anyway, I know that you say that you want 0 false negatives, but the results from that wouldn't be very useful, thus my comment. If you really really do want 0 false negatives, then you will indeed need to just parse the mpileup output.

ADD REPLYlink written 6.2 years ago by Devon Ryan95k

Thanks for your help. As you said, I still find variants with some support in the xenograft not being called. I'll try to play a bit more with the filtering and if necessary I'll parse the mpileup output.

ADD REPLYlink written 6.2 years ago by bernatgel2.6k
gravatar for Chris Miller
6.2 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

What you probably want to do is call variants on both samples using standard parameters, merge and deduplicate the list of called sites, then get readcounts for each site from both samples. [bam-readcount]( is one tool that makes extracting this information easy. With that information, you can filter the results to your desired level of confidence, including the prior information that a variant was called in the other sample.

ADD COMMENTlink written 6.2 years ago by Chris Miller21k

Thanks! I had already found bam-readcount in another thread and it's what I'll be using at the end.

ADD REPLYlink written 6.2 years ago by bernatgel2.6k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 641 users visited in the last hour