Question

Comparing Exomes -How to get ALL variants

0

Entering edit mode

11.2 years ago

bernatgel ★ 3.4k

I have exome data from primary tumors and from derived xenografts and I need to compare them. For every variant in the primary tumor, I want to know if it's present in the xenograft and if its frequecy has changed. In addition, I also need to find out if any new variant is present in the xenograft. For this comparison, I can only take into account the regions that have good coverage in the primary tumor and in the xenograft.

So far I've mapped the data (prefiltering murine reads in the xenograft), filtered out the low coverage regions and called variants with samtools in both the primary tumor and the xenograft.

samtools mpileup -u -f genome.fa sample.bam | bcftools view -vcg - > sample.var.raw.vcf

ith this I get the two lists of variants and I can compare them. However, a certain variant can be called, for example, in the xenograft but not in the primary tumor despite being present in 1 or 2 reads.

Is there a way to call ALL the variants in an exome, even the false ones (I want ZERO false negatives even at the cost of thousands of false positives)? I suppose I can parse the mpileup output... but I'd rather prefer not having to do it.

Any idea on a better way to compare the exomes?

Thanks!

exome samtools variant-calling • 2.9k views

ADD COMMENT • link updated 3.8 years ago by Ram 45k • written 11.2 years ago by bernatgel ★ 3.4k

0

Entering edit mode

Why don't you just `samtools mpileup -u -f genome.fa xenograft.bam tumour.bam | bcftools view -vcg - > combined.var.raw.vcf`? I suspect that'd solve most of your problems.

ADD REPLY • link 11.2 years ago by Devon Ryan 105k

0

Entering edit mode

I'm trying it right now, but could you explain me how this should solve my problems? How could I use it to identify the variants present in the tumor and not in the xenograft and the variant in the xenograft with not even a single read supporting them in the tumor?

ADD REPLY • link 11.2 years ago by bernatgel ★ 3.4k

0

Entering edit mode

It won't do "not even a single read", but it'll utilize both samples in determining where there are variants, so a variant present in the tumour sample will increase the likelihood of a variant being called in the same position in the xenograft sample even with little evidence. Anyway, I know that you say that you want 0 false negatives, but the results from that wouldn't be very useful, thus my comment. If you really really do want 0 false negatives, then you will indeed need to just parse the mpileup output.

ADD REPLY • link 11.2 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks for your help. As you said, I still find variants with some support in the xenograft not being called. I'll try to play a bit more with the filtering and if necessary I'll parse the mpileup output.

ADD REPLY • link 11.2 years ago by bernatgel ★ 3.4k

score 0 · Answer 1 · 2014-04-15

0

Entering edit mode

11.2 years ago

Chris Miller 22k

What you probably want to do is call variants on both samples using standard parameters, merge and deduplicate the list of called sites, then get readcounts for each site from both samples. [bam-readcount](https://github.com/genome/bam-readcount) is one tool that makes extracting this information easy. With that information, you can filter the results to your desired level of confidence, including the prior information that a variant was called in the other sample.

ADD COMMENT • link 11.2 years ago by Chris Miller 22k

0

Entering edit mode

Thanks! I had already found bam-readcount in another thread and it's what I'll be using at the end.

ADD REPLY • link 11.2 years ago by bernatgel ★ 3.4k