Can germline vs somatic variants be distinguished by phasing when no control is available?
8.4 years ago
amjad ▴ 100

I have cancer samples that lack a matched germline control, and I am interested in identifying somatic variants as accurately as possible. I filtered against a panel of normals and removed variants reported in public databases. However, I still have some suspicious cases that I want to resolve, and I wonder whether phasing can be used here.

Here is an IGV snapshot of an example where three nonsynonymous mutations are detected: https://www.dropbox.com/s/235b3nl0ukbera0/igv_panel.png?dl=0

Can phasing tell us confidently which of these three are somatic? If so, is there a systematic way to do this analysis?

Tags: phasing, germline, variants, sequencing, somatic

What do you think phasing means? I thought it meant determining the strand of heterozygous alleles, in which case the answer to your question is "no, it has nothing to do with somatic vs. germline".

8.4 years ago
donfreed ★ 1.6k

Yes, read-backed phasing can be used to identify subclonal (somatic) mutations from bulk tissue sequence data, as long as the subclonal mutations are close enough to clonal heterozygous (germline) mutations to be covered by the same reads.

In your example, the middle mutation (orange) and the right mutation (green) are perfectly in phase, indicating that they are present in the same clonal population (probably germline heterozygous). However, the mutation on the left (blue) does not phase perfectly with the orange mutation, indicating the presence of at least two distinct clonal populations in the bulk tissue. The blue mutation probably arose somatically.
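The reasoning above can be sketched as a small three-haplotype check. This is a minimal illustration, not the phase-mosaic implementation; it assumes the allele each read shows at the two sites has already been extracted (for example from the BAM), and the names `haplotype_counts`, `classify_candidate`, the "R"/"A" encoding and the `min_reads` cutoff are all hypothetical. A clonal variant next to a heterozygous germline site yields only two haplotypes among spanning reads; a third well-supported haplotype is the pattern described for the blue mutation.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical encoding: "R" = reference allele, "A" = alternate allele,
# None = the read does not cover that site.
CIS = {("R", "R"), ("A", "A")}    # candidate on the same haplotype as the het ALT
TRANS = {("R", "A"), ("A", "R")}  # candidate on the other haplotype


def haplotype_counts(
    observations: Iterable[Tuple[Optional[str], Optional[str]]],
) -> Counter:
    """Count (het_site_allele, candidate_allele) pairs among spanning reads."""
    counts: Counter = Counter()
    for het_allele, cand_allele in observations:
        if het_allele is None or cand_allele is None:
            continue  # a read must cover both sites to be informative
        counts[(het_allele, cand_allele)] += 1
    return counts


def classify_candidate(counts: Counter, min_reads: int = 2) -> str:
    """Three-haplotype heuristic; min_reads is an arbitrary example cutoff."""
    supported = {hap for hap, n in counts.items() if n >= min_reads}
    if len(supported) >= 3:
        # One germline haplotype is seen both with and without the candidate:
        # more than one clonal population, or a copy-number change.
        return "three haplotypes: possible subclonal variant"
    if supported in (CIS, TRANS):
        return "perfect phase: consistent with germline"
    return "inconclusive"


# Hypothetical reads: 6 ref/ref, 5 alt/alt, 2 alt/ref, 1 not spanning.
reads = [("R", "R")] * 6 + [("A", "A")] * 5 + [("A", "R")] * 2 + [("A", None)]
counts = haplotype_counts(reads)
print(dict(counts))
print(classify_candidate(counts))  # three haplotypes: possible subclonal variant
```

As the caveat in this answer notes, a flagged site is only a candidate: copy-number alterations can produce the same extra haplotype, and a low `min_reads` lets sequencing errors through.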

We wrote some code to automate the identification of these variants using simple heuristics: https://bitbucket.org/donald_freed/phase-mosaic

Caveat:

Using phasing, it is impossible to distinguish somatic mutations from germline or mosaic copy-number alterations.


The blue variant appears on the same read as the orange five times. The orange appears with the reference allele at the blue's site twice, and the blue never appears with the reference allele at the orange's site. There isn't enough information to call it either way. If the reads were paired, we might have more haplotype information.


With a sequencing error rate of 1%, we would expect two of seven reads to support mosaicism by chance at roughly one site in every 500, so I would personally feel comfortable calling this site mosaic. For his own work, Amjad will have to decide what false-positive rate he can accept.
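The ~500 figure can be checked with a binomial model. This is a back-of-the-envelope sketch that assumes, as a simplification, that each of the seven spanning reads independently shows a misleading allele with probability 0.01 (the error rate quoted in the reply); real error profiles are base- and context-specific. The function names are illustrative.

```python
from math import comb


def prob_exactly(k: int, n: int, p: float) -> float:
    """Binomial probability that exactly k of n reads carry an error."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that k or more of n reads carry an error."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))


error_rate = 0.01  # per-read error rate assumed in the thread
n_reads = 7        # reads spanning both sites
k_support = 2      # reads supporting the extra haplotype

p_exact = prob_exactly(k_support, n_reads, error_rate)
p_tail = prob_at_least(k_support, n_reads, error_rate)
print(f"P(X = 2)  = {p_exact:.5f}  (about 1 in {1 / p_exact:.0f} sites)")
print(f"P(X >= 2) = {p_tail:.5f}  (about 1 in {1 / p_tail:.0f} sites)")
```

Both the exact and the tail probability land near 1 in 500 sites. The `comb(n, k)` factor (21 ways to pick two of seven reads) matters: leaving it out would understate the chance by a factor of 21.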

Edit: combinatorics...


Thanks for the answer. Actually, we are confident that the blue one is somatic, because we have a related cancer sample that does not show that mutation. The uncertainty is about the other two, and it's good to know that there is no way to confirm it.
