Question

SNP Array data with 0 copies for a segment

Asked 7.1 years ago by mrz132435

Hello,

I am working with SNP array data and would like to know what I should expect to see when a genome has lost both alleles at a segment.

The issue is that SNP array data are reported as the Log R Ratio (LRR) and the B Allele Frequency (BAF). In theory the LRR is log2(cn/2), where cn is the number of copies at a given SNP. So with 2 copies (i.e., a normal segment of the genome) the LRR should be around 0, and with 1 copy (a single-copy loss) it should be around -1. But what about 0 copies? Theoretically the LRR should be -Inf, but what do the data actually look like there? Does one just see very low (negative) numbers, and if so, how low: -10? -15? What does the assay actually tell you when there are 0 copies?
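A minimal sketch of the idealized relationship described in the question, LRR = log2(cn/2). It assumes a pure sample and ignores measurement noise and any array-specific compression of the signal; the function name `theoretical_lrr` is hypothetical. It shows why zero copies has no finite theoretical LRR.

```python
import math

def theoretical_lrr(copies: int) -> float:
    """Idealized Log R Ratio for a pure sample: log2(copies / 2).

    Returns -inf for zero copies, since log2(0) is undefined and the
    limit is minus infinity.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return -math.inf
    return math.log2(copies / 2)

for cn in range(5):
    print(cn, theoretical_lrr(cn))
```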

Thanks.

Here's a link to some Illumina documentation, including a description of LRR and BAF: https://www.illumina.com/Documents/products/technotes/technote_cytoanalysis.pdf

Tags: SNP, copy number, CGH

Somatic with normal contamination or germline?

Comment by markus.riester, 7.1 years ago

The data are from tumor samples. It's leukemia, so (I've been told) the cancer cells are fairly easy to isolate and normal contamination should be negligible. So I'm interested in somatic copy number alterations; there are clearly plenty of them, I'm just not sure what to expect when both copies of a segment have been lost.

Reply by mrz132435, 7.1 years ago

Answer (2017-03-21)

If there's no normal contamination, you should see a deep drop in LRR and a kind of smear in BAF.

BAF is computed as the fraction of the total signal that comes from the B allele. With a homozygous loss, both alleles should give zero signal, so all the signal you see is random background noise. The share of that signal (in reality only noise) attributed to allele A or B is therefore random, and every marker in the deleted region gets a different random BAF value, creating a continuous smear between 0 and 1. This is a clear pattern that is quite easy to identify once you've seen one.
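A toy simulation of that argument, using the simple ratio BAF = B / (A + B) described above. This is an illustrative assumption: Illumina's own BAF is derived from normalized, cluster-adjusted intensities rather than this raw ratio, and the noise levels and the functions `simulate_baf` and `fraction_between_clusters` are hypothetical. It contrasts a normal diploid region (tight clusters at 0, 0.5 and 1) with a homozygous deletion, where A and B are both pure background noise.

```python
import random

def simulate_baf(n_markers, deleted, noise_sd=0.02, seed=0):
    """Simulate BAF values for n_markers SNPs under a toy model.

    Assumptions (illustrative only):
    - diploid region: each SNP is AA, AB or BB, so the true B fraction is
      0, 0.5 or 1, plus a little Gaussian noise;
    - homozygous deletion: no allele-specific signal remains, so the A and B
      intensities are independent background noise and BAF = B / (A + B)
      is effectively random.
    """
    rng = random.Random(seed)
    bafs = []
    for _ in range(n_markers):
        if deleted:
            a = rng.uniform(0.05, 1.0)  # background noise on the A probe (arbitrary units)
            b = rng.uniform(0.05, 1.0)  # background noise on the B probe
            baf = b / (a + b)
        else:
            true_baf = rng.choice([0.0, 0.5, 1.0])  # genotype AA, AB or BB
            baf = true_baf + rng.gauss(0.0, noise_sd)
        bafs.append(min(1.0, max(0.0, baf)))  # keep BAF within [0, 1]
    return bafs

def fraction_between_clusters(bafs):
    """Share of markers lying between the usual 0 / 0.5 / 1 clusters."""
    between = [x for x in bafs if 0.1 < x < 0.4 or 0.6 < x < 0.9]
    return len(between) / len(bafs)

normal = simulate_baf(2000, deleted=False)
deleted = simulate_baf(2000, deleted=True, seed=1)
print(f"diploid region:      {fraction_between_clusters(normal):.2f}")
print(f"homozygous deletion: {fraction_between_clusters(deleted):.2f}")
```

In the toy diploid region almost no markers fall between the clusters, while a large share of the deleted markers do, which is the smear described above.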

However, if the sample contains normal diploid cells, the signal will come only from that (small) portion of diploid cells, and your BAF will look like a standard BAF for a diploid region: a cluster at 0, a cluster at 0.5 and a cluster at 1.
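A minimal idealized sketch of the contaminated case, assuming a fraction `normal_fraction` of diploid normal cells mixed with tumor cells carrying a homozygous deletion, signal proportional to copy number, and no noise or array-specific compression. The functions `mixed_lrr` and `mixed_baf` are hypothetical. Under these assumptions the LRR of the deleted segment becomes finite (log2 of the normal fraction), and because all allele-specific signal comes from the normal cells, the BAF keeps the diploid 0 / 0.5 / 1 pattern whatever the normal fraction is.

```python
import math

def mixed_lrr(tumor_copies, normal_fraction):
    """Idealized LRR for a tumor/normal mixture (normal cells are diploid)."""
    avg_copies = (1 - normal_fraction) * tumor_copies + normal_fraction * 2
    if avg_copies == 0:
        return -math.inf  # pure tumor with a homozygous deletion
    return math.log2(avg_copies / 2)

def mixed_baf(normal_b_copies, normal_fraction, tumor_a=0, tumor_b=0):
    """Idealized BAF = B signal / total signal for a tumor/normal mixture.

    normal_b_copies: B alleles in the normal genotype (0 = AA, 1 = AB, 2 = BB).
    tumor_a / tumor_b: A and B copies in the tumor cells (both 0 for a
    homozygous deletion).
    """
    f = normal_fraction
    b_signal = f * normal_b_copies + (1 - f) * tumor_b
    total = f * 2 + (1 - f) * (tumor_a + tumor_b)
    if total == 0:
        return math.nan  # no allele signal at all: BAF undefined (noise in practice)
    return b_signal / total

for f in (0.05, 0.10, 0.20):
    lrr = mixed_lrr(0, f)
    clusters = [mixed_baf(nb, f) for nb in (0, 1, 2)]
    print(f"normal fraction {f:.2f}: LRR {lrr:.2f}, BAF clusters {clusters}")
```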