Hi everyone! I'm analyzing tumour samples with SNP arrays (Illumina and Affymetrix). Does anyone know whether the software sometimes has "problems" assigning the B-allele frequency correctly? I know that the logR is sometimes recentered, but that doesn't happen with the B-allele frequency, right?
Correct. The 'logR', more commonly known as the Log R Ratio (LRR), is just the log2 of the probe intensity in, e.g., tumour, divided by the intensity in the matched normal. It is a crude measure of copy number. When this log2 ratio = 0, there is no difference between tumour and normal.
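To make the LRR definition concrete, here is a minimal sketch in Python. The function name and the example intensities are made up for illustration; real pipelines work on normalised intensities and may recenter the values afterwards.

```python
import math

def log_r_ratio(tumour_intensity, normal_intensity):
    """log2 of tumour intensity over matched-normal intensity (hypothetical helper)."""
    if tumour_intensity <= 0 or normal_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(tumour_intensity / normal_intensity)

# Illustrative, made-up intensities
print(log_r_ratio(1000.0, 1000.0))  # 0.0: no difference between tumour and normal
print(log_r_ratio(500.0, 1000.0))   # -1.0: tumour signal is half of normal
print(log_r_ratio(2000.0, 1000.0))  # 1.0: tumour signal is double the normal
```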
The definition of B-allele frequency (BAF) is rarely stated clearly; however, it can generally be regarded as the frequency of the allele under study, which may be the minor allele in a population study.
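On an array, one rough way to picture the BAF is as the share of the total signal that comes from the B allele. The sketch below uses that simplification; `naive_baf` is a hypothetical helper, and as far as I know the vendor software normalises against reference genotype clusters rather than taking the raw ratio, so treat this as an approximation only.

```python
def naive_baf(a_intensity, b_intensity):
    """Fraction of total signal from the B allele (a simplification, not the vendor formula)."""
    total = a_intensity + b_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return b_intensity / total

# Illustrative, made-up intensities
print(naive_baf(1000.0, 0.0))   # 0.0, as expected for AA
print(naive_baf(500.0, 500.0))  # 0.5, as expected for AB
print(naive_baf(0.0, 1000.0))   # 1.0, as expected for BB
```

In a tumour with a copy-number change, the heterozygous band moves away from 0.5 under this picture, which is why BAF is useful alongside the LRR.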
There are different points at which the software will struggle to correctly compute the BAF. If your DNA sample is poor quality, then everything will be difficult to calculate! If we plot the genotype of every SNP for a single sample of good quality, we would see a figure like this:
Here, the arms represent (for A and B alleles):
- vertical arm: BB (homozygous B)
- diagonal arm: AB (heterozygous)
- horizontal arm: AA (homozygous A)
This sample has mostly well-defined genotype calls, as judged by the well-proportioned, orthogonal arms. The 'fuzzy bits' between the arms represent genotype calls that are on the borderline. These genotype calls will not be accurate, and neither, therefore, will the BAFs for them.
Conversely, look at a similar plot for this very poor quality DNA sample:
That data would have to be thrown into the trash can.
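The three arms described above can be sketched as a simple angle-based classifier, assuming the A intensity is on the x-axis and the B intensity on the y-axis. The function names and the tolerance of 0.15 are my own assumptions for illustration; real genotype callers use per-SNP cluster models rather than fixed cut-offs.

```python
import math

def theta(a_intensity, b_intensity):
    """Angle of the (A, B) point, scaled so 0 = horizontal arm and 1 = vertical arm."""
    return (2.0 / math.pi) * math.atan2(b_intensity, a_intensity)

def call_genotype(a_intensity, b_intensity, tol=0.15):
    """Assign AA / AB / BB by arm; 'NC' (no call) for the fuzzy bits in between.

    tol is an arbitrary, hypothetical tolerance around each arm.
    """
    t = theta(a_intensity, b_intensity)
    if t <= tol:
        return "AA"  # horizontal arm
    if abs(t - 0.5) <= tol:
        return "AB"  # diagonal arm
    if t >= 1.0 - tol:
        return "BB"  # vertical arm
    return "NC"      # borderline, between arms

# Illustrative, made-up intensities
print(call_genotype(1000.0, 50.0))    # AA
print(call_genotype(1000.0, 1000.0))  # AB
print(call_genotype(50.0, 1000.0))    # BB
print(call_genotype(1000.0, 400.0))   # NC
```

In a poor-quality sample, many more points would fall into the "NC" zone or onto the wrong arm, which is what makes the BAFs unreliable.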
Things that can affect the calculation of the BAF:
- allelic cross talk: when a probe for the A allele binds to the B allele sequence, and vice-versa
- allelic imbalance: this occurs when, e.g., homozygous A (AA) signal strengths are lower or higher than homozygous B (BB) signal strengths; I assume it is down to differences in binding affinities between, e.g., GC and AT genotypes
Both of these sources of bias are usually corrected in any processing pipeline.
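To illustrate what correcting allelic cross talk can look like, here is a toy sketch. It assumes a simple linear leakage model in which a fixed fraction of each allele's signal shows up in the other channel; the function name and the leakage values are hypothetical, and this is not the method of any particular pipeline.

```python
def correct_cross_talk(obs_a, obs_b, leak_b_into_a, leak_a_into_b):
    """Undo linear cross talk between the A and B channels (toy model).

    Assumed model (an illustration only):
        obs_a = true_a + leak_b_into_a * true_b
        obs_b = true_b + leak_a_into_b * true_a
    """
    det = 1.0 - leak_b_into_a * leak_a_into_b
    if det == 0:
        raise ValueError("leakage values make the model non-invertible")
    true_a = (obs_a - leak_b_into_a * obs_b) / det
    true_b = (obs_b - leak_a_into_b * obs_a) / det
    return true_a, true_b

# A made-up AA sample: true signal (1000, 0), with 10% of A leaking into the B channel
true_a, true_b = correct_cross_talk(1000.0, 100.0, leak_b_into_a=0.05, leak_a_into_b=0.10)
print(round(true_a, 6), round(true_b, 6))  # 1000.0 0.0
```

Without the correction, the naive B-allele share for this AA sample would be about 0.09 instead of 0, which shows how cross talk drags homozygous BAFs towards the middle.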
Take a read of my other answer: Genotyping, genotype calling or SNP calling?