B allele frequency SNP array
3.9 years ago
beaferbl ▴ 10

Hi everyone! I'm analyzing tumour samples with SNP arrays (Illumina and Affymetrix). Does anyone know whether the software sometimes has "problems" assigning the B-allele frequency correctly? I know that it sometimes recenters the logR, but that doesn't happen with the B-allele frequency, right?

snp • 1.9k views
3.9 years ago

Correct. The 'logR', more commonly known as the Log R Ratio (LRR), is just the log2 of the probe intensity in, e.g., tumour divided by the intensity in matched normal. It is a crude measure of copy number: when this log2 ratio is 0, there is no difference between tumour and normal.
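As a minimal sketch of that ratio (the function name and the intensity values are hypothetical, and a real pipeline would normalize the intensities first):

```python
import math

def log_r_ratio(tumour_intensity, normal_intensity):
    """Return log2(tumour / normal) for one probe (illustrative only)."""
    return math.log2(tumour_intensity / normal_intensity)

# Equal intensities give 0, i.e. no difference between tumour and normal
same = log_r_ratio(1500.0, 1500.0)

# Doubling the tumour intensity gives +1; halving it gives -1
gain = log_r_ratio(3000.0, 1500.0)
loss = log_r_ratio(750.0, 1500.0)
```

Positive values point towards a gain and negative values towards a loss, but the actual copy number still has to be inferred by a dedicated caller.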

The definition of B-allele frequency (BAF) is never entirely clear; however, it can generally be regarded as the frequency of the allele under study, which may be the minor allele in a population study.

There are different points at which the software will struggle to correctly compute the BAF. If your DNA sample is poor quality, then everything will be difficult to calculate! If we plot the genotype of every SNP for a single sample of good quality, we would see a figure like this:

Here, the arms represent (for A and B alleles):

• vertical arm: BB (homozygous B)
• diagonal arm: AB (heterozygous)
• horizontal arm: AA (homozygous A)

This sample has mostly well-defined genotype calls, as judged by the well proportioned / orthogonal arms. The 'fuzzy bits' between the arms represent genotype calls that are on the borderline - these genotype calls will not be accurate, and neither, therefore, will the BAFs for these.
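To make the three arms concrete, here is a rough sketch that assumes x is the A-allele intensity and y is the B-allele intensity (which the arm descriptions suggest, but which is an assumption). It uses the polar transform commonly applied to Illumina data, where theta runs from 0 (pure A) to 1 (pure B). The function names and the 0.2 / 0.8 cut-offs are hypothetical; real genotype callers use per-SNP cluster positions rather than fixed thresholds.

```python
import math

def polar_coords(x, y):
    """Convert A (x) and B (y) allele intensities to (theta, r)."""
    theta = (2 / math.pi) * math.atan2(y, x)  # 0 = horizontal arm, 1 = vertical arm
    r = x + y                                 # total signal intensity
    return theta, r

def rough_genotype(theta, low=0.2, high=0.8):
    """Assign a crude genotype from theta; thresholds are illustrative only."""
    if theta < low:
        return "AA"   # horizontal arm
    if theta > high:
        return "BB"   # vertical arm
    return "AB"       # diagonal arm (or a borderline 'fuzzy' call)

theta, r = polar_coords(1.0, 1.0)   # a point on the diagonal arm
call = rough_genotype(theta)
```

Points that fall between the arms get intermediate theta values, which is why their genotype calls, and therefore their BAFs, are unreliable.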

## -----------------------------

Conversely, look at a similar plot for this very poor quality DNA sample:

That data would have to be thrown into the trash can.

## -----------------------------------

Things that can affect the calculation of the BAF:

• allelic cross talk: when a probe for the A allele binds to the B allele sequence, and vice-versa
• allelic imbalance: this occurs when, e.g., homozygous A (AA) signal strengths are lower or higher than homozygous B (BB) signal strengths; I assume it is down to differences in binding affinities between, e.g., GC and AT genotypes

Both of these sources of bias are usually corrected in any processing pipeline.

Take a look at my other answer: Genotyping, genotype calling or SNP calling?

Kevin


Hi Kevin,

I followed the PennCNV guide at http://penncnv.openbioinformatics.org/en/latest/user-guide/test/ to generate BAF and LRR, running the whole workflow and ending with this command:

./kcolumn.pl gw5.lrr_baf.txt split 2 -tab -head 3 -name -out gw5


So now I have values of BAF and LRR for each sample:

For one sample a file like this:

Name    Chr Position    PAUSE_g_9GXO476_BI_SNP_H06_39908.CEL.Log R Ratio      PAUSE_g_9GXO476_BI_SNP_H06_39908.CEL.B Allele Freq
SNP_A-1780520   20  47874178    0.0391  0.9755
SNP_A-1780618   4   104894961   -0.1296 0.9801
SNP_A-1780632   14  51975831    -0.2333 0.0168
SNP_A-1780654   1   21039991    0.1808  0.0000
...


How would I make a plot like the one you show above? Could you please share the code? Also, what is on your x and y axes?
