theta value and R value in SNP data
10.0 years ago
ftp ▴ 140

Hello

I have a question regarding SNP data. I have a set of SNP files belonging to treated and control conditions. These SNP data contain different attributes (theta-value, R-value, Log R Ratio). I'm having difficulty in understanding what each of these attributes means. If anyone could explain or knows a good tutorial to explain the meaning and differences between them I'd really appreciate it.

SNP • 9.1k views

I'm afraid basic statistics is not bioinformatics. Where did you get the data? Only the file generator knows for sure what theta and R mean, but it sounds like minor allele frequency and some correlation. A log ratio sounds like an effect likelihood. Maybe you want those to be high? Plot them as a function of position, and post a picture please.

10.0 years ago
Irsan ★ 7.8k
It seems like you have data from Illumina SNP arrays, with the results exported from GenomeStudio. The theta value is closely related to the B allele frequency: it ranges from 0 to 1 and approximates the fraction of the signal that comes from the B allele (variant allele). 0 means homozygous reference (AA), 0.5 means heterozygous (AB) and 1 means homozygous variant (BB). The R value represents the fluorescence intensity of that probe in that sample. The log R ratio is obtained by dividing the R value of your sample by a baseline (maybe your matched control) and taking the log2 of the result; it reflects the ploidy (copy number) of your sample at that genomic position. An LRR of 0 means copy number neutral, positive values mean copy number gains, and negative values mean copy number losses.
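To make the sign convention of the log R ratio concrete, here is a minimal Python sketch. It assumes LRR is simply log2 of the observed R over an expected (baseline) R. The function name and the example intensities are invented for illustration, and GenomeStudio derives its expected R from its own cluster definitions rather than from a single number like this.

```python
import math

def log_r_ratio(r_observed, r_expected):
    """Return log2(observed / expected) intensity; both values must be > 0."""
    if r_observed <= 0 or r_expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(r_observed / r_expected)

# Hypothetical intensities, chosen only to show the sign of the result.
neutral = log_r_ratio(1.0, 1.0)  # observed equals baseline -> 0
gain = log_r_ratio(1.5, 1.0)     # more signal than baseline -> positive
loss = log_r_ratio(0.5, 1.0)     # less signal than baseline -> negative
print(neutral, gain, loss)
```

Real LRR values are noisy, so in practice people look at runs of neighbouring probes rather than single SNPs.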

I struggled with finding the mathematical explanation behind this transformation, so hopefully this helps someone. It is explained in the second paragraph of the background of this paper:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572624/

The way I understood it, for Illumina Infinium arrays the two channels are:

Cy5 = Red = A allele = X signal
Cy3 = Green = B allele = Y signal

The raw data is analyzed like this in GenomeStudio:

  1. The raw X and Y signals for each allele are normalized to account for background signal, etc., using a proprietary Illumina algorithm

  2. These normalized X and Y signals for each sample can be plotted on a cartesian coordinate system to get this type of plot in GenomeStudio:

[image: GenomeStudio Cartesian plot of normalized X and Y]

  3. More commonly, the "polar transformation" of these values is shown, with R plotted against theta, like this:

https://imgur.com/dbRjBBp

  • R is the intensity: R for a sample is the sum of normalized X and Y (R = X + Y)
  • Theta is closely related to the B allele frequency and is calculated as: theta = (2/pi) * arctan(normalized Y / normalized X)

You can check this math by selecting "filter rows" in the "Full Data Table" and making Theta, R, X, and Y visible for every SNP in GenomeStudio then plugging in X and Y to the formulas in 3.
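If you would rather script that check than do it by hand, here is a minimal Python sketch of the two formulas in step 3. The function names, the dictionary keys and the example row are all made up for illustration; adapt them to whatever your exported table actually contains. It uses `atan2(y, x)` instead of `arctan(y / x)`, which gives the same answer for non-negative intensities but avoids dividing by zero when X is 0.

```python
import math

def polar_transform(x, y):
    """Return (R, theta) from normalized X (A allele) and Y (B allele)."""
    r = x + y
    # Same as (2/pi) * arctan(y / x) for x, y >= 0, but safe when x == 0.
    theta = (2 / math.pi) * math.atan2(y, x)
    return r, theta

def row_matches(row, tol=1e-3):
    """Check one table row (hypothetical keys X, Y, R, Theta) against the formulas."""
    r, theta = polar_transform(row["X"], row["Y"])
    return (math.isclose(r, row["R"], abs_tol=tol)
            and math.isclose(theta, row["Theta"], abs_tol=tol))

# Invented example row: equal signal in both channels, i.e. a heterozygote.
example = {"X": 0.5, "Y": 0.5, "R": 1.0, "Theta": 0.5}
print(polar_transform(example["X"], example["Y"]))
print(row_matches(example))
```

The tolerance is there because exported tables are rounded to a few decimals, so an exact comparison would fail on most rows.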
