What do the fields DATA.9 to DATA.12 represent in an ab1 file?
7.0 years ago
anuragm ▴ 130

I am trying to better understand the information content of a .ab1 file, in particular what the four fields DATA.9 to DATA.12 mean. The R vignette for the sangerseqR package describes them as 'Vectors containing signal intensities for each channel', whereas the Applied Biosystems document for .ab1 files calls them "Short Array holding analyzed color data".

Also, the ab1 files I am currently using do not have the fields for the amplitude of the primary and secondary base signals, P1AM.1 and P1AM.2 respectively (I checked this using R). So I was wondering how the chromatogram is built when I open a file in a viewer (Geneious or Finch).

ab1 sequencing chromatogram • 3.4k views
7.0 years ago
Dan D 7.2k

Data fields 1 through 4 hold the raw data from each of the four color channels, and each channel corresponds to one nucleotide. Fields 9 through 12 correspond to fields 1 through 4, but with a signal correction applied. These processed fields are the primary data source for basecalling, and they are also used to draw the chromatogram you see in your viewer. The higher the value at an array index, the taller the peak for the associated color at that position.
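As a rough illustration of how a dense trace array relates to peaks, here is a minimal Python sketch. It does not read a real .ab1 file: the trace is synthetic, and `gaussian_peak` and `local_maxima` are hypothetical helpers made up for this example. It only shows that one channel is a long array of intensities in which each base contributes many scan points around a single maximum.

```python
import math

def gaussian_peak(center, height, width, n_points):
    """Build an integer-valued bell-shaped peak, mimicking a short-integer trace."""
    return [round(height * math.exp(-((i - center) ** 2) / (2 * width ** 2)))
            for i in range(n_points)]

def local_maxima(trace, min_height):
    """Return indices where the trace rises to a local maximum of at least min_height."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] >= min_height and trace[i - 1] < trace[i] >= trace[i + 1]]

# Synthetic stand-in for one processed channel (e.g. what DATA.9 might look like).
# The peak centers, height and width are invented for illustration.
n_points = 60
trace = [0] * n_points
for center in (10, 30, 50):
    peak = gaussian_peak(center, 1000, 2.0, n_points)
    trace = [a + b for a, b in zip(trace, peak)]

peaks = local_maxima(trace, min_height=100)
points_per_peak = n_points / len(peaks)
print(peaks)            # indices of the three peak tops
print(points_per_peak)  # scan points per peak in this toy trace
```

In this toy trace, 60 scan points produce only three peaks, so the array is much longer than the number of bases it encodes. A viewer simply plots all four such arrays against the same scan index.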


Is the plot you attached a plot of the values returned when you access the DATA.9/10/11/12 fields? When I use R to extract any of DATA.9 to DATA.12, I get close to 20,000 data points for a sequence 100 bp long, so I am not sure which data points in the file correspond to which nucleotide.


DATA.9: G
DATA.10: A
DATA.11: T
DATA.12: C
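To connect this channel order to individual nucleotides, one possible approach is to look up each called peak position in all four arrays and take the channel with the strongest signal. The Python sketch below assumes the G/A/T/C order given above and uses invented toy traces and peak positions. As far as I know, the ABIF specification stores peak locations in PLOC fields and the channel-to-base order in an FWO_ field, but treat that as an assumption and check your own files. `call_bases` is a hypothetical helper, and real basecallers do considerably more than pick the maximum.

```python
# Assumed channel order, taken from the reply above; verify it for your own files.
CHANNEL_BASES = {"DATA.9": "G", "DATA.10": "A", "DATA.11": "T", "DATA.12": "C"}

def call_bases(channels, peak_positions):
    """For each peak position, return the base whose channel has the highest signal."""
    calls = []
    for pos in peak_positions:
        best_field = max(channels, key=lambda field: channels[field][pos])
        calls.append(CHANNEL_BASES[best_field])
    return "".join(calls)

# Toy traces with 12 scan points each (invented values, not from a real file).
channels = {
    "DATA.9":  [0, 5, 90, 5, 0, 0, 3, 0, 0, 0, 2, 0],
    "DATA.10": [0, 0, 4, 0, 0, 10, 80, 10, 0, 0, 0, 0],
    "DATA.11": [0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 70, 6],
    "DATA.12": [0, 0, 2, 0, 0, 0, 5, 0, 0, 0, 1, 0],
}

# Hypothetical peak positions, standing in for the peak locations stored in a real file.
peak_positions = [2, 6, 10]

called = call_bases(channels, peak_positions)
print(called)  # GAT
```

With real data, the number of peak positions should be close to the read length, while each channel array holds many scan points per base.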
