Question: What do the fields DATA.9 - DATA.12 represent in an ab1 file?
4.2 years ago by
anuragm130 wrote:

I am trying to better understand the information content of a .ab1 file. I wanted to know what the four fields DATA.9 to DATA.12 mean. The R vignette for the sangerseqR package describes them as "Vectors containing signal intensities for each channel", whereas the Applied Biosystems document for .ab1 files calls them "Short Array holding analyzed color data".

Also, the ab1 files that I am currently using do not have the fields for the amplitude of the primary and secondary base signals, P1AM.1 and P1AM.2, respectively (I checked this in R). So I was wondering how the chromatogram is built when I open the file in a viewer (Geneious or Finch).

chromatogram ab1 sequencing • 2.0k views
written 4.2 years ago by anuragm130
4.2 years ago by
Dan D6.7k wrote:

Data fields 1 through 4 (DATA.1 to DATA.4) hold the raw data from each of the four color channels, and each channel corresponds to one nucleotide letter. Fields 9 through 12 correspond to fields 1 through 4, but with a signal correction applied. These fields are the primary data source for basecalling. They are also used to build the chromatogram you see in your viewer. The higher the value at an array index, the taller the peak for the associated color.
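To make the "taller peak" idea concrete, here is a minimal Python sketch. It uses small made-up arrays standing in for DATA.9 to DATA.12, not values from a real file, and it assumes the channel order G, A, T, C; that order is an assumption and may differ between files. The helper name `tallest_channel` is hypothetical.

```python
# Synthetic stand-ins for the analyzed traces DATA.9 to DATA.12.
# Assumption: channels map to G, A, T, C in that order; verify for your file.
traces = {
    "G": [0, 5, 40, 5, 0, 0, 2, 0, 0],
    "A": [0, 0, 3, 0, 0, 6, 55, 4, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def tallest_channel(traces, index):
    """Return (base, intensity) for the channel with the highest value at index."""
    base = max(traces, key=lambda b: traces[b][index])
    return base, traces[base][index]


print(tallest_channel(traces, 2))
print(tallest_channel(traces, 6))
```

A viewer draws all four arrays against the same x axis, one color per channel, so the channel returned here is simply the one whose curve is highest at that position.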

written 4.2 years ago by Dan D6.7k

The plot you have attached: is it a plot of the values returned when accessing the DATA.9/10/11/12 fields? I am getting close to 20,000 data points for a sequence 100 bp long when I use R to extract any of DATA.9 to DATA.12, so I am not sure which data points in the file correspond to which nucleotide.

written 4.2 years ago by anuragm130

DATA.9: G
DATA.10: A
DATA.11: T
DATA.12: C
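As a rough illustration of how trace indices can be tied back to bases, the Python sketch below pairs this channel order with a list of peak positions. It relies on assumptions not stated in this thread: that the file records the channel order in the FWO_.1 field ("GATC" here) and the basecaller's peak positions in PLOC.2, as described in the ABIF format documentation. The arrays are synthetic, the function name `bases_at_peaks` is hypothetical, and picking the tallest channel is a simplification of what a real basecaller does.

```python
# Assumed to mirror FWO_.1 (channel order) and PLOC.2 (peak positions);
# all values below are synthetic, not read from a real ab1 file.
channel_order = "GATC"
data_9_to_12 = [
    [0, 1, 30, 2, 0, 0, 0, 0, 0, 0, 0, 0],   # DATA.9
    [0, 0, 0, 0, 1, 25, 1, 0, 0, 0, 0, 0],   # DATA.10
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 2],   # DATA.11
    [0, 0, 0, 0, 0, 0, 1, 28, 2, 0, 0, 0],   # DATA.12
]
peak_locations = [2, 5, 7, 10]


def bases_at_peaks(channel_order, channels, peak_locations):
    """Return the base with the highest intensity at each peak position."""
    traces = dict(zip(channel_order, channels))
    return "".join(
        max(traces, key=lambda base: traces[base][i]) for i in peak_locations
    )


print(bases_at_peaks(channel_order, data_9_to_12, peak_locations))
```

The traces hold many more points than there are bases, so only the indices listed as peak positions line up with called nucleotides; the points in between describe the shape of each peak.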

written 3 months ago by yaotianran0
