PacBio Sequence name
8.9 years ago

Hi,

Could anyone point me to the meaning of the different fields in a PacBio FASTQ sequence name?

Example:

@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/116_1715 0.82 24
ATAGCTGATCGTGAC....
....
@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/1768_3406 0.82 24
ATGCTAGTACG.....

What does it mean that both sequences have the same name prefix (@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/)?

Any pointers would be appreciated.

Madi

fastq PacBio • 5.1k views
8.9 years ago
jkaralius ▴ 100
 m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230 0.99 24
└1┘└─────2─────┘└──3─┘└────────────────4────────────────┘└5┘└6┘└7┘└────8────┘└─9─┘└10┘
  1. m = "movie"
  2. Time of Run Start (yymmdd_hhmmss)
  3. Instrument Serial Number
  4. SMRT Cell Barcode
  5. Set Number (a.k.a. "Look Number"; deprecated field, used in earlier versions of the RS)
  6. Part Number (usually p0, X0 when using expired reagents)
  7. ZMW hole number
  8. Subread Region (start_stop using polymerase read coordinates)
  9. readScore
  10. barcodeScore
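
As a rough illustration of the field breakdown above, here is a minimal Python sketch that splits such a header into its parts. The function name `parse_pacbio_name`, the `PacBioName` tuple, and the regular expression are my own assumptions built from the ten fields listed here, not an official PacBio parser, and other header variants may not match.

```python
import re
from typing import NamedTuple, Optional


class PacBioName(NamedTuple):
    movie: str            # full movie name (fields 1-6 joined)
    run_start: str        # field 2, yymmdd_hhmmss
    instrument: str       # field 3
    smrt_cell: str        # field 4
    set_number: str       # field 5
    part_number: str      # field 6
    zmw: int              # field 7
    start: int            # field 8, subread start
    stop: int             # field 8, subread stop
    read_score: Optional[float]    # field 9, if present
    barcode_score: Optional[int]   # field 10, if present


# Hypothetical pattern derived from the field list above.
_NAME_RE = re.compile(
    r"^@?(?P<movie>m(?P<run_start>\d{6}_\d{6})_(?P<instrument>[^_]+)"
    r"_(?P<smrt_cell>[^_]+)_(?P<set_number>s\d+)_(?P<part_number>[A-Za-z]\d+))"
    r"/(?P<zmw>\d+)/(?P<start>\d+)_(?P<stop>\d+)"
    r"(?:\s+(?P<read_score>\S+))?(?:\s+(?P<barcode_score>\S+))?\s*$"
)


def parse_pacbio_name(header: str) -> PacBioName:
    """Split a subread header line into the fields described above."""
    m = _NAME_RE.match(header.strip())
    if m is None:
        raise ValueError(f"not a recognised PacBio subread name: {header!r}")
    d = m.groupdict()
    return PacBioName(
        movie=d["movie"],
        run_start=d["run_start"],
        instrument=d["instrument"],
        smrt_cell=d["smrt_cell"],
        set_number=d["set_number"],
        part_number=d["part_number"],
        zmw=int(d["zmw"]),
        start=int(d["start"]),
        stop=int(d["stop"]),
        read_score=float(d["read_score"]) if d["read_score"] else None,
        barcode_score=int(d["barcode_score"]) if d["barcode_score"] else None,
    )
```

For the first header in the question, `parse_pacbio_name(...)` would give `zmw=93`, `start=116`, `stop=1715`, `read_score=0.82` and `barcode_score=24`.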

@jkaralius That's exactly what I needed -- thanks a lot.

8.9 years ago
thackl ★ 3.0k

PacBio SMRT sequencing uses circularized DNA template fragments. Depending on the length of the fragment, the polymerase loops around this template multiple times. This produces a read with multiple segments, called subreads, each representing all or at least part of the actual template.

@m...512_s1_p0/93 corresponds to a read of one circularized fragment.

@m...512_s1_p0/93/116_1715 and @m...512_s1_p0/93/1768_3406 correspond to two subreads from the same template, with the two numbers after the second / giving you the coordinates of the subread relative to the original read.
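
If you want to collect the subreads that belong to the same template, one simple approach is to group on everything before the last /. A minimal Python sketch, assuming the `movie/zmw/start_stop` layout shown in the question (the helper name `group_subreads` is made up for illustration):

```python
from collections import defaultdict


def group_subreads(headers):
    """Group subread headers by (movie, ZMW) and sort their coordinates.

    Assumes names of the form movie/zmw/start_stop, optionally prefixed
    with '@' and followed by whitespace-separated scores.
    """
    groups = defaultdict(list)
    for header in headers:
        read_id = header.split()[0].lstrip("@")
        movie, zmw, region = read_id.rsplit("/", 2)
        start, stop = (int(x) for x in region.split("_"))
        groups[(movie, int(zmw))].append((start, stop))
    for regions in groups.values():
        regions.sort()  # order subreads along the polymerase read
    return dict(groups)
```

With the two headers from the question this yields a single key for ZMW 93 holding the regions `(116, 1715)` and `(1768, 3406)`.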

The orientation of subsequent subreads alternates (forward - reverse-complement - forward - ...). If you align them, you can see the similarities.

For more details, including the first parts of the read IDs, have a look at, for example, SMRT-sequencing workflow and Understanding-PacBio-transcriptome-data (it says transcriptome, but the general explanations hold for genomic sequencing as well).


@thackl the reason I was asking about the name is that I wanted to know which part represented the ZMW number so that I can know which reads are from the same insert. Your links were helpful though. Thanks for taking the time to post an answer!
