Question

PacBio Sequence name


8.9 years ago

apt.university ▴ 70

Hi,

Could anyone point me to the meaning of the different fields in a PacBio FASTQ sequence name?

Example:

@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/116_1715 0.82 24
ATAGCTGATCGTGAC....
....
@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/1768_3406 0.82 24
ATGCTAGTACG.....

What does it mean that both sequences have the same name prefix (@m141104_013014_42198_c100718132550000001823144805141512_s1_p0/93/)?

Any pointers would be appreciated.

Madi

fastq PacBio • 5.1k views


Ram · Answer 1 · 2015-06-11


8.9 years ago

jkaralius ▴ 100

 m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230 0.99 24
└1┘└─────2─────┘└──3─┘└────────────────4────────────────┘└5┘└6┘└7┘└────8────┘└─9─┘└10┘

1. m = movie
2. Time of Run Start (yymmdd_hhmmss)
3. Instrument Serial Number
4. SMRT Cell Barcode
5. Set Number (a.k.a. "Look Number". Deprecated field, used in earlier version of RS)
6. Part Number (usually p0, X0 when using expired reagents)
7. ZMW hole number
8. Subread Region (start_stop using polymerase read coordinates)
9. readScore
10. barcodeScore
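If you want to pull these fields out programmatically, the layout above can be sketched as a small Python parser. This is only a minimal sketch based on the example name in this answer: the `SubreadName` type, the `parse_subread_name` function, and the regular expression are my own illustration (not part of any PacBio tool), and the two trailing values are treated as optional because not every FASTQ header carries them.

```python
import re
from typing import NamedTuple, Optional


class SubreadName(NamedTuple):
    """Fields of a PacBio RS subread name (hypothetical container type)."""
    run_start: str                 # yymmdd_hhmmss
    instrument: str                # instrument serial number
    smrt_cell: str                 # SMRT Cell barcode
    set_number: str                # e.g. "s1"
    part_number: str               # e.g. "p0" or "X0"
    zmw: int                       # ZMW hole number
    start: int                     # subread start (polymerase read coordinates)
    stop: int                      # subread stop (polymerase read coordinates)
    read_score: Optional[float]    # None if absent from the header
    barcode_score: Optional[str]   # kept as raw text; None if absent


# Pattern inferred from the example name only; other naming schemes will not match.
_NAME_RE = re.compile(
    r"@?m(?P<run_start>\d{6}_\d{6})"
    r"_(?P<instrument>[^_/]+)"
    r"_(?P<smrt_cell>c\d+)"
    r"_(?P<set_number>s\d+)"
    r"_(?P<part_number>[pX]\d+)"
    r"/(?P<zmw>\d+)"
    r"/(?P<start>\d+)_(?P<stop>\d+)"
)


def parse_subread_name(header: str) -> SubreadName:
    """Split a FASTQ header line into the fields listed above."""
    tokens = header.split()
    if not tokens:
        raise ValueError("empty header")
    match = _NAME_RE.fullmatch(tokens[0])
    if match is None:
        raise ValueError(f"not a recognised PacBio subread name: {tokens[0]!r}")
    return SubreadName(
        run_start=match["run_start"],
        instrument=match["instrument"],
        smrt_cell=match["smrt_cell"],
        set_number=match["set_number"],
        part_number=match["part_number"],
        zmw=int(match["zmw"]),
        start=int(match["start"]),
        stop=int(match["stop"]),
        read_score=float(tokens[1]) if len(tokens) > 1 else None,
        barcode_score=tokens[2] if len(tokens) > 2 else None,
    )
```

Calling `parse_subread_name` on the example line at the top of this answer gives a record whose `zmw` is 553 and whose `start` and `stop` are 3100 and 11230.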



@jkaralius That's exactly what I needed -- thanks a lot.

written 8.9 years ago by apt.university ▴ 70

Ram · Answer 2 · 2015-06-11

PacBio SMRT sequencing uses circularized DNA template fragments. Depending on the length of the fragment, the polymerase loops around this template multiple times. This produces a read made up of multiple segments, called subreads, each representing (at least part of) the actual template.

@m...512_s1_p0/93 corresponds to a read of one circularized fragment.

@m...512_s1_p0/93/116_1715 and @m...512_s1_p0/93/1768_3406 correspond to two subreads from the same template, with the two numbers after the second / giving you the coordinates of the subread relative to the original read.
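To see which subreads in a FASTQ file come from the same template, you can group the headers by everything before the last `/`. The following is a rough sketch, not an official tool: `group_subreads` is a name I made up, and it assumes every header ends in `/start_stop` as in the examples above.

```python
from collections import defaultdict


def group_subreads(headers):
    """Map each read prefix (movie/ZMW) to its sorted (start, stop) subread spans."""
    groups = defaultdict(list)
    for header in headers:
        # Drop the leading "@" and any trailing scores after whitespace.
        name = header.split()[0].lstrip("@")
        # Everything before the last "/" identifies the read; the rest is the span.
        prefix, coords = name.rsplit("/", 1)
        start, stop = (int(value) for value in coords.split("_"))
        groups[prefix].append((start, stop))
    return {prefix: sorted(spans) for prefix, spans in groups.items()}
```

For the two headers in the question this returns a single key ending in `_s1_p0/93` with the spans `(116, 1715)` and `(1768, 3406)`, i.e. two subreads of the same read.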

The orientation of subsequent subreads alternates (forward - reverse-complement - forward - ...). If you align them, you can see the similarities.
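For a quick look at this, you can flip every second subread before comparing them. The snippet below is only a sketch under a strong assumption: the list holds all subreads of one read in order, with none missing, so orientation strictly alternates. Both function names are my own. Real subreads contain sequencing errors, so for anything beyond eyeballing, use a proper aligner rather than exact string comparison.

```python
# Translation table for complementing DNA, including N and lower-case bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_common_orientation(subreads):
    """Flip every second subread so all share the first subread's orientation.

    Assumes `subreads` is ordered by start coordinate and strictly alternating.
    """
    return [
        seq if index % 2 == 0 else reverse_complement(seq)
        for index, seq in enumerate(subreads)
    ]
```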

For more details, including the first parts of the read IDs, have a look at, for example, SMRT-sequencing workflow and Understanding-PacBio-transcriptome-data (it says transcriptome, but the general explanations hold for genomic sequencing as well).