PacBio Raw Data File Formats
20 months ago
priya.bmg

Hello

Does anyone know of tools to view the contents of PacBio movie files? I get movie files in three different formats: FASTA, FASTQ and BAM. But these files seem different from Illumina files, and I cannot read their contents using the cat or head commands.

Does the raw data from the Sequel instrument come as BAM files, which are then converted into the movie files (FASTA, FASTQ and BAM)?

Thanks

PacBio data files should be no different from Illumina files, except that each read may be insanely long compared to short Illumina reads. They still follow the standard formats for the file types you mention.

There is one exception to make a note of.

Calculated Q scores are no longer provided for PacBio data (see this link):

Please note that raw data quality scores are the same for all bases of the Sequel raw data (PHRED 0, i.e. ASCII '!'). PacBio came to the conclusion that computing the quality scores for the raw data was a waste of time. Apparently the quality scores for the raw data cannot be reliably computed.
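For anyone checking their own files: FASTQ quality characters are normally Phred+33 encoded, so a base's score is the character's ASCII code minus 33, and PHRED 0 is exactly the character '!'. A minimal Python sketch (the function names are illustrative, not from any library) that decodes a quality line and flags the all-'!' placeholder pattern described in the quote:

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def is_placeholder_quality(qual: str) -> bool:
    """True if every base carries Phred 0 (ASCII '!'), i.e. no real quality information."""
    return len(qual) > 0 and all(ch == "!" for ch in qual)


print(phred_scores("!!5?"))            # [0, 0, 20, 30]
print(is_placeholder_quality("!!!!"))  # True
print(is_placeholder_quality("!!5!"))  # False
```

If every read in a file passes `is_placeholder_quality`, the quality line carries no information and the FASTQ is effectively a FASTA.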

Quote: "Q scores are no longer provided for PacBio data"

This is only true for subreads data. Nearly all PacBio data is HiFi these days, which comes with PHRED scores.
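The Phred scale ties each score Q to an error probability P via Q = -10 log10(P), so Q20 means a 1% chance the base call is wrong and Q30 means 0.1%. As a rough sketch (the helper names are illustrative, and this is not necessarily how PacBio computes its own read-quality values), the per-base scores in a HiFi FASTQ quality line can be summarised into one read-level Phred value by averaging the error probabilities:

```python
import math


def mean_error_probability(qual: str, offset: int = 33) -> float:
    """Average per-base error probability, using P = 10^(-Q/10) for each base."""
    if not qual:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return sum(probs) / len(probs)


def read_quality(qual: str, offset: int = 33) -> float:
    """Collapse per-base scores into one Phred value: Q = -10 * log10(mean error)."""
    return -10 * math.log10(mean_error_probability(qual, offset))


# In Phred+33, '5' is Q20 (1% error) and '?' is Q30 (0.1% error).
print(round(read_quality("5555"), 1))  # 20.0
print(round(read_quality("5?"), 1))    # 22.6
```

Averaging probabilities rather than Q values means a few low-quality bases pull the read-level score down noticeably (the "5?" read scores 22.6, not 25), which is usually what you want when filtering reads.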

Thanks for the clarification. I have not worked with PacBio data of late but remembered the page from a past discussion.

If you are with PacBio, you may want to consider making this information prominently available on the company site. A simple Google search (e.g. "pacbio q score") does not turn up a definitive page on the official PacBio site that states this.

20 months ago
gconcepcion

Hello,

FASTA and FASTQ files (unless they are gzipped) are regular text files and can be read with the cat or head commands. If they are gzipped, you can still read them with the zcat command.
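If you would rather check from a script than from the shell, a small sketch along these lines handles both plain and gzipped FASTA/FASTQ. It recognises gzip files by their two magic bytes (0x1f 0x8b), which are part of the gzip format; `open_text` and `head` are hypothetical helper names:

```python
import gzip
import itertools
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def open_text(path: str):
    """Open a FASTA/FASTQ file as text, decompressing it transparently if gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")


def head(path: str, n: int = 8) -> List[str]:
    """Return the first n lines, like `head -n` (or `zcat | head -n` for .gz files)."""
    with open_text(path) as fh:
        return [line.rstrip("\n") for line in itertools.islice(fh, n)]
```

For example, `head("reads.fastq.gz", 4)` (a hypothetical file name) would return the first FASTQ record as its four lines, whether or not the file is compressed.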

BAM files are the binary, compressed version of SAM files and can be read with samtools: https://github.com/samtools/samtools
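In practice `samtools view -H file.bam` prints the header and `samtools view file.bam | head` shows the first records. Purely to illustrate why `cat` shows gibberish, here is a sketch that reads a BAM header with only the Python standard library, following the layout in the SAM/BAM format specification. It relies on BAM being BGZF-compressed, where each BGZF block is an ordinary gzip member; `read_bam_header` is a hypothetical name and this is not a substitute for samtools or pysam:

```python
import gzip
import struct
from typing import List, Tuple

BAM_MAGIC = b"BAM\x01"


def _int32(fh) -> int:
    """Read one little-endian int32 from the decompressed stream."""
    return struct.unpack("<i", fh.read(4))[0]


def read_bam_header(path: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (SAM header text, [(reference name, length), ...]) from a BAM file.

    gzip.open decompresses the concatenated BGZF blocks; a file that is not
    gzip-compressed at all raises gzip.BadGzipFile instead.
    """
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != BAM_MAGIC:
            raise ValueError(f"{path} is not a BAM file")
        text = fh.read(_int32(fh)).rstrip(b"\x00").decode("utf-8")
        refs = []
        for _ in range(_int32(fh)):  # n_ref reference entries follow the text
            name = fh.read(_int32(fh)).rstrip(b"\x00").decode("utf-8")
            refs.append((name, _int32(fh)))
    return text, refs
```

For an unaligned BAM straight off the instrument, the reference list would normally be empty and everything of interest is in the header text and the records that follow it.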

There is a good tutorial on manipulating bam files here: http://quinlanlab.org/tutorials/samtools/samtools.html

The raw data comes off the instrument as a BAM file and is subsequently converted to both FASTA and FASTQ. A lot of people only need the FASTA or FASTQ text representation of the sequencing data, but service providers will often provide all three, because the BAM file contains additional data that certain analyses require.
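Tools such as `samtools fastq` do this conversion properly. As a rough sketch of what it involves, the code below walks the records of a BAM file (standard library only, following the SAM/BAM specification layout) and emits FASTQ, while reporting how many bytes of optional per-read tags each record carried; those tags are the kind of extra information that is lost in the FASTA/FASTQ versions. Assumptions: the reads are unaligned, so strand flags are ignored, and when a record stores no qualities the sketch writes Phred 0 ('!'), which real converters may handle differently. `bam_records` and `to_fastq` are hypothetical names.

```python
import gzip
import struct
from typing import Iterator, Tuple

SEQ_CODES = "=ACMGRSVTWYHKDBN"         # 4-bit base codes from the SAM/BAM spec
FIXED = struct.Struct("<iiBBHHHIiii")  # refID .. tlen: 32-byte fixed part of a record


def _int32(fh) -> int:
    """Read one little-endian int32 from the decompressed stream."""
    return struct.unpack("<i", fh.read(4))[0]


def bam_records(path: str) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (name, sequence, Phred+33 quality, optional-tag byte count) per record."""
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError(f"{path} is not a BAM file")
        fh.read(_int32(fh))            # skip the SAM header text
        for _ in range(_int32(fh)):    # skip the reference list
            fh.read(_int32(fh))        # reference name
            fh.read(4)                 # reference length
        while True:
            size = fh.read(4)
            if len(size) < 4:          # end of the record stream
                return
            block = fh.read(struct.unpack("<i", size)[0])
            fields = FIXED.unpack_from(block, 0)
            l_name, n_cigar, l_seq = fields[2], fields[5], fields[7]
            off = FIXED.size
            name = block[off:off + l_name].rstrip(b"\x00").decode("ascii")
            off += l_name + 4 * n_cigar  # skip read name and CIGAR operations
            packed = block[off:off + (l_seq + 1) // 2]
            # Two bases per byte: high nibble first, then low nibble.
            seq = "".join(
                SEQ_CODES[(packed[i // 2] >> (0 if i % 2 else 4)) & 0xF]
                for i in range(l_seq)
            )
            off += (l_seq + 1) // 2
            raw_qual = block[off:off + l_seq]
            off += l_seq
            if l_seq and raw_qual[0] == 0xFF:  # 0xFF means no qualities stored
                qual = "!" * l_seq             # assumption: fall back to Phred 0
            else:
                qual = "".join(chr(q + 33) for q in raw_qual)
            yield name, seq, qual, len(block) - off  # remaining bytes are tags


def to_fastq(record: Tuple[str, str, str, int]) -> str:
    """Format one record as a four-line FASTQ entry; the tag data is dropped."""
    name, seq, qual, _ = record
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

A non-zero tag byte count is a quick sign that the BAM holds information the FASTQ will not.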
