PacBio subreads.fastq files?
10 months ago
majeedaasim ▴ 60

I have downloaded PacBio Iso-Seq data in subreads.fastq format from NCBI. Most Iso-Seq analysis tools require a PacBio .bam file as input, which is not available from NCBI. I want to perform differential gene expression analysis and alternative splicing analysis, but I am confused about the nature of the data.

  1. Have the sequences in the subreads.fastq file been processed for barcode and primer removal or not?
  2. I have read in the PacBio documentation that PacBio .bam files are converted to fastq through the bam2fastq module, which includes demultiplexing and barcode removal.
  3. Are the subreads fastq files in NCBI generated after CCS calling, or through bam2fastq without CCS calling?
ncbi PacBbio • 740 views
10 months ago

Subreads are the raw sequences without the adapters (SMRTbell adapters). If you have inline barcodes or primers as part of the sequence, they will be present in the subreads; barcodes within the SMRTbell adapter will not be in the subreads. PacBio reads used to be fastq before they moved to bam, IIRC, so old data may never have been in bam format in the first place. The subreads files are NOT error-corrected. Sometimes the read headers are useful to look at: the subreads file should have multiple consecutive reads that come from the same ZMW, while the CCS file should have only a single read per ZMW (with very high quality scores).
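
As a rough way to do that header check, here is a minimal Python sketch that counts reads per ZMW. It assumes the usual PacBio read-name layout (`<movie>/<zmw>/<start>_<end>` for subreads, `<movie>/<zmw>/ccs` for CCS reads) and plain 4-line FASTQ records; the function names are made up for illustration, and headers rewritten on upload may not keep the original names, in which case nothing will match.

```python
import re
from collections import Counter

# Assumed PacBio read-name layout (not guaranteed for every NCBI download):
#   subreads: <movie>/<zmw>/<start>_<end>
#   CCS:      <movie>/<zmw>/ccs
# search() is used so a leading accession such as "@SRR... " is tolerated.
READ_NAME = re.compile(r"([^\s/@]+)/(\d+)/(\d+_\d+|ccs)")

def reads_per_zmw(fastq_lines):
    """Count reads per (movie, ZMW) from an iterable of FASTQ lines.

    Assumes 4 lines per record, so every 4th line is a header.
    Headers that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # sequence, '+', or quality line
        match = READ_NAME.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

def looks_like_subreads(counts):
    """True if any ZMW contributes more than one read."""
    return any(n > 1 for n in counts.values())
```

You can pass it an open file handle, e.g. `reads_per_zmw(open("subreads.fastq"))`, or `gzip.open(path, "rt")` for a compressed file. If many ZMWs show several reads, the file is most likely subreads; one read per ZMW points to CCS.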
