PacBio SRA data format error
10 weeks ago

Hi Everyone,

I am trying to replicate the results of the following paper: https://genome.cshlp.org/content/28/3/396

The data are long reads from a PacBio RS II and are available in SRA at the following location: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SAMN06328616&o=acc_s%3Aa

I am getting the error "ERROR, could not read quality metrics for FASTQ Sequence" when I try to use the program bax2bam to convert the legacy PacBio files to PacBio BAM files.

Any suggestions on how to convert the files from SRA to PacBio BAM files for use with the IsoSeq3 pipeline?

Thanks

Are you using the original data files found under the Data Access tab of the samples? A representative example from your data here. If not, try those bax files.

Yes I am trying out the bax files from the Data Access tab. Here is an example file name.

m140326_081232_42154_c100620072550000001823110407181433_s1_p0.1.bax.h5.1

As I mentioned, I got the error with these bax files. I have also tried removing the .1 after the .h5, but I still get the same error: "ERROR, could not read quality metrics for FASTQ Sequence"
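For anyone who wants to try the same renaming step on a whole directory, here is a minimal sketch. The helper name strip_sra_suffix is made up for illustration, and it assumes the downloaded files all end in ".bax.h5." followed by a number, like the example above. As noted, renaming alone did not make the error go away.

```shell
#!/bin/sh
# Sketch only: drop the trailing ".1"-style component that the SRA download
# appends after ".bax.h5". strip_sra_suffix is a hypothetical helper name.
set -eu

strip_sra_suffix() {
    case "$1" in
        # Name contains ".bax.h5." followed by a digit: remove the last
        # dot-separated component.
        *.bax.h5.[0-9]*) printf '%s\n' "${1%.*}" ;;
        # Anything else is returned unchanged.
        *) printf '%s\n' "$1" ;;
    esac
}

# Rename matching files in the current directory; -n avoids overwriting
# a file that already has the target name.
for f in *.bax.h5.*; do
    [ -e "$f" ] || continue   # no matches: the glob is left as a literal string
    mv -n -- "$f" "$(strip_sra_suffix "$f")"
done
```

Run it from the directory that holds the downloaded bax files.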

Have you tried multiple files and do they all get the same error?

I am getting the error for multiple files.

I have the same problem with bax.h5 files we generated in 2017. I'm seeing this enough to think it's a bug, rather than a problem with input files.

Interesting. Perhaps contacting PacBio tech support may be the only option then.

23 days ago
papodek ▴ 10

As foeckingmf already said, it seems to be a bug. I had the same problem and solved it by downgrading bax2bam from version 0.0.11 to 0.0.9:

conda install -c bioconda bax2bam=0.0.9

Thank you, papodek! The only thing I would suggest is that people create a new environment for this version instead of installing it in the conda base environment.
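A minimal sketch of that suggestion, assuming conda is on the PATH and that the bioconda channel still carries bax2bam 0.0.9 as reported above. The environment name bax2bam-0.0.9 is arbitrary, and the DRY_RUN switch is only part of this sketch, not a conda feature. By default the script just prints the command; set DRY_RUN=0 to actually create the environment.

```shell
#!/bin/sh
# Sketch only: put the older bax2bam in its own conda environment
# instead of installing it into base.
set -eu

ENV_NAME="bax2bam-0.0.9"      # arbitrary environment name (assumption)
BAX2BAM_VERSION="0.0.9"       # version reported to work in this thread

# Store the command in the positional parameters so it can be printed or run.
set -- conda create --yes --name "$ENV_NAME" -c bioconda "bax2bam=${BAX2BAM_VERSION}"

if [ "${DRY_RUN:-1}" = "0" ] && command -v conda >/dev/null 2>&1; then
    "$@"            # actually create the environment
else
    echo "$*"       # default: only show the command that would be run
fi
```

Afterwards, `conda activate bax2bam-0.0.9` should make that bax2bam version available without touching base.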

16 days ago
foeckingmf • 0

PacBio tech support told me that 0.0.11 is a development version on GitHub and to use version 0.0.8 from SMRT Link v7.0.1. That generated the .bam files from my .bax.h5 files, but I'm now dealing with incompatibilities in the analysis tools. At least these are Python scripts, so I can get a better idea of what the problem is.

Mainly, it seems like PacBio is not really interested in supporting legacy data formats.