Pacbio SRA data format error
10 weeks ago

Hi Everyone,

I am trying to replicate the results of the following paper: https://genome.cshlp.org/content/28/3/396

The data are long reads from a PacBio RS II and are available in SRA at the following location: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SAMN06328616&o=acc_s%3Aa

I get the error "ERROR, could not read quality metrics for FASTQ Sequence" when I use bax2bam to convert the legacy PacBio files to PacBio BAM files.

Any suggestions on how to convert the files from SRA to PacBio BAM files for use with the IsoSeq3 pipeline?

Thanks

Pacbio SRA bax2bam isoseq3 • 476 views

Are you using the original data files found under the Data Access tab of the samples? A representative example from your data here. If not, try those bax files.

Yes, I am using the bax files from the Data Access tab. Here is an example file name:

m140326_081232_42154_c100620072550000001823110407181433_s1_p0.1.bax.h5.1

As I mentioned, I get the error with these bax files. I have also tried removing the .1 after the .h5, but I still get the same error: "ERROR, could not read quality metrics for FASTQ Sequence"
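For anyone who wants to repeat the suffix removal on a whole download directory, here is a minimal sketch. The function name `strip_bax_suffix` and the directory argument are hypothetical, and it assumes the only problem with the names is a trailing numeric component such as `.1`. Note that renaming alone did not fix the bax2bam error in my case.

```shell
# Hypothetical helper: strip a trailing numeric suffix such as ".1"
# from every *.bax.h5.N file in the given directory (default: current dir).
strip_bax_suffix() {
    dir="${1:-.}"
    for f in "$dir"/*.bax.h5.[0-9]*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        target="${f%.*}"             # drop the last ".N" component only
        if [ -e "$target" ]; then
            echo "skip: $target already exists" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
}

# Example call (the directory name is an assumption):
#   strip_bax_suffix ./sra_downloads
```

Existing files are never overwritten: if the target name is already taken, the file is skipped and a message is printed to stderr.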

Have you tried multiple files and do they all get the same error?

I am getting the error for multiple files.

I have the same problem with bax.h5 files we generated in 2017. I'm seeing this enough to think it's a bug, rather than a problem with input files.

Interesting. Perhaps contacting PacBio tech support may be the only option then.

23 days ago
papodek ▴ 10

As foeckingmf already said, it seems to be a bug. I had the same problem and solved it by downgrading bax2bam from version 0.0.11 to 0.0.9:

conda install -c bioconda bax2bam=0.0.9

Thank you, papodek! My only suggestion is that people create a new environment for this version instead of installing it in the conda base environment.
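A minimal sketch of that suggestion, wrapped in a small function so the environment name and version are easy to change. The function name `make_bax2bam_env` and the default environment name `bax2bam-0.0.9` are my own choices, and adding the conda-forge channel alongside bioconda is an assumption about how the dependencies resolve; the version is the one papodek reported working.

```shell
# Sketch: install the older bax2bam into its own conda environment
# instead of the base environment.
# Arguments (both optional, names are hypothetical defaults):
#   $1  environment name   (default: bax2bam-0.0.9)
#   $2  bax2bam version    (default: 0.0.9, as reported above)
make_bax2bam_env() {
    env_name="${1:-bax2bam-0.0.9}"
    version="${2:-0.0.9}"
    conda create -y -n "$env_name" -c conda-forge -c bioconda "bax2bam=$version"
}

# Usage:
#   make_bax2bam_env
#   conda activate bax2bam-0.0.9
```

Keeping the pinned version isolated means other tools in base are not downgraded or broken by its dependencies.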

16 days ago
foeckingmf • 0

PacBio tech support told me that 0.0.11 is a development version on GitHub and that I should use version 0.0.8 from SMRT Link v7.0.1. That generated the .bam files from my .bax.h5 files, but I'm now dealing with incompatibilities in the analysis tools. At least these are Python scripts, so I can get a better idea of what the problem is.

Mainly, it seems like PacBio is not really interested in supporting legacy data formats.