PacBio SRA data format error
3.2 years ago

Hi Everyone,

I am trying to replicate the results of the following paper: https://genome.cshlp.org/content/28/3/396

The data are long reads from a PacBio RS II and are available in SRA at the following location: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SAMN06328616&o=acc_s%3Aa

I get the error "ERROR, could not read quality metrics for FASTQ Sequence" when I use bax2bam to convert the legacy PacBio files to PacBio BAM files.

Any suggestions on how to convert the files from SRA to PacBio BAM files for use with the IsoSeq3 pipeline?

Thanks

Pacbio SRA bax2bam isoseq3 • 2.5k views

Are you using the original data files found under the Data Access tab of the samples? A representative example from your data here. If not, try those bax files.

Yes, I am using the bax files from the Data Access tab. Here is an example file name:

m140326_081232_42154_c100620072550000001823110407181433_s1_p0.1.bax.h5.1

As I mentioned, I got the error with these bax files. I have also tried removing the .1 after the .h5, but I still get the same error: "ERROR, could not read quality metrics for FASTQ Sequence"
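
For anyone who wants to try the same renaming step in bulk, here is a minimal sketch. It assumes the downloaded files sit in the current directory and carry a trailing ".<number>" after ".bax.h5", as in the example name above; the helper name strip_sra_suffix is made up for illustration. Note that, as described in this post, renaming alone did not make the error go away.

```shell
# Hypothetical helper: print a file name with the trailing ".<digits>"
# after ".bax.h5" removed; any other name is printed unchanged.
strip_sra_suffix() {
    printf '%s\n' "$1" | sed -E 's/(\.bax\.h5)\.[0-9]+$/\1/'
}

# Rename every matching file in the current directory.
for f in *.bax.h5.[0-9]*; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    new=$(strip_sra_suffix "$f")
    if [ "$new" != "$f" ]; then
        mv -n -- "$f" "$new"                # -n: never overwrite an existing file
    fi
done
```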

Have you tried multiple files and do they all get the same error?

I am getting the error for multiple files.

I have the same problem with bax.h5 files we generated in 2017. I'm seeing this enough to think it's a bug, rather than a problem with input files.

Interesting. Contacting PacBio tech support may be the only option then.

3.0 years ago
papodek ▴ 40

As foeckingmf already said, it seems to be a bug. I had the same problem and solved it by downgrading bax2bam from version 0.0.11 to 0.0.9:

conda install -c bioconda bax2bam=0.0.9 

Thank you, papodek! The only thing I would suggest is that people create a new environment for this version instead of installing it into the conda base environment.
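
A minimal sketch of that suggestion, assuming conda is installed and the bioconda and conda-forge channels are reachable. The environment name bax2bam-0.0.9 and the RUN_CONDA switch are my own choices for illustration, not anything required by the tool. By default the script only prints the command; set RUN_CONDA=1 to actually create the environment.

```shell
# Hypothetical environment name; pick whatever suits your setup.
ENV_NAME="bax2bam-0.0.9"
# Version pin taken from the answer above.
BAX2BAM_PIN="bax2bam=0.0.9"

create_env() {
    # Create a dedicated environment so the base environment stays untouched.
    conda create --yes --name "$ENV_NAME" -c conda-forge -c bioconda "$BAX2BAM_PIN"
}

if [ "${RUN_CONDA:-0}" = "1" ]; then
    create_env
else
    echo "Would run: conda create --yes --name $ENV_NAME -c conda-forge -c bioconda $BAX2BAM_PIN"
fi
```

Afterwards, `conda activate bax2bam-0.0.9` switches to the new environment before running bax2bam.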

3.0 years ago
foeckingmf • 0

PacBio tech support told me that 0.0.11 is a development version on GitHub and to use version 0.0.8 from SMRT Link v7.0.1. That generated the .bam files from my .bax.h5 files, but I'm now dealing with incompatibilities in the analysis tools. At least these are Python scripts, so I can get a better idea of what the problem is.

Mainly, it seems like PacBio is not really interested in supporting legacy data formats.

PacBio tech support told me that 0.0.11

But they put it on conda...
