Question: Convert SOLiD XSQ To FASTQ Without Intermediate Files
6.9 years ago by
William4.4k
Europe
William4.4k wrote:

I need to convert some big Solid XSQ files to FASTQ files. Is there a tool that can convert directly from XSQ to FASTQ?

So far I have only found tools that convert first to CSFASTA and .QUAL, and then from those to FASTQ.

fastq solid • 6.6k views
ADD COMMENTlink written 6.9 years ago by William4.4k
6.9 years ago by
Laurent Gautier810 wrote:

The XSQ format is HDF5-based, and the only additional requirement is a routine to expand the bit-packed DNA and quality strings. There is at least one alternative to the conversion tools provided by Life Technologies: ngs_plumbing.xsq (although it has probably not been tested in all situations).

István's page on color-space formats (link in his post on this page) is remarkably comprehensive yet clear. Do have a look at it in any case if working with SOLiD data.

ADD COMMENTlink written 6.9 years ago by Laurent Gautier810
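
To illustrate the "expand the bit-packed strings" step mentioned above, here is a minimal sketch in plain Python. It assumes the layout I understand the XSQ spec to use for packed color calls: one byte per position, the color call in the lower 2 bits and the quality value in the upper 6 bits, with a quality value of 63 marking a missing call. The function names, the primer-base argument and the choice to write missing calls as "." with quality 0 are hypothetical and are not taken from ngs_plumbing or any other tool in this thread.

```python
PHRED_OFFSET = 33  # Sanger FASTQ quality offset
MISSING_QV = 63    # assumed sentinel for a missing color call


def unpack_calls(packed):
    """Split packed bytes into a color string and a list of quality values."""
    colors = []
    quals = []
    for byte in bytes(packed):
        qv = byte >> 2       # upper 6 bits: quality value (assumed layout)
        call = byte & 0b11   # lower 2 bits: color call 0-3 (assumed layout)
        if qv == MISSING_QV:
            colors.append(".")
            quals.append(0)
        else:
            colors.append(str(call))
            quals.append(qv)
    return "".join(colors), quals


def to_csfastq(name, primer_base, packed):
    """Format one read as a four-line color-space FASTQ record."""
    colors, quals = unpack_calls(packed)
    qual_string = "".join(chr(q + PHRED_OFFSET) for q in quals)
    return "@{}\n{}{}\n+\n{}\n".format(name, primer_base, colors, qual_string)
```

In a real converter the packed bytes would come from a dataset read with an HDF5 library such as h5py; the dataset path depends on the file and is deliberately left out here.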

Neat library. Had I known about it, I would not have spent the time on my own implementation. I will link to it from the tutorial.

ADD REPLYlink written 6.9 years ago by Istvan Albert ♦♦ 80k

I had a similar thought when finally finding your page on the color space (and there is already a link to it from the doc).

ADD REPLYlink written 6.9 years ago by Laurent Gautier810

Did anyone get ngs_plumbing.xsq working? First I had to rename the string class in the ngs_plumbing package because it conflicts with the Python string class. See http://stackoverflow.com/questions/5889466/attributeerror-module-object-has-no-attribute-maketrans-while-running-cprof

After that, the script xsqconvert.py exits with the message 'Error: the Python package "h5py" is required but could not be imported. Bye.' However, when I look in the script where the exception is thrown, it is trying to import ngs_plumbing.xsq, which I can't find anywhere in the ngs_plumbing package or on the system. I would really like to use this tool.

ADD REPLYlink written 6.9 years ago by William4.4k
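
A quick way to check for the kind of name clash described above is to ask Python which file a module name actually resolved to. The sketch below is only a diagnostic under my own assumptions: the helper name resolved_from_stdlib is hypothetical, and it assumes a regular on-disk standard library (not a zipped or embedded one).

```python
import importlib
import os
import sysconfig


def resolved_from_stdlib(module_name):
    """Return True if module_name resolves to the standard library.

    A False result suggests a local file or package with the same name
    is shadowing the standard library module.
    """
    module = importlib.import_module(module_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in modules such as sys have no file on disk.
        return True
    stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])
    return os.path.realpath(module_file).startswith(stdlib_dir + os.sep)


print(resolved_from_stdlib("string"))
```

Run from the directory where the conversion script is started, a False result for "string" would point to the shadowing problem rather than to a missing h5py.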

The best thing might be to share exactly what you are doing. I have been using it to convert SR and PE data and it appeared to work. Regarding the dependency on h5py, it really is needed: XSQ is built on HDF5.

ADD REPLYlink written 6.9 years ago by Laurent Gautier810

OK, it now works well and fast!

The only issue is the clash with the Python string class. I fixed it by renaming the ngs_plumbing string class to stringNGSUtil. The second error is the result of a cascaded import of the ngs_plumbing string class in the dna class, which then fails. Just edit the import in that class so that it imports the renamed stringNGSUtil, and the xsqconvert script works.

ADD REPLYlink written 6.9 years ago by William4.4k

The fixes are now in the Bitbucket repository and will be included in the next release (any time between now and the end of the summer).

ADD REPLYlink written 6.8 years ago by Laurent Gautier810
6.9 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

The XSQ is a proprietary format. Your options are limited to the conversion tool provided by ABI.

Considering that converting a color-space representation to FASTQ can mean several different types of conversion, I think it is safe to assume that this is not offered directly by any tool.

ADD COMMENTlink written 6.9 years ago by Istvan Albert ♦♦ 80k

Thanks for the good tutorial on colorspace. We want to keep the reads in colorspace, i.e. transform XSQ to CSFASTQ.

ADD REPLYlink written 6.9 years ago by William4.4k
6.8 years ago by
Marcus Breese0 wrote:

If you're still looking for something, I have code that works using PyTables, an alternative to h5py. I could use some extra eyes on it before I release it into the wild. I wrote it based on the XSQ spec released by ABI.

My tool converts XSQ files directly to FASTQ (optionally gzipped).

ADD COMMENTlink written 6.8 years ago by Marcus Breese0

PyTables uses its own customizations on top of HDF5 (or at least it did the last time I tried). This is not bad in itself, but it is a potential issue if you want to create or edit XSQ files.

The utilities in ngs_plumbing can convert XSQ data to FASTA-like (FASTA if ECC, or CSFASTA + QUAL) and FASTQ-like (FASTQ if SOLiD's ECC, or CSFASTQ) output.

In addition, one can also generate FASTQC-like reports (possibly nicer, since they are all HTML and JavaScript) directly from the XSQ.

ADD REPLYlink written 6.8 years ago by Laurent Gautier810
6.8 years ago by
William4.4k
Europe
William4.4k wrote:

I also wrote a new XSQ to FASTQ converter myself in Java, using the HDF5 Java API http://www.hdfgroup.org/hdf-java-html/ .

An XSQ file is an HDF5 file, a format that is also used in other "big data" sciences. This converter is faster than all the others I tried, I guess because I use the native C++ HDF5 libraries, which are OO-wrapped by the HDF Java API. All I did was write the minimal amount of Java code needed to traverse the file, unpack the byte array into colorspace and quality values, and write them to a FASTQ file.

With this converter we can convert a 63 GB SOLiD WildFire run with 1.2 billion reads (50 color-space bases each) to FASTQ in 50 minutes. The converter can output both normal Sanger CS FASTQ and the BWA 0.5.9 CS FASTQ dialect. By default it chunks the output into FASTQ files of 1,000,000 reads each to support mapping on clusters.

To make sure the output is correct, I used a couple of Linux commands to diff the output against the output of our old converter. The diff says the output is the same.

The project has moved to github: https://github.com/WimS83/XSQConverter

We and other people are still working on and with this converter.

ADD COMMENTlink modified 5.5 years ago • written 6.8 years ago by William4.4k
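
The chunked output described above can be sketched in a few lines. This is not the XSQConverter code (which is Java); it is a minimal Python illustration, and the function name write_chunked, the file-naming pattern and the assumption that each record is already a complete four-line FASTQ string are all hypothetical.

```python
import os


def write_chunked(records, out_dir, prefix, chunk_size=1000000):
    """Stream FASTQ records into numbered files of at most chunk_size reads.

    Returns the list of file paths written, in order.
    """
    paths = []
    handle = None
    try:
        for count, record in enumerate(records):
            if count % chunk_size == 0:
                # Start a new chunk file once the previous one is full.
                if handle is not None:
                    handle.close()
                name = "{}_{:05d}.fastq".format(prefix, len(paths))
                path = os.path.join(out_dir, name)
                handle = open(path, "w")
                paths.append(path)
            handle.write(record)
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Because the records are consumed one at a time, memory use stays flat regardless of run size, and each chunk file can be handed to a separate mapping job on a cluster.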

That's fast, and indeed probably the fastest tool. Good to have one more around.

I noticed that my tools had a problem with CSFASTQ (they produced invalid CSFASTQ because of a copy/paste issue; since we use either FASTQ or CSFASTA here, this had gone unnoticed). After the correction, the Python utility in ngs_plumbing clocks in at a little under 3 times slower (~2.5 hours for 63Gbp) for an XSQ -> CSFASTQ conversion. Going faster would require moving a block of a few lines of Python down to C in order to compete with Java's runtime (with no certainty of beating it without spending more time on it than it is worth).

ADD REPLYlink written 6.8 years ago by Laurent Gautier810

Hi William,

I am getting errors while trying to install your XSQConverter tool. I asked about it in the issues section of the GitHub repository (https://github.com/WimS83/XSQConverter/issues). Could you please have a look at it?

Jan 13: Hi William, I updated the issue on GitHub. Could you please check it? I could not find your contact details.

D.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by deepthithomaskannan250