Question: convert SOLiD color space to base-space
hellbio wrote (5.6 years ago):

Hi all,

I have short reads from the SOLiD 5500xl sequencing platform. The reads are in '.xsq' format. I used XSQ Tools from Life Technologies to convert the .xsq files to .csfasta and .qual files, as shown below:

xsqconvert -c FRAG_BC_01_Can19.xsq

which results in

FRAG_BC_01_Can19_F3.csfasta and FRAG_BC_01_Can19_F3.qual files.

Then I used the '' script from bwa to convert them to FASTQ format, as shown below:

FRAG_BC_06_Can19_F3.csfasta FRAG_BC_06_Can19_F3.QV.qual

The FASTQ file still has the reads in color space. My goal is to detect SNPs by aligning to a reference genome, and for this I need the data in base space. Could someone help with this?

Any help is highly valuable.

Tags: base space solid • 3.6k views • last modified 5.6 years ago by Brian Bushnell
Brian Bushnell (Walnut Creek, USA) wrote (5.6 years ago):

You cannot accurately convert colorspace to base-space without alignment, because a single error will make all subsequent bases incorrect, and SOLiD reads have lots of errors. So you need to use a colorspace-capable aligner to align the reads to a colorspace-indexed genome, and then convert the aligned reads to base-space before calling variants. Bowtie 1 can do colorspace alignment with the -C flag. BBMap used to be able to do colorspace mapping, but not anymore. bwa also used to be able to map in colorspace, but not anymore.
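To illustrate why naive conversion fails, here is a minimal Python sketch of colorspace encoding and decoding. It assumes the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, each color being the XOR of the codes of two adjacent bases) and a csfasta-style read that starts with the primer base; the function names are hypothetical helpers, not part of any SOLiD or aligner toolkit.

```python
# Sketch of SOLiD two-base ("color") encoding, assuming A=0, C=1, G=2, T=3
# and color = XOR of the 2-bit codes of two adjacent bases.
BASES = "ACGT"


def encode_colorspace(primer: str, seq: str) -> str:
    """Encode a base-space sequence as primer base + color calls."""
    colors = []
    prev = primer
    for base in seq:
        colors.append(str(BASES.index(prev) ^ BASES.index(base)))
        prev = base
    return primer + "".join(colors)


def decode_colorspace(read: str) -> str:
    """Naively decode a colorspace read such as 'T0123' into bases.

    Each base is derived from the previously decoded base, so one wrong
    color changes every base after it. A missing call ('.') makes the rest
    of the read undecodable, so those positions are emitted as 'N'.
    """
    prev = read[0]  # primer base (last base of the adaptor)
    out = []
    for i, color in enumerate(read[1:]):
        if color not in "0123":
            out.append("N" * (len(read) - 1 - i))
            break
        prev = BASES[BASES.index(prev) ^ int(color)]
        out.append(prev)
    return "".join(out)


if __name__ == "__main__":
    true_seq = "TGATCCA"
    read = encode_colorspace("T", true_seq)        # "T0123201"
    # Simulate one miscalled color at the third color position.
    bad = read[:3] + str((int(read[3]) + 1) % 4) + read[4:]
    print(true_seq)                 # TGATCCA
    print(decode_colorspace(bad))   # TGCGAAC: every base from the error onward is wrong
```

Changing a single color leaves the bases before it intact but shifts every later base, which is why aligners such as Bowtie 1 with -C work in colorspace against a colorspace index rather than decoding the reads first.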


Thank you. Prior to alignment, quality control to filter low-quality bases seems like an important step for SNP detection. Could you suggest some thresholds and tools for this? Also, could you suggest some tools to convert aligned reads from color space to base space?

(Reply by hellbio, 5.6 years ago)

I wrote a tool to convert mapped colorspace reads to base-space, but I'm not sure it still works; I'll look into it. BioScope should be able to do it, of course, but it's really crappy.

I DO NOT recommend quality-trimming SOLiD reads because, unlike Illumina reads, their quality profile varies with position modulo 5 rather than with raw position. Thus, low-quality bases are scattered throughout the read, and trimming the ends is not effective.

(Reply by Brian Bushnell, 5.6 years ago)
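One way to check the position-modulo-5 pattern on your own data is to average the QVs by position mod 5. The sketch below assumes the usual SOLiD .qual layout ('#' comment lines, a '>' header per read, then one line of space-separated integer QVs) and treats negative QVs as missing calls; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def qv_profile_mod5(qual_lines: Iterable[str]) -> Dict[int, float]:
    """Mean quality value grouped by (color position mod 5) in a SOLiD .qual file."""
    total: Dict[int, int] = defaultdict(int)
    count: Dict[int, int] = defaultdict(int)
    for line in qual_lines:
        line = line.strip()
        # Skip blank lines, '#' comment lines and '>' read headers.
        if not line or line.startswith(("#", ">")):
            continue
        for pos, qv in enumerate(int(x) for x in line.split()):
            if qv < 0:  # assumed marker for a missing call
                continue
            total[pos % 5] += qv
            count[pos % 5] += 1
    return {k: total[k] / count[k] for k in sorted(count)}


if __name__ == "__main__":
    demo = [">r1_F3", "10 20 30 40 50 12 22", ">r2_F3", "14 20 30 40 50 -1 24"]
    print(qv_profile_mod5(demo))
```

On a real file you would pass an open handle, e.g. `qv_profile_mod5(open("FRAG_BC_01_Can19_F3.qual"))`, and compare the result with a plain per-position average to see which pattern dominates.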

You can use a tool from brentp; I have used it a lot for my research.

(Reply by Ashutosh Pandey, 5.6 years ago)

You can also use SHRiMP2, a color-space read aligner. I have used it extensively for aligning csfastq or csfasta/qual files. Once you have the aligned SAM/BAM file, you can use any variant caller that takes BAM files.

(Reply by Ashutosh Pandey, 5.6 years ago)

Thank you. Is it necessary to filter low-quality bases before aligning with SHRiMP2? If so, could you suggest a threshold value? I normally use Q20 for Illumina; is it the same for SOLiD reads? Finally, is the BAM from SHRiMP2 in color space or base space?

(Reply by hellbio, 5.6 years ago)

You can use Q15 or Q20 as a threshold. The BAM file will contain both base-space and colorspace sequences: the base-space sequence is in the 10th column (SEQ field), and the colorspace sequence is stored in the optional tag fields at the end of each record. This BAM file can then be used with almost all of the tools that work with Illumina BAM files.

(Reply by Ashutosh Pandey, 5.6 years ago; edited)
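To see both representations in one aligned record, here is a minimal sketch that splits a single SAM line and pulls out SEQ (column 10) and the colorspace read from the optional tags. It assumes the colorspace read is in the SAM-specification `CS` tag and its qualities in `CQ`; check which tags your SHRiMP2 version actually writes. Header lines starting with '@' should be skipped before calling it.

```python
from typing import Dict, Optional


def seq_and_colors(sam_line: str) -> Dict[str, Optional[str]]:
    """Return read name, base-space SEQ and (if present) CS/CQ tag values of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    result: Dict[str, Optional[str]] = {
        "qname": fields[0],
        "seq": fields[9],  # column 10: base-space sequence
        "CS": None,
        "CQ": None,
    }
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)  # value may itself contain ':'
        if name in ("CS", "CQ"):
            result[name] = value
    return result


if __name__ == "__main__":
    rec = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tTGAT\tIIII\tAS:i:40\tCS:Z:T0123\tCQ:Z:5:?!"
    print(seq_and_colors(rec))
```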

Hello again. I used SHRiMP2 to align and came across the error "my_realloc error: realloc failed". I could not find an effective solution elsewhere. Could you help if you have faced this error?

(Reply by hellbio, 5.6 years ago)

Hi, below is what I have done:
1. The .csfasta and .qual files for each sample were converted to FASTQ using the 'fasta2fastq' script available in the SHRiMP2 bin folder.

2. Then I used the command below to align:
SHRiMP_2_2_3/bin/gmapper-cs Can19.fastq canFam3.fa -N 24 -Q --qv-offset 33 > Can19.sam

The reference sequence in the above command is in letter-space format and the reads are in color space. Should the reference also be given in color-space format, or does SHRiMP handle a letter-space reference when aligning color-space reads?

With the above command, I got the error "my_realloc error: realloc failed". Has anyone come across this error?

(Reply by hellbio, 5.6 years ago)
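Step 1 of the post above merges each read's colors and QVs into one FASTQ record. The sketch below shows one way to do that with the phred+33 encoding that matches `--qv-offset 33`; it is not SHRiMP2's `fasta2fastq` script, and the function names are hypothetical. It assumes the primer base stays in the sequence line, there is one quality character per color (none for the primer base), both files list reads in the same order, and missing calls (QV -1) are clamped to 0. Compare its output with `fasta2fastq` before relying on it.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_records(handle: TextIO, sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) from a .csfasta or .qual file, skipping '#' comment lines."""
    name: Optional[str] = None
    body: List[str] = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, sep.join(body)
            name, body = line[1:], []
        else:
            body.append(line)
    if name is not None:
        yield name, sep.join(body)


def to_colorspace_fastq(csfasta: TextIO, qual: TextIO, out: TextIO, offset: int = 33) -> None:
    """Write one FASTQ record per read: primer base + colors, then chr(QV + offset) per color."""
    for (name, colors), (qname, qvs) in zip(read_records(csfasta, ""), read_records(qual, " ")):
        if name != qname:
            raise ValueError(f"read order differs: {name!r} vs {qname!r}")
        # Clamp missing calls (QV -1) to 0 so every character stays printable.
        scores = [max(int(q), 0) for q in qvs.split()]
        if len(scores) != len(colors) - 1:
            raise ValueError(f"{name}: {len(colors) - 1} colors but {len(scores)} QVs")
        quals = "".join(chr(q + offset) for q in scores)
        out.write(f"@{name}\n{colors}\n+\n{quals}\n")


if __name__ == "__main__":
    cs = io.StringIO("# Title: demo\n>r1_F3\nT0123\n")
    qv = io.StringIO("# Title: demo\n>r1_F3\n20 25 30 -1\n")
    buf = io.StringIO()
    to_colorspace_fastq(cs, qv, buf)
    print(buf.getvalue(), end="")  # @r1_F3 / T0123 / + / 5:?!
```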

I am getting the same error. What is weird is that I used the same command to run 10 samples in parallel. Two of them seem to have finished fine, but the other 8 gave me the "realloc" error. It seems to occur during the genome loading step. Maybe a memory issue?

Have you managed to solve this issue?

(Reply by lkmklsmn, 5.6 years ago)