I'm dealing with HGS data from a SOLiD 3 sequencer, and in the pre-processing step I want to identify contaminating reads. Because the data are in color space, I haven't been able to find software that does this (tools such as SeqTrim are designed for the 454 and Illumina platforms). Has anyone run into the same problem? How can I screen color-space data for contamination or artifacts?
There are very few tools, if any, that can do this for color-space data. Perhaps BioScope (ABI's proprietary tool) has utilities that would partially help, but I don't have access to it.
The methods you linked to depend heavily on BLAST for alignment, so your only option is to convert your sequences to base space.
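Here is a minimal sketch of what the conversion does for a single read, assuming the usual `.csfasta` layout (a primer base such as `T` followed by color digits 0-3, with `.` for a missing call). It uses SOLiD 2-base encoding, where each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), so each decoded base is the previous base XORed with the next color. The function name `decode_colorspace`, dropping the primer base, and padding with `N` after a missing call are my own illustrative choices, not the behavior of any particular tool.

```python
# Hypothetical helper: decode one SOLiD color-space read to base space.
# Assumes the read starts with the primer base (e.g. "T") followed by
# color digits 0-3, as in a .csfasta record; "." marks a missing call.

BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read: str) -> str:
    """Return the base-space sequence encoded by a color-space read.

    In 2-base encoding each color is the XOR of the 2-bit codes of two
    adjacent bases, so each base is the previous base XOR the color.
    The primer base itself is not included in the output.
    """
    prev = BASE_TO_CODE[read[0].upper()]
    bases = []
    for i, color in enumerate(read[1:]):
        if color not in "0123":
            # A missing or ambiguous color makes every downstream base
            # unknowable, so pad the rest of the read with N.
            bases.append("N" * (len(read) - 1 - i))
            break
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


if __name__ == "__main__":
    print(decode_colorspace("T320010"))  # AGGGTT
    print(decode_colorspace("T32.010"))  # AGNNNN
```

Once every read is decoded this way and written out as FASTA, the converted reads can go through BLAST-based contamination screening like any other base-space data.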
In general, converting to base space is much frowned upon and strongly discouraged: each color encodes the transition between two adjacent bases, so decoding is sequential and a single color-call error corrupts every base after it. Still, from my own observations on realistic datasets, the conversion reduced the number of correctly mapped reads by only about 10%, which was a lot better than I expected. That loss was definitely worth the ability to use a wider range of tools.
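To see why conversion is discouraged, the sketch below encodes a made-up 20-base sequence into colors, flips one color call, and decodes it again. It assumes the same 2-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent bases); the sequence, the error position and the helper names are all hypothetical. In color space the error would count as a single mismatch, but after decoding every base from that position onward is wrong.

```python
# Illustration with hypothetical helpers: how one color-call error
# propagates once a read is decoded to base space.

BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode(primer: str, seq: str) -> str:
    """Encode a base-space sequence as a primer base plus colors 0-3."""
    colors = []
    prev = BASE_TO_CODE[primer]
    for base in seq:
        code = BASE_TO_CODE[base]
        colors.append(str(prev ^ code))
        prev = code
    return primer + "".join(colors)


def decode(read: str) -> str:
    """Decode primer + colors back to bases (primer base dropped)."""
    prev = BASE_TO_CODE[read[0]]
    out = []
    for color in read[1:]:
        prev ^= int(color)
        out.append(CODE_TO_BASE[prev])
    return "".join(out)


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


true_seq = "ACGTACGTACGTACGTACGT"  # made-up 20-base read
read = encode("T", true_seq)

# Introduce one color-call error at color index 5 (arbitrary choice);
# +1 skips over the primer base at the start of the string.
pos = 1 + 5
bad_color = str((int(read[pos]) + 1) % 4)
bad_read = read[:pos] + bad_color + read[pos + 1:]

decoded = decode(bad_read)
print(mismatches(read, bad_read))      # 1 mismatch in color space
print(mismatches(true_seq, decoded))   # 15 mismatches in base space
```

Bases 0 to 4 still decode correctly, but because each later base is derived from the one before it, the remaining 15 are all wrong.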
That said, your experience may differ. In the end the conversion is easy to do, so it is certainly worth trying.