Align fastq SOLiD data
5 days ago
Vic ▴ 30

Hello everyone,

I have downloaded some data from the Sequence Read Archive (SRA) using the sratoolkit. The data is SOLiD data. I have seen people use LifeScope (Life Technologies) to align the reads, and I presume it works for this type of data. Unfortunately, I can't get anyone to help with administrator permissions on our cluster, so I'm looking at alternative ways to do this.

My data, acquired using fastq-dump, looks like this:

I have seen people mention the bfast solid2fastq or bwa solid2fastq.pl

My question is: once I have converted the data using one of these, can I proceed with the bwa aligner, filter for mapping quality and continue? Or should I align using bfast? What I need is a resulting BAM file, as I would like to merge the lanes (each sample is spread across 5 or so), edit the read groups and sort the files so they can be variant called together with another data set.

5 days ago
colindaven ★ 3.1k

SOLiD is way out of date now.

• Are you interested in quantitative counting data (RNA-seq, metagenomics, etc.)? --> Don't use it: PCR errors.

• Are you interested in WGS or WES (SNVs)? --> Don't use it: very different from Illumina, with coverage bias and dropouts.

Alignment is tricky: bowtie1 does NOT do gapped alignment.

There are a couple of other alignment programs for SOLiD listed here: https://en.wikipedia.org/wiki/List_of_sequence_alignment_software

We found the commercial novoalign-cs was by far the best, but slow.

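Definitely align the reads in color space against the genome rather than converting them to base-space FASTQ first, else you'll run into the magical color-space "frameshifts", where a single color error changes every base after it and messes everything up.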
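
To make the "frameshift" point concrete, here is a minimal sketch of naive color-to-base decoding, assuming the standard SOLiD two-base encoding (with A=0, C=1, G=2, T=3, each color is the XOR of two adjacent bases). The function name `decode_cs` and the example reads are made up for illustration, and missing calls (`.`) are not handled.

```shell
# decode_cs READ
# READ is a colorspace read: one primer base followed by colors 0-3.
# Prints the decoded bases (the primer base itself is not printed).
decode_cs() {
  read_cs=$1
  prev=${read_cs%"${read_cs#?}"}   # first character: the known primer base
  rest=${read_cs#?}                # remaining characters: the colors
  decoded=""
  while [ -n "$rest" ]; do
    color=${rest%"${rest#?}"}      # next color
    rest=${rest#?}
    case $prev in A) p=0 ;; C) p=1 ;; G) p=2 ;; T) p=3 ;; esac
    n=$(( p ^ color ))             # next base = previous base XOR color
    case $n in 0) prev=A ;; 1) prev=C ;; 2) prev=G ;; 3) prev=T ;; esac
    decoded="$decoded$prev"
  done
  printf '%s\n' "$decoded"
}
```

With this sketch, `decode_cs T0120` prints `TGAA`, while `decode_cs T0320` (one color changed) prints `TAGG`: every base from the error onwards is different, because each decoded base depends on the one before it. A colorspace-aware aligner compares colors directly and so avoids this propagation.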

Avoid if you can!

Unfortunately, I work on a non-model species and I need to incorporate this data into my study. It was generated in 2014 and was used in a study as recently as 2018. Is your suggestion just to not use it?

My plan is to align this data to a reference genome that I am converting to a colorspace reference using the command ./bowtie-build -C

Is it WGS? WES? RNA-seq? If you can't avoid it, then treat it with extreme caution. Sure, you're on the right track, but as I pointed out, bowtie1 is not a great aligner, since it cannot align reads with indels. Have fun...

Apologies, the data is WGS.

5 days ago
ATpoint 53k

This is so-called colorspace data. Bowtie could align colorspace data until version 1.3.0, when support for it was dropped. Hence I would get version 1.2.3 and align with that. The easiest way is probably to get it with conda:

conda install -c bioconda bowtie=1.2.3


You will first have to build a colorspace index from the reference genome (or use one provided on the bowtie website). Then align with bowtie which can optionally output a SAM file (read its manual) which you can later convert to BAM with samtools.
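
A minimal sketch of those three steps, assuming bowtie 1.2.3 and samtools are on the PATH. The function names and file names are placeholders, and the `DRY_RUN` switch is only an illustrative convenience for printing the commands before running them. Note that `-C` is given both when building the index and when aligning.

```shell
# Print the command instead of running it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# align_colorspace REF.fa INDEX_PREFIX READS.fastq OUT_PREFIX
align_colorspace() {
  ref=$1 index=$2 reads=$3 out=$4
  # 1. Build a colorspace index from the reference genome.
  run bowtie-build -C "$ref" "$index" &&
  # 2. Align in colorspace and write SAM output.
  run bowtie -C -S "$index" "$reads" "$out.sam" &&
  # 3. Convert to a coordinate-sorted BAM and index it.
  run samtools sort -o "$out.sorted.bam" "$out.sam" &&
  run samtools index "$out.sorted.bam"
}
```

For example, `DRY_RUN=1; align_colorspace myreference.fa new_ref my_data.fastq my_data` prints the four commands without running anything. Other bowtie options (mismatches, reporting mode, threads) are left at their defaults here, so check the manual for what suits your data.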

I've installed it via conda as you suggested.

I need to build a colorspace index from the reference, specifying -C:

./bowtie-build -C myreference.fa new_ref


then align using bowtie, specifying SAM output:

./bowtie -C -S new_ref my_data.fastq my_data.sam


then it's off to samtools, which I'm used to using. Thanks so much ATpoint, you are very helpful!
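
For the samtools side mentioned in the question (sorting, editing read groups and merging the lanes of one sample), a rough sketch could look like the following. It assumes one SAM file per lane; the function name, the use of the lane file name as the read group ID, and the output file names are all illustrative choices, not requirements. Output files are written to the current directory.

```shell
# Print the command instead of running it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# merge_lanes SAMPLE LANE1.sam LANE2.sam ...
merge_lanes() {
  sample=$1
  shift
  for sam in "$@"; do
    lane=$(basename "$sam" .sam)
    # Coordinate-sort each lane (samtools merge expects sorted input).
    run samtools sort -o "$lane.sorted.bam" "$sam" || return 1
    # Assign all reads in the lane to one read group tagged with the sample.
    run samtools addreplacerg -r "ID:$lane" -r "SM:$sample" -r "PL:SOLID" \
      -o "$lane.rg.bam" "$lane.sorted.bam" || return 1
  done
  # Replace the argument list with the read-group-tagged BAM names.
  for sam in "$@"; do
    shift
    set -- "$@" "$(basename "$sam" .sam).rg.bam"
  done
  run samtools merge "$sample.merged.bam" "$@" || return 1
  run samtools index "$sample.merged.bam"
}
```

For example, `DRY_RUN=1; merge_lanes sampleA lane1.sam lane2.sam` prints the commands for a two-lane sample. A mapping-quality filter (for instance with `samtools view -q`) could be added before merging if needed.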