Question

Solid Sequencing Protocol And How To Obtain Quality Of Solid Reads?

2

11.2 years ago

Jordan ★ 1.3k

I started working on NGS a few weeks ago and I am trying to get the pipeline for SOLiD right.

From what I understand the basic pipeline goes something like this (please do correct me if I'm wrong):

get *.xsq files -> convert them to *.csfasta and *.qual files (using xsq tools) -> quality control (by converting the *.csfasta and *.qual files to *.fastq files) -> mapping (obtain *.sam files) -> convert them to *.bam (and sort) -> check for duplicates -> detect variants -> annotate variants

I now have the files in *.csfasta and *.qual format. While trying to assess the quality of the reads, I realized that not many tools accept *.csfasta and *.qual files as input. I read elsewhere that we can convert these files to *.fastq format and use 'fastx tools' to check the quality of the reads. One tool that does this conversion is 'bfast', which also maps the reads to a reference genome.

When I did the conversion using solid2fastq from bfast, I think the output I received was wrong. I can tell because output_file.fastq looked something like this:

@12869
T..................................................
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

@12895
T..................................................
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I used the following command to generate the output:

scripts user$ solid2fastq -o output_file sample.csfasta sample.qual

I'm not sure where I'm making the mistake.

I also wanted to know whether converting *.csfasta files to *.fastq files is the right thing to do. Wouldn't you lose the quality information? Is there a tool that takes the native *.csfasta files and both computes the quality of the reads and does the mapping?

fastq fastqc solid quality • 7.0k views

ADD COMMENT • link updated 11.2 years ago by William ★ 5.3k • written 11.2 years ago by Jordan ★ 1.3k

score 2 · Answer 1 · 2013-02-01

11.2 years ago

Damian Kao 16k

Check out NGS plumbing to convert XSQ directly to fastq: http://packages.python.org/ngs_plumbing/xsq.html

ADD COMMENT • link 11.2 years ago by Damian Kao 16k

I'm still an amateur in Python. The module looks interesting though. Thanks for the help!

ADD REPLY • link 11.2 years ago by Jordan ★ 1.3k

score 1 · Answer 2 · 2013-02-01

11.2 years ago

JC 13k

Reads like T................... are common in SOLiD data. They are sequences that could not be called, and they tend to be located in certain parts of the file. You can ignore them, but check that you have some usable sequences; if not, there could be a problem in the library preparation/processing.

Also, some mappers, such as Bowtie1 and SHRiMP, can take the csfasta and qual as separate files.
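To check how much of a file is made of those uncallable reads, something like the following rough sketch could help. It assumes the usual csfasta layout ('#' header lines, '>' read-name lines, then one line with a primer base followed by the color calls); the function name and the example read names are made up for illustration.

```python
import io


def count_uncalled(csfasta_lines):
    """Return (total_reads, fully_uncalled_reads) for an iterable of csfasta lines."""
    total = 0
    uncalled = 0
    for line in csfasta_lines:
        line = line.strip()
        # Skip blank lines, '#' header comments and '>' read-name lines.
        if not line or line.startswith("#") or line.startswith(">"):
            continue
        total += 1
        colors = line[1:]  # drop the leading primer base (e.g. 'T')
        # A read counts as fully uncalled when every color call is '.'.
        if colors and set(colors) == {"."}:
            uncalled += 1
    return total, uncalled


# Tiny in-memory example; the read names are hypothetical.
example = io.StringIO(
    "# example header\n"
    ">1_1_1_F3\n"
    "T.....\n"
    ">1_1_2_F3\n"
    "T01230\n"
    ">1_1_3_F3\n"
    "T0.123\n"
)
total, uncalled = count_uncalled(example)
print(f"{uncalled} of {total} reads have no called colors")
```

For a real file you would pass an open file handle instead of the StringIO object. If nearly all reads come back as uncalled, that points to the library or run problem mentioned above rather than to the conversion step.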

ADD COMMENT • link 11.2 years ago by JC 13k

JC is correct. Many reads at the beginning of the file will show that kind of pattern. Check in the middle of the file; it should be OK. About the mapping, I am not sure SHRiMP can accept csfasta and qual files. I think you need to convert the csfasta and qual files into colorspace fastq format.
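For what it's worth, the conversion itself is simple enough to sketch, which also shows why nothing is lost in terms of quality. This is only an illustration, not what solid2fastq actually does internally. It assumes one sequence line per read, one integer quality per color call in the qual file, Phred+33 output, and that negative qualities (uncalled colors) are clamped to 0, which would explain the '!' characters under the all-dots reads. The function names and read names are hypothetical.

```python
def read_records(lines):
    """Yield (name, body) pairs from csfasta- or qual-style lines."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            name = line[1:]
        else:
            yield name, line


def to_csfastq(csfasta_lines, qual_lines):
    """Yield four-line colorspace FASTQ records as strings."""
    pairs = zip(read_records(csfasta_lines), read_records(qual_lines))
    for (name, seq), (qname, quals) in pairs:
        if name != qname:
            raise ValueError(f"read name mismatch: {name} vs {qname}")
        # Clamp each quality to the printable Phred+33 range (0..93).
        encoded = "".join(
            chr(min(max(int(q), 0), 93) + 33) for q in quals.split()
        )
        yield f"@{name}\n{seq}\n+\n{encoded}\n"


# Hypothetical two-read example.
csfasta = [">12869_F3", "T....", ">12900_F3", "T0123"]
qual = [">12869_F3", "-1 -1 -1 -1", ">12900_F3", "30 25 10 5"]
records = list(to_csfastq(csfasta, qual))
print("".join(records), end="")
```

The quality values are carried over one-to-one; only their encoding changes from space-separated integers to single characters.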

ADD REPLY • link 11.2 years ago by Ashutosh Pandey 12k

Yes, I realized that only the initial lines had that T.......................... pattern, and later on I could find usable reads. Now that I have converted the files to fastq format, I think I can use this file for aligning too. But which would you suggest is the better mapping method: tools using fastq, or tools using *.csfasta and *.qual files directly? Or does it not make a difference?

ADD REPLY • link 11.2 years ago by Jordan ★ 1.3k

It won't make a difference. SHRiMP2, Bowtie (old version), BWA (old version) and BFAST can all be used to align SOLiD reads. But I would suggest SHRiMP2, which I personally think is better for SOLiD reads than the other aligners. It takes colorspace fastq sequences as input. It also has a multithreading option, so you can use all the cores on your cluster node or computer.

ADD REPLY • link 11.2 years ago by Ashutosh Pandey 12k

I looked into the SHRiMP2 tool. Looks good. Right now I'm already mapping using BFAST. I will try SHRiMP2 later on too. Do these tools remove reads with poor quality, or should that be done separately?

Thanks for the help!

ADD REPLY • link 11.2 years ago by Jordan ★ 1.3k

It doesn't trim the reads based on base quality, but you can give it a fixed number of bases to trim off as a parameter, for example 5 bases from the 3' end. Or you can use a wonderful script from Brent (https://github.com/brentp/bio-playground/blob/master/solidstuff/solid-trimmer.py).
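Just to illustrate what a fixed 3' trim means for a colorspace fastq record, here is a minimal sketch. It is not the aligner's own trimming code or Brent's script. It assumes the sequence line is a primer base followed by color calls, with one quality character per color; the function name is made up.

```python
def trim_3prime(seq, qual, n):
    """Drop n color calls (and their qualities) from the 3' end of one read.

    seq  -- primer base followed by color calls, e.g. 'T0123012'
    qual -- one quality character per color call
    """
    if n <= 0:
        return seq, qual
    n = min(n, len(qual))  # never trim into the primer base
    return seq[: len(seq) - n], qual[: len(qual) - n]


# Trim 5 colors from the 3' end of a 7-color read.
print(trim_3prime("T0123012", "IIIIIHG", 5))
```

Trimming by a fixed length is blunt. It removes good calls from good reads as well, which is why a quality-aware trimmer is usually the nicer option.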

ADD REPLY • link 11.2 years ago by Ashutosh Pandey 12k

score 1 · Answer 3 · 2013-02-01

I wrote a fast converter to go from XSQ straight to Sanger FastQ and fake BWA FastQ (which is colorspace but with ATCG instead of 0123; it's a strange format). You can use the latest version of BWA that supports colorspace for mapping (0.5.9). For quality control you can use FastQC and QualiMap.
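To make the "fake" format less mysterious, here is a small sketch of the idea, not the XSQConverter code itself. Two things in it are assumptions on my part: the mapping 0->A, 1->C, 2->G, 3->T, .->N, and dropping the primer base together with the first color and its quality value. The function name is hypothetical.

```python
# Assumed mapping from color calls to letters (0->A, 1->C, 2->G, 3->T, .->N).
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def double_encode(seq, qual):
    """Convert one primer-prefixed colorspace read to letter-encoded colors.

    seq  -- primer base followed by color calls, e.g. 'T0123.1'
    qual -- one quality character per color call
    """
    # Assumption: drop the primer base and the first color (which depends
    # on the primer), and drop the matching first quality value.
    return seq[2:].translate(DOUBLE_ENCODE), qual[1:]


print(double_encode("T0123.1", "IIHG!F"))
```

The letters in the result are still colors, not bases, so such a file only makes sense to a mapper that expects this encoding.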

The only annoying thing about BWA is that its pairing module can only produce paired bam files (with both forward and reverse reads) for SOLiD MatePair data, not for Paired End data. You can still create single bam files for Paired End data. Mapping itself works; this is only a limitation of the pairing module, which runs after the independent mapping. You can trim the reads based on base quality before mapping with BWA; otherwise, reads with low-quality tails can create mapping artefacts (which your SNP caller should be able to detect and filter out). In my opinion BWA is the best open source option for mapping SOLiD data (in quality, speed and usability). The only real alternative is the mapreads mapper in the Lifescope software suite from Life, but you need to own a SOLiD sequencer for a Lifescope license.

XSQConverter: https://trac.nbic.nl/solid_xsq_converter/admin/downloads/downloads (the binary is the latest version, I need to update the source code, and don't forget to put the HDF5 C++ libraries on your Java Path)

FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

BWA: http://sourceforge.net/projects/bio-bwa/files/

QualiMap: http://qualimap.bioinfo.cipf.es/