Question: SOLiD Sequencing Protocol And How To Obtain Quality Of SOLiD Reads?
Posted by Jordan (Pittsburgh), 6.4 years ago:

I started working on NGS a few weeks ago and I am trying to get the pipeline for SOLiD right.

From what I understand, the basic pipeline goes something like this (please correct me if I'm wrong):

get XSQ files -> convert them to *.csfasta and *.qual files (using XSQ Tools) -> check quality (after converting the *.csfasta and *.qual files to *.fastq files) -> map the reads (producing *.sam files) -> convert to *.bam (and sort) -> check for duplicates -> detect variants -> annotate variants
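As a rough illustration of how the steps in the pipeline above hand files to one another, here is a hypothetical Python sketch that records the format each stage consumes and produces and checks that consecutive stages line up. The stage labels come from the pipeline; the VCF format for the variant steps and the choice to map from the converted FASTQ are assumptions (some mappers read *.csfasta and *.qual directly).

```python
# Hypothetical sketch: the pipeline stages with the file format each one
# consumes and produces. VCF for the variant steps is an assumption; the post
# only says "detect variants" and "annotate variants".
from typing import List, NamedTuple, Tuple


class Stage(NamedTuple):
    name: str
    consumes: str
    produces: str


PIPELINE: List[Stage] = [
    Stage("convert XSQ", "xsq", "csfasta+qual"),
    Stage("convert for QC and mapping", "csfasta+qual", "fastq"),
    Stage("map reads", "fastq", "sam"),
    Stage("convert to BAM and sort", "sam", "bam"),
    Stage("check for duplicates", "bam", "bam"),
    Stage("detect variants", "bam", "vcf"),
    Stage("annotate variants", "vcf", "vcf"),
]


def broken_links(stages: List[Stage]) -> List[Tuple[str, str, str]]:
    """Return (stage, expected input, previous stage's output) for each mismatch."""
    return [
        (cur.name, cur.consumes, prev.produces)
        for prev, cur in zip(stages, stages[1:])
        if prev.produces != cur.consumes
    ]
```

An empty result from `broken_links(PIPELINE)` means every stage receives the format it expects; skipping a conversion step shows up as a mismatch.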

I now have the files in *.csfasta and *.qual format. While trying to assess the quality of the reads, I realized that few tools accept *.csfasta and *.qual files as input. I read elsewhere that we can convert these files to *.fastq format and use the FASTX-Toolkit to check the quality of the reads. One tool that does this conversion is BFAST, which also maps the reads to the reference genome.
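For readers unfamiliar with what such a conversion does, here is a minimal Python sketch (not BFAST's solid2fastq) that merges *.csfasta and *.qual records into colorspace FASTQ. It assumes both files list reads in the same order, '#' lines are comments, each read has one quality value per color (the primer base has none), missing values (-1) are clamped to 0, and qualities are written as Phred+33. Mappers differ in the exact colorspace FASTQ layout they expect, so treat this as an illustration only.

```python
# Minimal sketch: merge SOLiD .csfasta and .qual records into colorspace FASTQ.
# Assumptions (not taken from BFAST or any other tool): same read order in both
# files, '#' lines are comments, one quality per color (none for the primer
# base), -1 clamped to 0, Phred+33 output.
from typing import Iterator, List, TextIO, Tuple


def read_records(handle: TextIO, sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, body) pairs, joining continuation lines with `sep`."""
    read_id = None
    body: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, sep.join(body)
            read_id, body = line[1:], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, sep.join(body)


def csfasta_qual_to_fastq(csfasta: TextIO, qual: TextIO, out: TextIO) -> int:
    """Write one FASTQ record per read and return the number written.

    Note: zip() stops at the shorter input, so a truncated file is not reported.
    """
    written = 0
    for (seq_id, seq), (qual_id, qual_text) in zip(
        read_records(csfasta, ""), read_records(qual, " ")
    ):
        if seq_id != qual_id:
            raise ValueError(f"read order differs: {seq_id!r} vs {qual_id!r}")
        quals = [max(int(q), 0) for q in qual_text.split()]
        if len(quals) != len(seq) - 1:
            raise ValueError(f"{seq_id}: {len(seq) - 1} colors, {len(quals)} qualities")
        qual_string = "".join(chr(q + 33) for q in quals)
        out.write(f"@{seq_id}\n{seq}\n+\n{qual_string}\n")
        written += 1
    return written
```

For example, a read `T0120.3` with qualities `20 15 30 25 -1 10` is written with the quality string `50?:!+`; the uncalled color gets `!` (Phred 0).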

When I did the conversion with solid2fastq from BFAST, I think the output was wrong, because output_file.fastq looked like this:

@12869
T..................................................
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@12895
T..................................................
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I used the following command to generate the output:

solid2fastq -o output_file sample.csfasta sample.qual

I'm not sure where I'm making the mistake.

I also wanted to know whether converting *.csfasta files to *.fastq files is the right thing to do. Wouldn't you lose quality information? Is there a tool that takes the native *.csfasta files and computes read quality, and also does the mapping?
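On the question of losing quality: Phred+33 FASTQ stores each integer quality q as the character chr(q + 33), which decodes back with ord(c) - 33, so the encoding itself is lossless. The one catch is that -1 (uncalled) has no Phred+33 character, so a converter has to map it to some valid value such as 0. Quality can also be summarized straight from the *.qual file without converting anything. The sketch below is a hypothetical helper, not an existing tool; it assumes each read's qualities sit on one line after its '>' header and skips -1 values rather than averaging them in as 0.

```python
# Minimal sketch: per-position mean quality computed directly from a SOLiD
# .qual file, with no FASTQ conversion. Assumes one line of qualities per read
# and skips -1 (uncalled) values; both are choices made for this example.
from typing import List, TextIO


def per_position_mean_quality(qual: TextIO) -> List[float]:
    """Return the mean Phred quality at each color position (NaN if no calls)."""
    totals: List[int] = []
    counts: List[int] = []
    for raw in qual:
        line = raw.strip()
        if not line or line.startswith(("#", ">")):
            continue
        for pos, value in enumerate(int(v) for v in line.split()):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            if value >= 0:
                totals[pos] += value
                counts[pos] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]
```

A steady drop in the mean toward the 3' end is the usual sign that trimming before mapping is worth considering.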

Answer by Damian Kao (USA), 6.4 years ago:

Check out ngs_plumbing, which converts XSQ directly to FASTQ: http://packages.python.org/ngs_plumbing/xsq.html

Reply by Jordan, 6.4 years ago: I'm still an amateur in Python, but the module looks interesting. Thanks for the help!

Answer by JC (Mexico), 6.4 years ago:

Reads like T................... are common in SOLiD data. They are sequences in which no colors could be called, and they cluster in certain parts of the file. You can ignore them, but check that you also have usable sequences; if you don't, there could be a problem in library preparation or processing.

Also, some mappers can take the csfasta and qual files separately, for example Bowtie 1 and SHRiMP.
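The check JC describes (ignore the T.... reads, but make sure usable reads exist) can be sketched in a few lines of Python. This is a hypothetical helper, not part of any SOLiD tool; it assumes one sequence line per read, and treating a read with no colors at all as uncalled is a choice made here.

```python
# Minimal sketch: count reads whose colors are all uncalled ('.'), like the
# T.... records in the question, and report where the first usable read is.
# Assumes one sequence line per read in the .csfasta file.
from typing import Optional, TextIO, Tuple


def summarize_calls(csfasta: TextIO) -> Tuple[int, int, Optional[int]]:
    """Return (total reads, all-uncalled reads, 0-based index of first usable read)."""
    total = uncalled = 0
    first_usable: Optional[int] = None
    for raw in csfasta:
        line = raw.strip()
        if not line or line.startswith(("#", ">")):
            continue
        colors = line[1:]  # drop the primer base, e.g. 'T'
        if set(colors) <= {"."}:  # empty or only '.' characters
            uncalled += 1
        elif first_usable is None:
            first_usable = total
        total += 1
    return total, uncalled, first_usable
```

If every read is uncalled, that points to the library or processing problem JC mentions; a large `first_usable` followed by good reads just means the file starts with a block of uncalled reads.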

Reply by Ashutosh Pandey, 6.4 years ago: JC is correct. Many reads at the beginning of the file will show that pattern; check the middle of the file and it should be OK. About mapping, I am not sure SHRiMP can accept csfasta and qual files; I think it needs them converted to colorspace FASTQ format.

Reply by Jordan, 6.4 years ago: Yes, I realized that only the initial lines had that T.......................... pattern, and later on I found usable reads. Now that I have converted the files to FASTQ format, I think I can use them for alignment too. Which mapping approach would you suggest: tools that take FASTQ, or tools that take *.csfasta and *.qual files directly? Or does it not make a difference?

Reply by Ashutosh Pandey, 6.4 years ago: It won't make a difference. SHRiMP2, Bowtie (old versions), BWA (old versions) and BFAST can all align SOLiD reads, but I would suggest SHRiMP2, which I personally think handles SOLiD reads better than the other aligners. It takes colorspace FASTQ as input, and it supports multithreading, so you can use all the cores on your computer or cluster node.

Reply by Jordan, 6.4 years ago: I looked into SHRiMP2 and it looks good. Right now I'm already mapping with BFAST, but I will try SHRiMP2 later too. Do these tools remove poor-quality reads, or should that be done separately?

Thanks for the help!

Reply by Ashutosh Pandey, 6.4 years ago: It doesn't trim reads based on base quality, but you can give a fixed number of bases to trim off as a parameter, for example 5 bases from the 3' end. Alternatively, you can use a nice script from Brent (https://github.com/brentp/bio-playground/blob/master/solidstuff/solid-trimmer.py).

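Fixed-length 3' trimming, as described in the reply above, can be combined with a simple quality threshold. The sketch below is hypothetical and is not how Brent's solid-trimmer.py works (that script was not consulted); `fixed` and `min_qual` are illustrative parameter names, and the read is assumed to be a primer base followed by colors with one quality per color.

```python
# Hypothetical sketch: trim a colorspace read from the 3' end, first by a fixed
# number of colors, then while the trailing quality is below a threshold.
# Not the algorithm of solid-trimmer.py; parameter names are illustrative.
from typing import List, Tuple


def trim_3prime(seq: str, quals: List[int], fixed: int = 0,
                min_qual: int = 10) -> Tuple[str, List[int]]:
    """Return the trimmed (primer base + colors, qualities)."""
    if len(quals) != len(seq) - 1:
        raise ValueError("expected one quality value per color")
    keep = max(len(quals) - fixed, 0)  # colors left after the fixed trim
    while keep > 0 and quals[keep - 1] < min_qual:
        keep -= 1  # drop one more low-quality trailing color
    return seq[: keep + 1], quals[:keep]  # +1 keeps the primer base
```

The primer base is always kept, so even a fully trimmed read still starts with its primer letter; downstream tools may prefer to discard such reads entirely.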
Answer by William (Europe), 6.4 years ago:

I wrote a fast converter that goes from XSQ straight to Sanger FastQ and to "fake" BWA FastQ (colorspace, but with ACGT instead of 0123; it's a strange format). For mapping you can use the latest version of BWA that supports colorspace (0.5.9). For quality control you can use FastQC and QualiMap.
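The "colorspace with ACGT instead of 0123" format is usually called double encoding. Here is a minimal Python sketch, assuming the mapping 0→A, 1→C, 2→G, 3→T and '.'→N. Whether BWA 0.5.x also expects the primer base and the first color (with its quality) to be removed is an assumption here, so it is exposed as the `drop_first_color` flag; check it against the solid2fastq.pl script shipped with your BWA version. `double_encode` is a hypothetical helper, not William's XSQConverter.

```python
# Minimal sketch of colorspace "double encoding" (0->A, 1->C, 2->G, 3->T,
# '.'->N). Dropping the first color is an ASSUMPTION about what BWA 0.5.x
# expects; verify before relying on it. Qualities are written as Phred+33.
from typing import List, Tuple

DOUBLE_ENCODING = str.maketrans("0123.", "ACGTN")


def double_encode(cs_seq: str, quals: List[int],
                  drop_first_color: bool = True) -> Tuple[str, str]:
    """Return (double-encoded sequence, Phred+33 quality string)."""
    colors = cs_seq[1:]  # the primer base (e.g. 'T') is always removed
    if len(quals) != len(colors):
        raise ValueError("expected one quality value per color")
    if drop_first_color:
        colors, quals = colors[1:], quals[1:]
    seq = colors.translate(DOUBLE_ENCODING)
    qual_string = "".join(chr(max(q, 0) + 33) for q in quals)  # -1 becomes '!'
    return seq, qual_string
```

Because double encoding only relabels the four colors, the read is still in colorspace; it must be mapped with BWA's colorspace mode, not as ordinary base-space sequence.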

The only annoying thing about BWA is that its pairing module can only produce paired BAM files (with both forward and reverse reads) for SOLiD mate-pair data, not for paired-end data. You can still create single-end BAM files for paired-end data: the mapping itself works, and pairing is a separate step that happens after each read has been mapped independently.

You can trim the reads on base quality before mapping them with BWA; otherwise, reads with low-quality tails can create mapping artefacts (which your SNP caller should be able to detect and filter out). In my opinion BWA is the best open-source option for mapping SOLiD data in terms of quality, speed and usability. The only real alternative is the mapreads mapper in Life Technologies' LifeScope software suite, but you need to own a SOLiD sequencer to get a LifeScope license.
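BWA 0.5.x can also do quality trimming itself through the `-q` option of `bwa aln`; its manual describes trimming a read down to argmax_x sum_{i=x+1}^{l} (INT - q_i) when q_l < INT, where l is the read length and INT is the `-q` value. The sketch below follows that documented formula only; BWA's actual code may add details (for example a minimum read length) that are not reproduced, and `bwa_style_keep_length` is a hypothetical name.

```python
# Minimal sketch of the trimming rule documented for `bwa aln -q`: if the last
# quality is below the threshold, keep the first x values, where x maximizes
# sum(threshold - q) over the trimmed tail quals[x:]. Formula only; BWA's own
# implementation may differ in details.
from typing import List


def bwa_style_keep_length(quals: List[int], threshold: int) -> int:
    """Return how many leading quality values to keep."""
    n = len(quals)
    if threshold <= 0 or n == 0 or quals[-1] >= threshold:
        return n  # no trimming: disabled, empty read, or good last base
    best_x, best_score, score = n, 0, 0
    # Walk from the 3' end; `score` is sum(threshold - q) over quals[x:].
    for x in range(n - 1, -1, -1):
        score += threshold - quals[x]
        if score > best_score:  # strict '>' keeps the longer read on ties
            best_x, best_score = x, score
    return best_x
```

Unlike a plain "trim while below threshold" loop, this rule can cut through an isolated good value inside a bad tail: with qualities `30 5 30 5 2` and a threshold of 20 it keeps only the first value.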

XSQConverter: https://trac.nbic.nl/solid_xsq_converter/admin/downloads/downloads (the binary is the latest version; I still need to update the source code. Don't forget to put the HDF5 C++ libraries on your Java path.)

FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

BWA: http://sourceforge.net/projects/bio-bwa/files/

QualiMap: http://qualimap.bioinfo.cipf.es/

Reply by William, 6.4 years ago: I am not sure whether FastQC works on the BWA FastQ format. I use my own ReadQC tool, which also gives a quality overview of the reads.