Convert from SOLiD fastq to Sanger fastq
7.2 years ago
pcantalupo ▴ 120

Hello,

I downloaded a SOLiD fastq file from here: ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA062/SRA062077/SRX207729/. I need to convert it to Sanger fastq because the software pipeline that I wrote only handles Sanger fastq.

Here is what the first two sequences in the file look like:

@SRR943115.1 solid0530_20110107_PE_HIVCB212TRL_HIVCB212TRL_2_27_44_F3 length=50
T.03.0.1.....................................1....3
+SRR943115.1 solid0530_20110107_PE_HIVCB212TRL_HIVCB212TRL_2_27_44_F3 length=50
!!A5!+!:!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!%!!!!%
@SRR943115.2 solid0530_20110107_PE_HIVCB212TRL_HIVCB212TRL_2_27_210_F3 length=50
T.12.1.0.....................................1....0
+SRR943115.2 solid0530_20110107_PE_HIVCB212TRL_HIVCB212TRL_2_27_210_F3 length=50
!!8%!,!%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!8!!!!&

I already tried to use 'solid2fastq' in BFAST but that program requires a csfasta file and a qual file as input. I have not been able to find a csfastq -> fastq converter script.

Thank you,

Paul

solid fastq • 4.6k views
I think you will find your answer if you look for similar posts.

As Chris said, this issue has been widely discussed on Biostars. By the way, which aligner are you using?

I am aligning with Bowtie2, but then I want to capture the unmapped sequences (I couldn't care less about the human reads) for analysis in my viral metagenomics pipeline, which needs real Gs, As, Ts and Cs. Thank you again... Paul

There are two ways to convert color space (.csfasta, .qual) data into fastq. One is a lossy translation of the colors into nucleotides; it is never recommended, because a single color-call error made by the sequencing machine corrupts every following base in the read during translation. The other does not translate the reads into A, T, C, G at all: it prepares colorspace fastq files that can be aligned against a reference genome indexed for color space reads. Both SHRiMP2 and Bowtie can map such reads. Now, coming to your question: Bowtie2 cannot map colorspace fastq or csfasta reads, so you will have to use Bowtie. I think the fastq file you have shown above (2 reads) is already formatted and can be used by Bowtie (remember: Bowtie, not Bowtie2). See this: http://bowtie-bio.sourceforge.net/manual.shtml#colorspace-alignment
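To illustrate why direct translation is risky, here is a minimal Python sketch of colorspace decoding. It assumes the standard SOLiD two-base encoding, in which each color (0-3) gives the transition from the previous base to the next one, starting from the primer base. The function name `decode_colors` is made up for this example and does not come from any tool mentioned in this thread.

```python
BASES = "ACGT"  # with A=0, C=1, G=2, T=3, next base = previous base XOR color

def decode_colors(cs_read):
    """Translate a colorspace read (primer base + colors) into nucleotides.

    The primer base itself is not part of the output. Once a missing
    color call ('.') is met, the rest of the read cannot be decoded and
    is filled with N.
    """
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        if color not in "0123":
            out.append("N" * (len(cs_read) - 1 - len(out)))
            break
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)

good = decode_colors("T0123012")
bad = decode_colors("T0023012")  # same read with a single wrong color call
print(good)  # TGATTGA
print(bad)   # TTCGGTC
```

The two reads differ in one color only, yet every decoded base from the error onward is different. This is the propagation problem described above, and it is why aligning in color space is usually preferred.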

7.2 years ago
pcantalupo ▴ 120

Hello Stars,

Thank you to those who commented above. Since I couldn't find a script to suit my needs, I wrote my own colorspace fastq to Sanger fastq Perl script, in case anybody is interested. If you detect a bug in the script, please let me know.
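For readers who want to see the general idea, a conversion of this kind could look roughly like the Python sketch below. This is not the Perl script mentioned above, only an illustration under some assumptions: four lines per record, the standard SOLiD two-base encoding, and a quality string that may carry one extra leading character for the primer base (as in the reads shown in the question). All function names are hypothetical. The color qualities are simply carried over to the bases, which is an approximation because each decoded base depends on every color before it.

```python
import io

BASES = "ACGT"  # with A=0, C=1, G=2, T=3, next base = previous base XOR color

def decode_colors(cs_read):
    """Decode primer base + colors into nucleotides; N after a missing call."""
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        if color not in "0123":
            out.append("N" * (len(cs_read) - 1 - len(out)))
            break
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)

def convert_record(header, cs_read, qual):
    """Return one Sanger-style fastq record as a string."""
    bases = decode_colors(cs_read)
    if len(qual) == len(cs_read):
        qual = qual[1:]  # drop the quality character of the primer base
    return f"{header}\n{bases}\n+\n{qual}\n"

def convert(in_handle, out_handle):
    """Convert a colorspace fastq stream, assuming four lines per record."""
    while True:
        header = in_handle.readline().rstrip("\n")
        if not header:
            break
        cs_read = in_handle.readline().rstrip("\n")
        in_handle.readline()  # skip the '+' line
        qual = in_handle.readline().rstrip("\n")
        out_handle.write(convert_record(header, cs_read, qual))

# Small in-memory demo with made-up reads
src = io.StringIO("@r1\nT0123012\n+r1\n!ABCDEFG\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

With this sketch, reads that start with a missing color call, like the two shown in the question, come out entirely as N, so they would probably need to be filtered before any downstream analysis.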

P

In my comments, I pointed out problems with this kind of conversion. Read this post: Transforming And Manipulating Color Space Reads

Duly noted.