Weird Fastq Sequences
12.3 years ago
Bioscientist ★ 1.7k

I downloaded some FASTQ files from the 1000 Genomes website, and the records look like this:

@VAB_BARB_20080515_2_Broad_3b_150_2276_6_37_F3
T21123313121322132222331223311312223
+
!'$'&(,&#%4,('%$*$,##+0#-+($)#$%$$&)

Why doesn't the second line show A, T, G and C? Or do they use 1, 2, 3 to represent the letters?

Also, this data comes from files named XXXX.fastq.gz, while the "normal" data comes from files named XXXX.recal.fastq.gz.

That prompts a second question: what does "recal" mean?

thx

12.3 years ago
Gww ★ 2.7k

Those read sequences are in colorspace rather than basespace, which means the sequencing was performed with Applied Biosystems SOLiD technology. Each read starts with a known primer base (the leading T) followed by digits 0-3, where each digit encodes the transition between two adjacent bases rather than a single base. Aligners that can handle this format include BioScope, SHRiMP and BWA. More information about the di-base encoding can be found here.
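
To make the di-base encoding concrete, here is a minimal Python sketch of a naive colorspace-to-basespace translation. It assumes the standard SOLiD color table (0 = same base, 1 = A<->C / G<->T, 2 = A<->G / C<->T, 3 = A<->T / C<->G), which is equivalent to XOR-ing 2-bit base codes. The function name `decode_colorspace` is made up for illustration and does not come from any tool mentioned above.

```python
# Map bases to 2-bit codes so that a color call acts as an XOR
# between the previous base and the next one.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read: str) -> str:
    """Naively translate a colorspace read (primer base + colors) to bases.

    The leading primer base belongs to the adapter, so it is used only
    as the starting point and is not included in the returned sequence.
    """
    if not read:
        return ""
    prev = BASE_TO_CODE[read[0].upper()]
    bases = []
    for i, color in enumerate(read[1:]):
        if color not in "0123":
            # A missing call (often written '.') makes every later
            # base undecodable, so pad the remainder with N.
            bases.append("N" * (len(read) - 1 - i))
            break
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


if __name__ == "__main__":
    # The read from the question above.
    print(decode_colorspace("T21123313121322132222331223311312223"))
```

Note that this is only meant to show how the encoding works. Because every base depends on all the colors before it, a single miscalled color corrupts the rest of the decoded read, which is why colorspace reads are normally aligned in colorspace rather than converted up front.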
