Question

The Number Substitute The Character

0

11.7 years ago

camelbbs ▴ 710

Hi,

In the 1000 Genomes data, I found that the FASTQ file for an exome sequencing run looks like this:

@341035 0PEBCSOLiDPEP20110531001BSOLiDPEP20110531001B1323_187/1
T232102322030210120120010000120000330003.320..03032
+
!'(%%%)%%'%&&&%&%&%%%,)''%&%%%)%.&%%(%.%!&/&!!&()%%

Can reads in this format be mapped to the genome with bwa directly, or do I need to convert the 0, 1, 2, 3 to T, C, G, A first?

How should I interpret it?

Thanks,

Chunjiang

exome seq • 2.0k views

ADD COMMENT • link updated 7.8 years ago by Biostar 20 • written 11.7 years ago by camelbbs ▴ 710

score 4 · Answer 1 · 2012-07-31

4

11.7 years ago

matted 7.8k

This is SOLiD colorspace data. There are a variety of approaches to working with it, but they all require a bit of reading and thought. Look through this forum and other sources for relevant information for your current task. This post might be a good place to start.

ADD COMMENT • link 11.7 years ago by matted 7.8k

0

+1 for "... they all require a bit of reading and thought."

ADD REPLY • link 11.7 years ago by seidel 11k

0

Thanks.

ADD REPLY • link 11.7 years ago by camelbbs ▴ 710

score 0 · Answer 2 · 2012-07-31

0

11.7 years ago

camelbbs ▴ 710

Thanks. But are you sure that's the SOLiD colorspace format?

I just downloaded it from 1000 Genomes. The information in the SRA database for this run is:

==============================

Accession: ERX024730

Experiment design: Solexa sequencing of Human individual HG00881 random pair end library

Submission: ERA062402 by BGI

Study summary: Exome sequencing of the Chinese Dai in Xishuangbanna, China (CDX) (SRP004062)

Library: HUMaghXGZAAAPEI-9

Platform: Illumina

Instrument model: Illumina HiSeq 2000

==================================

I also checked bwa. It includes a script called solid2fastq.pl, but that script needs two input files to work (a csfasta file and a quality file).

Are there any other tools that can do this? Thanks.

ADD COMMENT • link 11.7 years ago by camelbbs ▴ 710

0

I think you are mixing something up. Where exactly did you get the FASTQ file (above) from, and how did you determine that it belongs to this run? The run you indicate is indeed an Illumina run, and an example record from the ENA is:

@ERR047667.1 FCB09RWABXX:3:1101:1137:2055/1
ANTTACTGATAATAGTTATATCACTAATTTCAGTTTAACAAAAAGGTTCACTATAACTTATTTTAATCTCTGTAATAACTTCAAATTAAA
+
C#1ADFFFHHHHHJJIIIJIJJJJJJJJJJJJJIJJJJIIIJJJJJGHIFIJJJJJJJJJJJJJJJJIIJJJJIIGIJJJJJJJJHHHHH

But it doesn't match your example, which is definitely colorspace. Furthermore, your example has the string "SOLiD" in the read name.
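The check described here can be automated. Below is a rough Python sketch that guesses whether a FASTQ file is colorspace or base-space by looking at the sequence lines. It assumes plain four-line FASTQ records, and `classify_read` and `classify_fastq` are hypothetical helper names, not part of bwa or any other tool.

```python
import re
from collections import Counter

# Colorspace: one primer base followed by colors 0-3, with '.' for missing calls.
COLORSPACE_RE = re.compile(r"[ACGTacgt][0-3.]+")
# Base-space: nucleotide letters only, with N for unknown bases.
BASESPACE_RE = re.compile(r"[ACGTNacgtn]+")


def classify_read(seq):
    """Classify one sequence line as 'colorspace', 'basespace' or 'unknown'."""
    seq = seq.strip()
    if COLORSPACE_RE.fullmatch(seq):
        return "colorspace"
    if BASESPACE_RE.fullmatch(seq):
        return "basespace"
    return "unknown"


def classify_fastq(lines, max_reads=1000):
    """Return the most common read type among the first max_reads records.

    Assumes four lines per record, so the sequence is line 2 of every group.
    """
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 1:
            counts[classify_read(line)] += 1
            if sum(counts.values()) >= max_reads:
                break
    return counts.most_common(1)[0][0] if counts else "unknown"
```

For a file on disk, something like `with open(path) as handle: print(classify_fastq(handle))` should work. The two reads quoted in this thread come out as `colorspace` and `basespace` respectively.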

ADD REPLY • link 11.7 years ago by matted 7.8k

1

I have a guess for your error:

For the human sample this corresponds to (HG00881), the exome sequencing was on Illumina and the low-coverage sequencing was on SOLiD. There are three Illumina runs:

ERX024730, ERX024731, ERX024732

And three SOLiD runs:

ERX016841, ERX016842, ERX016843

So I assume you mixed up the two groups.

The SOLiD FASTQ files, as expected, look like your original example, e.g.:

@ERR039668.341035 solid0738_20110610_PE_BC_SOLiDPEP20110531001_B_SOLiDPEP20110531001_B_13_23_187/1
T232102322030210120120010000120000330003.320..03032
+
!'(%%%)%%'%&&&%&%&%%%,)''%&%%%)%.&%%(%.%!&/&!!&()%%

EDIT: I realized this read is your exact example above. It's the first read from run ERX016841, which indeed is annotated as SOLiD, not the one you report above. Not sure where that came from.

ADD REPLY • link 11.7 years ago by matted 7.8k

0

Yes, I mixed them up. Thanks a lot. But I am not sure what they mean by "low coverage". Is that not whole-genome sequencing? Also, I still don't know how to align these reads:

@ERR039668.341035 solid0738_20110610_PE_BC_SOLiDPEP20110531001_B_SOLiDPEP20110531001_B_13_23_187/1
T232102322030210120120010000120000330003.320..03032
+
!'(%%%)%%'%&&&%&%&%%%,)''%&%%%)%.&%%(%.%!&/&!!&()%%

ADD REPLY • link 11.7 years ago by camelbbs ▴ 710