SOLiD Data From SRA
11.4 years ago
sah9b ▴ 100

Hi,

I downloaded some SOLiD data from SRA. I unarchived the data using the SRA toolkit, but I have no idea what the file format is. It seems to be non-standard, but I could be wrong. Any help figuring out what file format this is would be appreciated. Here is what the file looks like for the first 2 sequences:

1. PLATFORM: SRA_PLATFORM_ABSOLID/
1. NREADS: 1
1. READ_TYPE: SRA_READ_TYPE_BIOLOGICAL
1. READ_SEG: {0,50}
1. LABEL_SEG: {0,2}
1. LABEL: F3
1. READ_FILTER: SRA_READ_FILTER_PASS
1. SPOT_GROUP: \x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0
1. CS_KEY: T
1. CSREAD: 2022112223320133.22300.1.00.1.22320303.3321031.000
1. QUALITY: 6,6,6,16,5,11,8,14,14,18,20,11,5,20,11,18,0,11,18,5,5,5,0,11,0,9,5,0,8,0,16,5,16,11,5,18,5,18,0,5,18,5,20,3,5,5,0,23,11,21
2. PLATFORM: SRA_PLATFORM_ABSOLID
2. NREADS: 1
2. READ_TYPE: SRA_READ_TYPE_BIOLOGICAL
2. READ_SEG: {0,50}
2. LABEL_SEG: {0,2}
2. LABEL: F3
2. READ_FILTER: SRA_READ_FILTER_PASS
2. SPOT_GROUP: \x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0\x 0
2. CS_KEY: T
2. CSREAD: 3311213023312230330302.0.2011.1232233033330332.000
2. QUALITY: 8,5,5,8,5,8,20,5,5,11,16,7,5,4,6,6,18,5,5,5,5,5,0,5,0,5,3,5,5,0,5,5,3,5,5,5,8,6,5,5,5,16,10,5,5,5,0,5,11,6
solid sra • 2.8k views
11.4 years ago
JC 13k

SOLiD now reports reads in a binary HDF5 format, and there are tools to convert that to csfastq, but the file you got from SRA looks like a text version of it. I think it is easier to extract the reads by parsing this file than to try to reconstruct the original binary file. If you check the file, CS_KEY is the first letter of your read, CSREAD is the sequence in color space, and QUALITY holds the per-base quality scores. So your sequences will be:

@sequence_1
T2022112223320133.22300.1.00.1.22320303.3321031.000
+
FFF...
@sequence_2
T3311213023312230330302.0.2011.1232233033330332.000
+
HEE...
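
If you want to script it, something like the following should work. It is only a minimal sketch: it assumes every line of the dump has the shape "N. FIELD: value" as in your excerpt, and the names parse_spots, to_csfastq, and dump.txt are made up for illustration. The quality offset is also an assumption: the letters in my example above correspond to an offset of 64 (6 + 64 = 'F'), but many tools expect 33, so check what your aligner wants.

```python
import re

# Assumed line shape, taken from the dump in the question:
#   "<spot number>. <FIELD>: <value>"
LINE_RE = re.compile(r"^(\d+)\.\s+([A-Z_]+):\s?(.*)$")


def parse_spots(lines):
    """Yield (spot_id, fields) for each spot, in file order."""
    current_id, fields = None, {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a "N. FIELD: value" line
        spot_id, key, value = m.group(1), m.group(2), m.group(3).strip()
        if spot_id != current_id:
            if current_id is not None:
                yield current_id, fields
            current_id, fields = spot_id, {}
        fields[key] = value
    if current_id is not None:
        yield current_id, fields


def to_csfastq(spot_id, fields, offset=64):
    """Build one csfastq record from CS_KEY, CSREAD and QUALITY.

    `offset` is an assumption (64 matches the example letters above;
    pass 33 for Phred+33). QUALITY has one value per color, so the
    quality line is one character shorter than the sequence line,
    which also carries the leading CS_KEY base.
    """
    quals = [int(q) for q in fields["QUALITY"].split(",")]
    qual_str = "".join(chr(q + offset) for q in quals)
    seq = fields["CS_KEY"] + fields["CSREAD"]
    return f"@sequence_{spot_id}\n{seq}\n+\n{qual_str}\n"


# Usage (dump.txt and reads.csfastq are hypothetical file names):
# with open("dump.txt") as src, open("reads.csfastq", "w") as out:
#     for spot_id, fields in parse_spots(src):
#         out.write(to_csfastq(spot_id, fields))
```

Some downstream tools expect the quality line to have the same length as the sequence line, so you may need to pad or trim depending on what you feed this into.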