I am splitting SRF files into FASTQ using srf2fastq, from the Staden io_lib package:
srf2fastq -c -s ./fastq/name_split -a -n name.srf
But some of the SRF files contain four chunks of sequence per read instead of the two expected for a paired-end experiment, so srf2fastq creates four FASTQ files (_1, _2, _3, _4), whose read names end in /1, /2, /3 and /4 respectively.
The problem is that the _2 and _4 FASTQ files hold 'technical reads' that will be discarded; only _1 and _3 should be used. This means that the names of my reverse reads (the _3 FASTQ file) end with /3 instead of the usual /2.
Questions:
Would leaving the read names ending in /3 instead of /2 confuse other people?
Should I rename the reads to end in /2 instead of /3?
Can I extract only the two wanted chunks from the SRF file instead of all four?
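If renaming turns out to be the way to go, it could be done after the split. Below is a minimal sketch, not an srf2fastq feature: the function name rename_mate_suffix is made up for illustration, and it assumes plain 4-line FASTQ records in which the suffix sits at the very end of the name line. It uses the line position rather than a leading '@' to find name lines, because quality strings may also start with '@'.

```python
import io


def rename_mate_suffix(src, dst, old="/3", new="/2"):
    """Copy a 4-line-per-record FASTQ stream, rewriting the read-name suffix.

    Hypothetical helper: assumes every record is exactly four lines.
    """
    for i, line in enumerate(src):
        # Only the name line (1st of 4) and the '+' line (3rd of 4)
        # can carry the read name; sequence and quality are left alone.
        if i % 4 in (0, 2):
            body = line.rstrip("\n")
            if body.endswith(old):
                line = body[: -len(old)] + new + "\n"
        dst.write(line)


# Small in-memory demonstration with a made-up record.
demo_in = io.StringIO("@name_456:8:1:404:759/3\nACGT\n+\nII/3\n")
demo_out = io.StringIO()
rename_mate_suffix(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

For real files, the two streams would be ordinary file handles opened on the _3 FASTQ file and on a new output file.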
Looking at the SRF file with srf_info, I can tell which chunks I want:
> srf_info -l255 name.srf
Reading archive name.srf.
trace_name: + name_456:8:1:404:759 ... name_456:8:1:381:649 x10
Reads: GOOD : 10
Reads: TOTAL : 10
Chunk: BASE : 10 238
Chunk: CNF1 : 10 409
Chunk: CNF4 : 10 2890
Mdata key: SCALE : 10
Chunk: SMP4 : 10 5780
Chunk: REGN : 10 130
Mdata key: NAME : 10
names=forward:P;skip1:T;reverse:P;skip2:T boundaries=35;36;71 x10
Bases: A: 306
Bases: C: 98
Bases: G: 123
Bases: T: 193
Bases: TOTAL: 720
The REGN chunk lists the two 'skip' regions (the 'technical reads') and the two wanted regions, forward and reverse (the 'application reads').
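To make the region layout explicit, the REGN line printed by srf_info can be parsed as in the sketch below. This is only an illustration under assumptions: parse_regn is a hypothetical helper, the boundaries are taken to be 0-based end offsets of every region except the last, and the read length of 72 is inferred from the output above (720 bases over 10 reads). The region code T is treated as 'technical', following the skip regions in this file.

```python
def parse_regn(line, read_length):
    """Turn an srf_info REGN line into a list of region descriptions.

    Hypothetical helper; assumes 'boundaries' holds the 0-based end
    offset of every region except the last one.
    """
    # Keep only the key=value tokens; the trailing 'x10' count is ignored.
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    names = [item.split(":") for item in fields["names"].split(";")]
    bounds = [int(b) for b in fields["boundaries"].split(";")]
    starts = [0] + bounds
    ends = bounds + [read_length]
    return [
        {"index": i + 1, "name": name, "code": code, "start": s, "end": e}
        for i, ((name, code), s, e) in enumerate(zip(names, starts, ends))
    ]


regn = "names=forward:P;skip1:T;reverse:P;skip2:T boundaries=35;36;71 x10"
regions = parse_regn(regn, read_length=72)

# Regions whose code is not 'T' are the ones to keep.
wanted = [r["index"] for r in regions if r["code"] != "T"]
print(wanted)
```

Under these assumptions the forward and reverse regions are 35 bases each, the two skip regions are 1 base each, and the kept indices correspond to the _1 and _3 FASTQ files.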
@Istvan, hmm, but -n only affects the file names ([+1] it is probably better to leave it out). My problem is still the read names ending in /1 and /3, which come from -a.