I downloaded many SRA files (ChIP-seq, RNA-seq, DNase-seq, etc.) from the Roadmap project. I'm converting them to FASTQ format (fastq-dump with the --split-files option) and then doing some preprocessing to keep them consistent.
Since the read lengths from these experiments differ, I'm trimming the reads to 36 bp with fastx_trimmer. This works fine for the FASTQs from the ChIP-seq SRAs. However, the FASTQs from RNA-seq (ABI SOLiD platform) have this format (first 8 lines):
@SRR179594.1 mendel_20110320_FRAG_BC_Ryan_RNA_Seq_2_58_404_F3 length=50
T.11.0223.0120.1020110202.0.0010.0.20.0201.2.021021
+SRR179594.1 mendel_20110320_FRAG_BC_Ryan_RNA_Seq_2_58_404_F3 length=50
!!@B!@A;B!BB:B!BB=A/%>(/%!A!.6%A!/!%'!%5.%!)!/()%-%
@SRR179594.2 mendel_20110320_FRAG_BC_Ryan_RNA_Seq_2_58_408_F3 length=50
T.20.3101.000021200002230.2.0312.0.13.0313.0.220003
+SRR179594.2 mendel_20110320_FRAG_BC_Ryan_RNA_Seq_2_58_408_F3 length=50
!!>B!<B:>!@@*?3-;%A9?A%'+!B!51,A!=!<'!:'.:!(!)-'*>5
Using fastx_trimmer on this to keep the first 36bp is throwing an error:
fastx_trimmer: found invalid nucleotide sequence (T.11.0223.0120.1020110202.0.0010.0.20.0201.2.021021) on line 2
This is understandable, since the sequence is not in the usual ACTGN alphabet. How should I go about trimming these RNA-seq reads?
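For illustration, here is a minimal Python sketch of the kind of trimming being asked about, done without validating the sequence alphabet. It is not how fastx_trimmer or the SRA toolkit works. It assumes plain 4-line FASTQ records like the ones shown, treats a leading letter followed by a digit or dot as a SOLiD primer base that is kept in addition to the colors, and leaves the `length=50` text in the headers untouched. The function name `trim_fastq_records` is hypothetical.

```python
from typing import Iterable, Iterator, List


def trim_fastq_records(lines: Iterable[str], n: int = 36) -> Iterator[str]:
    """Yield FASTQ lines with each read cut to n bases (or n colors)."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        # Assumption: a letter followed by 0-3 or '.' marks a color-space read
        # whose first character is the primer base, not a real color call.
        is_colorspace = (
            len(seq) > 1 and seq[0].isalpha() and seq[1] in "0123."
        )
        keep = n + 1 if is_colorspace else n
        # In the records shown, the quality string is as long as the sequence
        # (primer included); if it is shorter, trim it to n values instead.
        qual_keep = keep if len(qual) == len(seq) else n
        yield header
        yield seq[:keep]
        yield plus
        yield qual[:qual_keep]
```

It could be used by passing an open file handle as `lines` and writing each yielded line, followed by a newline, to the output file. Whether to keep 36 colors plus the primer base or 36 characters in total is a choice that depends on the downstream tools.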