Hi everyone, and thanks in advance! I'm used to doing lots of trimming, substituting, etc. on large FASTQ/FASTA files, but now I need to add an arbitrary sequence to the beginning of every read, and I'm coming up short! I've been searching for a couple of hours for a way to do this with a toolkit (fastx_toolkit, BBMap, etc.) or a simple command (sed, awk, etc.).
So I'm looking to go from something like this:
>header
GTCTCAGATCGGAAGAGCACACGT
>header
CCGGTCCTGGTTGCAGATCGGAAG
>header
GTATCTCCTAAGATATAACAGGTTG
>header
AGGTACAGGTTGGATGATAAGTCC
to something like this:
>header
AAAAAAGTCTCAGATCGGAAGAGCACACGT
>header
AAAAAACCGGTCCTGGTTGCAGATCGGAAG
>header
AAAAAAGTATCTCCTAAGATATAACAGGTTG
>header
AAAAAAAGGTACAGGTTGGATGATAAGTCC
Alternatively, I can do the same with FASTQ files (also extending the quality lines to match), if there's already a tool out there for that. I'm not interested in quality at this point, as I've already merged paired-end reads with PandaSeq and filtered out anything but the highest-quality reads.
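For anyone landing here later, the FASTA case described above can be sketched in a few lines of Python. This is only a rough illustration, not one of the answers from the thread: the function name `prepend_to_fasta` is made up for this example, and it assumes the prefix should go on the first sequence line after each `>` header (so wrapped, multi-line records get the prefix only once).

```python
def prepend_to_fasta(lines, prefix):
    """Yield FASTA lines with `prefix` added to the start of each record's sequence.

    `lines` is any iterable of text lines (for example an open file handle).
    Hypothetical helper for illustration only.
    """
    at_record_start = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: pass it through and remember a sequence follows.
            at_record_start = True
            yield line
        elif at_record_start and line:
            # First sequence line of the record: add the prefix here only.
            yield prefix + line
            at_record_start = False
        else:
            # Continuation lines of a wrapped sequence, or blank lines.
            yield line


# Small demonstration using two of the reads from the question.
example = [">header\n", "GTCTCAGATCGGAAGAGCACACGT\n",
           ">header\n", "CCGGTCCTGGTTGCAGATCGGAAG\n"]
for out_line in prepend_to_fasta(example, "AAAAAA"):
    print(out_line)
```

To run it on a real file, pass an open file handle instead of the `example` list and write each yielded line to the output file.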
While you have been given possible solutions below, you would be breaking FASTQ format if you do not add corresponding scores on the quality line. The example you showed above is neither valid FASTA nor valid FASTQ.
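To make the point about the quality line concrete, here is a minimal Python sketch of the FASTQ case that pads the quality string by the same number of characters as the added bases. It is an illustration under stated assumptions, not a tested tool: the name `prepend_to_fastq` and the `qual_char` parameter are invented for this example, records are assumed to be exactly four lines each (no wrapped sequences), and the default padding character `I` assumes Phred+33 encoding, where it stands for a score of 40.

```python
def prepend_to_fastq(lines, prefix, qual_char="I"):
    """Yield FASTQ lines with `prefix` added to each read and the quality line padded.

    Assumes strict four-line records: header, sequence, '+', quality.
    `qual_char` is the quality character given to every added base
    ("I" is Phred 40 under the assumed Phred+33 encoding).
    Hypothetical helper for illustration only.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        position = index % 4
        if position == 1:
            # Sequence line: add the extra bases at the start.
            yield prefix + line
        elif position == 3:
            # Quality line: add one score per added base so lengths still match.
            yield qual_char * len(prefix) + line
        else:
            # Header and '+' separator lines are passed through unchanged.
            yield line


# Small demonstration with a single made-up record.
record = ["@read1\n", "GTCTCAGATC\n", "+\n", "FFFFFFFFFF\n"]
for out_line in prepend_to_fastq(record, "AAAAAA"):
    print(out_line)
```

Whatever score is chosen for the added bases is arbitrary, so it is worth picking one that will not cause downstream quality filters to trim or drop the reads.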
Ah yes, sorry, I should have been more accurate with that in case others come across this. I'll edit it to look like a real FASTA.
quickquark: Please test @Pierre's solution. It should work, and if it does, you should accept that answer too. You can accept more than one answer if they work.