I would like to know how to remove the newlines from certain parts of my file, but not from all of it.
I am piping the output of my program into sed in order to convert the file to a specific format. The input looks like this:
>sctg_0002_0001  length=2745
TCCCCCTCCCGTACCGGTTTGCGCTATTATACCGGCCTTGAATCGAGCAAAGGCTCCAAACAATTTCATTACAAACAGATTGGGGATGTATGACGTGGCT
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
TTGACACGCTTGTTTCTGATGTCATCACCCATGAAGAGCTGTTATTTGGCCACCTGGCGTTCCTGCCTAAGCGTTGAGTGAATATTAAACACCTCTGCCC
>sctg_0003_0001  length=2175
CAACAACCACTCTTAGCGCTGCTTGCCGCTGCCGATACCGAACGGGATGCGGTAGTCGCTGCTCTGCTCACCCAGACTCACGGTCAGGTTGCCCTGAGTA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
...
This is what I need to do:
- convert the ">" symbol into the string "SEQUENCE_ID="
- remove everything after the double space in the header, e.g. "  length=2745"
- append part of the next header to the current one, so that it looks like this: sctg_0002_0001e_0003_0001b
  - 0002_0001 is part of the first header
  - 0003_0001 is part of the second header
- delete the newline characters within the sequence itself, so that each FASTA sequence ends up on one single line
- prepend the string "SEQUENCE_TEMPLATE=" to the sequence line
- add the symbol "=" after each sequence line
This is what I have done so far:
perl convert_FastA.pl ScaffoldContigs.fasta | sed -e '/^>/ s/>/SEQUENCE_ID=/' | sed -e ':a;N;$!ba;/^SEQUENCE_ID=/ ! s/\n//'
The output of the first part (the Perl script) looks like the sample above. The first sed command replaces the ">" with the required string.
In the end it should look like this:
SEQUENCE_ID=sctg_0002_0001e_0003_0001b
SEQUENCE_TEMPLATE=CCCCCTCCCGTACCGGTTTGCGCTA...
=
SEQUENCE_ID=sctg_0003_0001e_0001_0001b
SEQUENCE_TEMPLATE=CAACAACCACTCTTAGCGCTGCTTG...
=
I tried to delete the newlines with sed, but it didn't work the way I imagined: it deletes either all of them or none. Besides, I couldn't find any way to "save" the next header line so that part of it can be appended to the header of the sequence before it.
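For illustration, here is a minimal sketch of one possible way to get the look-ahead, assuming awk is acceptable in place of the second sed call. It buffers each record and only prints it once the following header has been read. The function name fasta_to_template is hypothetical, and the handling of the last record (printed without the "e...b" suffix, since no next header exists) is an assumption, because the desired output for that case is not specified above.

```shell
# Hypothetical helper: reads FASTA on stdin (or from files given as
# arguments) and prints one SEQUENCE_ID / SEQUENCE_TEMPLATE / "=" block
# per record.
fasta_to_template() {
    awk '
    # Print the buffered record. next_id is the ID of the following
    # header, or "" when there is none. suffix is a local variable.
    function flush_record(next_id,    suffix) {
        if (id == "") return               # nothing buffered yet
        suffix = next_id
        sub(/^[^_]*/, "", suffix)          # "sctg_0003_0001" -> "_0003_0001"
        if (next_id != "")
            print "SEQUENCE_ID=" id "e" suffix "b"
        else
            print "SEQUENCE_ID=" id        # assumption: last record gets no suffix
        print "SEQUENCE_TEMPLATE=" seq
        print "="
    }
    /^>/ {
        new_id = substr($1, 2)             # first field without ">", drops "length=..."
        flush_record(new_id)               # previous record can now see this header
        id = new_id
        seq = ""
        next
    }
    { seq = seq $0 }                       # join sequence lines without newlines
    END { flush_record("") }
    ' "$@"
}
```

With this definition, the pipeline would become `perl convert_FastA.pl ScaffoldContigs.fasta | fasta_to_template`. Because the header is split on whitespace, the sketch does not depend on whether the ID and "length=..." are separated by one space or two.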
I would appreciate any help I can get.