removing line breaks and adding carriage returns
5.8 years ago
James ▴ 20

Hi there,

Please can anyone help with this.

I am trying to make a FASTA file, created in a Linux NGS pipeline, more Windows friendly.

I have a FASTA file with several sequences in it, generated by an NGS pipeline.

I'd really like to reformat the file in several ways. I currently do this manually with a text editor and want to speed things up. I'm pretty new to Linux and scripts, but I guess this could be done much more efficiently. My FASTA files look a bit like this for each gene:

>gene-name-iter4\n
somesequence\n
somesequnece\n
somesequence\n
>gene2name-iter4\n
somesequence\n
somesequence\n

At the moment I remove all \n style line breaks, then replace all > with \r\n> to put in Windows-style carriage returns, replace all iter4 with iter4\r\n, delete the one carriage return that is now before my first sequence, and add a carriage return to the end of the very last line.

Please can someone show me how to make this easier in Linux? I have access to a CentOS virtual machine, but I then move the output consensus FASTA file from whole genome sequencing to Windows computers.

Thank you, James

\n \r\n sed tr windows format document • 2.3k views

Before exporting, you can try unix2dos on the Linux machine.


Hi, thanks for the reply. I just tried unix2dos:

unix2dos -l input.fasta

but that seems to change every \n to \r\n, and there are \n at the end of several lines per sequence. Any idea if I should be choosing different options? Thanks
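In case it helps, here is a minimal sketch of doing the whole reformat in one pass with awk instead of unix2dos: each header stays on its own line, the wrapped sequence lines under it are joined into one line, and every output line ends in \r\n. The function name fasta_to_windows, the temporary directory and the sample sequences are made up for illustration, and it assumes that every header line starts with > and everything else is sequence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join wrapped sequence lines and emit CRLF line endings.
fasta_to_windows() {
    awk '
        { sub(/\r$/, "") }                       # tolerate input that already has CRLF
        /^>/ {                                   # header line
            if (seq != "") printf "%s\r\n", seq  # flush the previous sequence
            printf "%s\r\n", $0
            seq = ""
            next
        }
        { seq = seq $0 }                         # sequence line: append without a break
        END { if (seq != "") printf "%s\r\n", seq }
    ' "$1"
}

# Small demo on a throwaway file (made-up sequences)
demo_dir=$(mktemp -d)
printf '>gene-name-iter4\nACGT\nTTGA\nCCAT\n>gene2name-iter4\nGGCA\nTTAC\n' > "$demo_dir/input.fasta"
fasta_to_windows "$demo_dir/input.fasta" > "$demo_dir/output.fasta"

# Inspect the bytes to confirm the \r \n pairs
od -c "$demo_dir/output.fasta"
```

Because the script keys on the leading > rather than on the text iter4, it should not need changing if the header suffix changes.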


Windows currently has support for \n line endings. Plus, did you try man unix2dos?


What have you tried? This is a fairly common problem, have you tried Googling?


Hi there, I have tried Googling, which is where I found that sed or tr should be able to help. For removing all \n I have tried

sed 's/\n//g' input.fasta > output.fasta

but it doesn't seem to actually do anything: the output file is created but still has \n line breaks. I also tried

tr -d '\n' input.fasta > output.fasta

but that gives me an error:

tr: extra operand
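On the tr error: tr does not accept a file name as an argument, it only reads standard input, so the file has to be redirected in with <. A small sketch, using a throwaway directory and made-up sequences:

```shell
tr_demo=$(mktemp -d)
printf '>gene-name-iter4\nACGT\nTTGA\n' > "$tr_demo/input.fasta"

# tr takes no file operands; feed the file on standard input instead
tr -d '\n' < "$tr_demo/input.fasta" > "$tr_demo/output.fasta"

cat "$tr_demo/output.fasta"
```

Note that this deletes every line break, including the one after the header, so the header and the sequence end up on the same line and the break after the header has to be put back afterwards.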


If you want to go the sed way:

output:

$ sed 's/\\n//g' test.txt 
>gene-name-iter4
somesequence
somesequnece
somesequence
>gene2name-iter4
somesequence
somesequence

input:

$ cat test.txt 
>gene-name-iter4\n
somesequence\n
somesequnece\n
somesequence\n
>gene2name-iter4\n
somesequence\n
somesequence\n
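One thing worth checking is whether the file contains the two literal characters backslash and n, as in the test.txt above, or real line-feed bytes that an editor merely displays as \n. The command above only removes the literal two-character form. A rough sketch for telling the two apart with od -c (the file names here are invented for the demo):

```shell
nl_demo=$(mktemp -d)

# File with real line feeds only
printf '>gene\nACGT\n' > "$nl_demo/real_newlines.fasta"
# File with a literal backslash + n before each real line feed
printf '>gene\\n\nACGT\\n\n' > "$nl_demo/literal_backslash_n.fasta"

# od -c shows a real line feed as the single token \n,
# and the literal form as two separate characters, \ and n
od -c "$nl_demo/real_newlines.fasta"
od -c "$nl_demo/literal_backslash_n.fasta"

# s/\\n//g strips the literal characters but leaves real line feeds alone
sed 's/\\n//g' "$nl_demo/literal_backslash_n.fasta" > "$nl_demo/cleaned.fasta"
sed 's/\\n//g' "$nl_demo/real_newlines.fasta" > "$nl_demo/unchanged.fasta"
```

If od -c on the real file shows only single \n tokens, the file has ordinary Unix line endings and this substitution will leave it untouched.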

Hi, thanks for the reply. This doesn't seem to work for me:

sed 's/\n//g' input.fasta > output.fasta

All the \n are still there. But

sed 's/iter4/iter4\r\n/g'

does work for adding carriage returns everywhere there is iter4.


Did you man sed? sed has an extended regex option, and unless you know exactly which sed you're using (pro-tip: mac/BSD sed is the absolute worst), you will need to try a couple of flags, notably the -r flag.
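A side note on why s/\n//g appears to do nothing: sed reads one line at a time and strips the trailing newline before the script runs, so there is no \n in the pattern space to match, whichever regex flag is used. A common workaround is to pull the following lines into the pattern space with N first. This is only a sketch on made-up data, and it is written with separate -e parts because some sed versions are picky about labels followed by semicolons:

```shell
sed_demo=$(mktemp -d)
printf 'ACGT\nTTGA\nCCAT\n' > "$sed_demo/wrapped.txt"

# :a       define a label
# N        append the next input line to the pattern space (keeps the \n between them)
# $!ba     if this is not the last line, jump back to label a
# s/\n//g  the embedded newlines are now visible and can be deleted
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' "$sed_demo/wrapped.txt" > "$sed_demo/joined.txt"

cat "$sed_demo/joined.txt"
```

This loads the whole file into memory, which is fine for a handful of genes but may be slow on very large inputs.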
