Question: removing line breaks and adding carriage returns
James (APHA Weybridge, UK) wrote, 8 months ago:

Hi there,

Please can anyone help with this.

I am trying to make a FASTA file, created in a Linux NGS pipeline, more Windows-friendly.

I have a fasta file with several sequences in it generated by an NGS pipeline.

I'd really like to reformat the file in several ways. I currently do this manually with a text editor and want to speed things up. I'm pretty new to Linux and scripts, but I guess this could be done much more efficiently. My FASTA file looks a bit like this for each gene:

>gene-name-iter4\n
somesequence\n
somesequnece\n
somesequence\n
>gene2name-iter4\n
somesequence\n
somesequence\n

At the moment I remove all \n line breaks, then replace every > with \r\n> to put in Windows-style carriage returns, replace every iter4 with iter4\r\n, delete the one carriage return that is now before my first sequence, and add a carriage return to the end of the very last line.

Please can someone show me how to make this easier in Linux? I have access to a CentOS virtual machine, but I then move the output consensus FASTA file (from whole-genome sequencing) to Windows computers.

Thank you, James

modified 8 months ago by finswimmer • written 8 months ago by James
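
The manual steps described in the question (join each wrapped sequence onto one line, then end every line with \r\n) can be sketched in a single awk pass. This is only a sketch under assumptions: it assumes the \n in the example are real newlines rather than literal text, and the function name fasta_to_windows and the demo file names are made up for illustration.

```shell
# Hypothetical helper: reads FASTA on stdin and writes it back out with each
# sequence joined onto one line and every line ending in CRLF.
fasta_to_windows() {
    awk '
        # Drop a carriage return if the input already has one.
        { sub(/\r$/, "") }
        # Header line: flush the previous sequence, then print the header.
        /^>/ {
            if (seq != "") printf "%s\r\n", seq
            seq = ""
            printf "%s\r\n", $0
            next
        }
        # Sequence line: append it to the current sequence.
        { seq = seq $0 }
        # Flush the last sequence at end of input.
        END { if (seq != "") printf "%s\r\n", seq }
    '
}

# Small demo input (placeholder file names and sequences).
printf '>gene-name-iter4\nACGT\nTTGA\n>gene2name-iter4\nGGCC\n' > demo_input.fasta
fasta_to_windows < demo_input.fasta > demo_output.fasta
```

Because the header is matched by its leading > rather than by the text iter4, the same sketch should work if the iteration suffix changes.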

Before exporting, you can try unix2dos on the Linux machine.

written 8 months ago by cpad0112

Hi, thanks for the reply. I just tried unix2dos: unix2dos -l input.fasta

But that seems to change every \n to \r\n, and there are \n at the end of several lines per sequence. Any idea whether I should be choosing different options? Thanks

written 8 months ago by James

Windows currently has support for \n line endings. Plus, did you try man unix2dos?

written 8 months ago by RamRS

What have you tried? This is a fairly common problem; have you tried Googling?

written 8 months ago by RamRS

Hi there, I have tried Googling, which is where I found that sed or tr should be able to help. For removing all \n I tried sed 's/\n//g' input.fasta > output.fasta, but it doesn't seem to do anything: the output file is created but still has the \n line breaks. I also tried tr -d '\n' input.fasta > output.fasta, but that gives me an error: tr: extra operand

written 8 months ago by James
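
A likely explanation for both results, assuming the file holds real newlines: sed reads one line at a time and strips the trailing newline before matching, so \n never appears in the text it searches; tr takes no file arguments and only reads standard input, hence "extra operand". A minimal sketch with a made-up demo file:

```shell
# Demo file with real newlines (demo.fasta is a placeholder name).
printf '>gene-iter4\nACGT\nTTGA\n' > demo.fasta

# sed sees each line without its newline, so this prints the file unchanged.
sed 's/\n//g' demo.fasta

# tr reads standard input only, so redirect the file into it.
# This deletes every newline, including the one after the header.
tr -d '\n' < demo.fasta
```

Note that tr removes the line break after the header as well, which matches the manual workflow in the question but means the breaks around each header still have to be put back afterwards.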

If you want to go the sed way:

output:

$ sed 's/\\n//g' test.txt 
>gene-name-iter4
somesequence
somesequnece
somesequence
>gene2name-iter4
somesequence
somesequence

input:

$ cat test.txt 
>gene-name-iter4\n
somesequence\n
somesequnece\n
somesequence\n
>gene2name-iter4\n
somesequence\n
somesequence\n

written 8 months ago by cpad0112
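
Note that the pattern \\n in the sed command above matches the two literal characters backslash and n, not a real newline, so it only helps if the file really contains that text. One way to check which case applies is sketched below; literal.txt and real.txt are hypothetical test files, not files from the thread.

```shell
# One file with literal "\n" text at the end of each line, one without.
printf '%s\n' '>gene-iter4\n' 'ACGT\n' > literal.txt
printf '%s\n' '>gene-iter4' 'ACGT' > real.txt

# Count the lines that contain a literal backslash followed by "n".
grep -c '\\n' literal.txt
# grep exits non-zero when nothing matches, so guard it with "|| true".
grep -c '\\n' real.txt || true

# Removing the literal characters only changes the first file.
sed 's/\\n//g' literal.txt
```

If the count for the real FASTA file is zero, the line breaks are ordinary newlines and this sed command will leave the file as it is.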

Hi, thanks for the reply. This doesn't seem to work for me: after sed 's/\n//g' input.fasta > output.fasta all the \n are still there. But sed 's/iter4/iter4\r\n/g' does work for adding carriage returns everywhere there is iter4.

written 8 months ago by James

Did you man sed? sed has an extended regex option, and unless you know exactly which sed you're using (pro-tip: mac/BSD sed is the absolute worst), you will need to try a couple of flags, notably the -r flag.

modified 8 months ago • written 8 months ago by RamRS