Question

How to remove newlines from a FASTA file?

3.9 years ago

FlowerFLX • 0

Hello, I am new here:

I have two sequences

>NM_020536.5 Homo sapiens lysine acetyltransferase 14 (KAT14), transcript variant 1, mRNA
GAGCCTGGGCAGTACAGGCGGCGGTGCGCACTCTGCGGCGGCCTCTGCGCCTCGGGCGGGCGGGAGAGAG
AGGCCGCGGCCGCCAGCGTGGGGATGTCTAGGAGCTCGAAGGTGGTGCTGGGCCTCTCGGTGCTGCTGAC (..)

>XP_024305840.1 transmembrane protein 62 isoform X5 [Homo sapiens]
MAAVLALRVVAGLAAAALVAMLLEHYGLAGQPSPLPRPAPPRRPHPAPGPGDSNIFWGLQISDIHLSRFR
DPGRAVDLEKFCSETIDIIQPALVLATGDLTDAKTKEQLGSRQHEVEWQTYQGILKKTRVMEKTKWLDIK
GNHDAFNIPSLDSIKNYYRKYSAVRRDGSFHYVHSTPFGNYSFICVDATVNPGPKRPYNFFGILDKKKME (..)

I wanted to remove the newlines and tried line = line.rstrip("\n"), but it does not work.

Can anyone help?

FASTA Python Header Parsing • 4.3k views

ADD COMMENT • link updated 3.9 years ago by Joe 21k • written 3.9 years ago by FlowerFLX • 0

There are many tools for this in *nix environments, but I see you are trying to use Python. You may also need to handle stray whitespace, discard empty lines, and so on.

ADD REPLY • link 3.9 years ago by synchris ▴ 10

Yes, I am trying to use Python.

ADD REPLY • link 3.9 years ago by FlowerFLX • 0

Please explain what you mean by "shorten the Header of this file at first".

ADD REPLY • link 3.9 years ago by Pierre Lindenbaum 161k

It sounds like you want to linearize the FASTA file (code by @Pierre):

ADD REPLY • link 3.9 years ago by GenoMax 141k

score 1 · Answer 1 · 2020-06-16

If you specifically want to do this in Python, a couple of approaches are below. (You don't have to use sys.argv or stdin for this; they just make the scripts easy to run from the command line.)

The BioPython way (easy parsing)

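You can simply re-parse the file with SeqIO.convert, which skips blank lines and writes each record back out (the 'fasta' writer re-wraps sequences at 60 characters per line; use 'fasta-2line' as the output format if you want each sequence on a single line):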

import sys
from Bio import SeqIO

SeqIO.convert(sys.argv[1], 'fasta', sys.argv[2], 'fasta')

As a one-liner reading from the shell's STDIN (the two trailing arguments are the input and output format names):

cat file.fasta | python -c "import sys; from Bio import SeqIO; SeqIO.convert(sys.stdin, sys.argv[1], sys.stdout, sys.argv[2])" fasta fasta

The pure python way

import sys

with open(sys.argv[1], 'r') as fh:
    for line in fh:
        if line.strip():              # skip empty and whitespace-only lines (also catches "\r\n")
            print(line, end="")       # line already ends in "\n"; print(line.rstrip("\n")) would also work

(Note: the approach above requires Python 3; on Python 2, first add from __future__ import print_function.)
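If the goal is instead to join the wrapped sequence lines so that each record has a single sequence line (the "linearize" reading suggested in the comments), a minimal pure-Python sketch could look like the following. The function name linearize_fasta and the demo records are made up for illustration, and the sketch assumes a well-formed FASTA where every record starts with a ">" header line.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line.

    `lines` is any iterable of text lines, e.g. an open file handle.
    (Hypothetical helper for illustration, not part of any library.)
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()              # drops "\n", "\r\n" and stray spaces
        if not line:
            continue                    # skip blank lines
        if line.startswith(">"):
            if header is not None:      # emit the previous record
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)         # sequence line of the current record
    if header is not None:              # emit the last record
        yield header, "".join(chunks)


# Made-up demo records; in practice pass an open file handle instead.
example = [
    ">seq1 demo record\n",
    "GAGCC\n",
    "TGGGC\n",
    "\n",
    ">seq2\n",
    "MAAVL\r\n",
    "ALRVV\n",
]
for header, sequence in linearize_fasta(example):
    print(header)
    print(sequence)
```

This also shows why line = line.rstrip("\n") on its own appears to do nothing: it only returns a new string, so the stripped pieces still have to be collected and written out without a newline between them.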

score 0 · Answer 2 · 2020-06-13

3.9 years ago

Kevin Blighe 87k

Just use sed:

cat test.txt ;

fghfgh


fghhf


fh
f
gh
fgh
fgfh
fhfh
fh
hg










fghfgh
f
f

Now strip the empty lines:

sed '/^$/d' test.txt ;
fghfgh
fghhf
fh
f
gh
fgh
fgfh
fhfh
fh
hg
fghfgh
f
f

Use sed -i to edit the file in place (risky, because the original is overwritten; sed -i.bak keeps a backup copy).
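For comparison with the Python approaches elsewhere in this thread, here is a rough Python counterpart of the sed command. Note that '/^$/d' only deletes truly empty lines, so lines holding just spaces or a Windows carriage return survive; the sketch below drops those too. The helper name drop_blank_lines and the sample text are assumptions for illustration.

```python
def drop_blank_lines(text):
    """Return `text` without empty or whitespace-only lines.

    (Hypothetical helper for illustration.)
    """
    # splitlines() handles both "\n" and "\r\n" line endings
    kept = [line for line in text.splitlines() if line.strip()]
    return ("\n".join(kept) + "\n") if kept else ""


# Made-up sample with empty, whitespace-only and "\r\n"-terminated lines
sample = "fghfgh\n\n\nfghhf\n   \nfh\r\n\r\nf\n"
print(drop_blank_lines(sample), end="")
```

Because the function returns a new string rather than modifying anything, the result can be written to a separate output file, which avoids the overwrite risk of editing in place.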

Kevin
