How to remove newline of a FASTA file?
3.9 years ago
FlowerFLX • 0

Hello, I am new here:

I have two sequences

>NM_020536.5 Homo sapiens lysine acetyltransferase 14 (KAT14), transcript variant 1, mRNA
GAGCCTGGGCAGTACAGGCGGCGGTGCGCACTCTGCGGCGGCCTCTGCGCCTCGGGCGGGCGGGAGAGAG
AGGCCGCGGCCGCCAGCGTGGGGATGTCTAGGAGCTCGAAGGTGGTGCTGGGCCTCTCGGTGCTGCTGAC (..)

>XP_024305840.1 transmembrane protein 62 isoform X5 [Homo sapiens]
MAAVLALRVVAGLAAAALVAMLLEHYGLAGQPSPLPRPAPPRRPHPAPGPGDSNIFWGLQISDIHLSRFR
DPGRAVDLEKFCSETIDIIQPALVLATGDLTDAKTKEQLGSRQHEVEWQTYQGILKKTRVMEKTKWLDIK
GNHDAFNIPSLDSIKNYYRKYSAVRRDGSFHYVHSTPFGNYSFICVDATVNPGPKRPYNFFGILDKKKME (..)

I wanted to remove the newlines and tried this command: line = line.rstrip("\n"), but it does not work.

Can anyone help?

FASTA Python Header Parsing • 4.3k views

There are many tools to accomplish that in *nix environments, but I can see you are trying to use Python. You might also need to catch whitespace, discard empty lines, etc.
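As a minimal sketch of the whitespace point: rstrip("\n") removes only the newline character, so a file with Windows line endings still carries a trailing "\r", and whitespace-only lines are not empty. The helper name clean_lines and the sample list are made up for illustration.

```python
def clean_lines(lines):
    """Yield lines without trailing whitespace, skipping blank ones."""
    for raw in lines:
        line = raw.rstrip()   # no argument: strips "\n", "\r", spaces and tabs
        if line:              # an empty string means the line was blank
            yield line


# Hypothetical input mixing Windows ("\r\n") and Unix ("\n") line endings
windows_style = ["ACGT\r\n", "   \r\n", "\r\n", "TTGA\n"]

# rstrip("\n") leaves the carriage returns and the whitespace-only line behind
print([line.rstrip("\n") for line in windows_style])

# rstrip() with no argument, plus the emptiness check, keeps only real content
print(list(clean_lines(windows_style)))
```

Note that opening a file in Python 3's default text mode already translates "\r\n" to "\n", so the stray "\r" mostly shows up when lines come from other sources or the file was opened with newline="".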


Yes, I am trying to use Python.


Please first explain how to shorten the header of this file.


Sounds like you want to linearize the FASTA file (code by @Pierre):

3.9 years ago
Joe 21k

If you specifically want to do this in Python, a couple of approaches are below (note that you don't have to use sys.argv or stdin for this; it just makes running them from the command line easy).

The BioPython way (easy parsing)

You can simply re-parse it through:

import sys
from Bio import SeqIO

SeqIO.convert(sys.argv[1], 'fasta', sys.argv[2], 'fasta')

As a one-liner (reading from the shell's STDIN; the two trailing arguments are the input and output formats):

cat file.fasta | python -c "import sys; from Bio import SeqIO; SeqIO.convert(sys.stdin, sys.argv[1], sys.stdout, sys.argv[2])" fasta fasta

The pure python way

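import sys

# Print every line of the input file except blank ones
with open(sys.argv[1], 'r') as fh:
    for line in fh:
        if line.strip():            # skip empty and whitespace-only lines
            print(line, end="")     # the line keeps its own "\n"; print(line.rstrip("\n")) would also work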

(Note: the above approach requires Python 3; on Python 2, add from __future__ import print_function at the top of the script.)
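The loop above only drops blank lines. If the goal is instead to put each sequence on a single line (the "linearize" idea mentioned in the comments), a minimal pure-Python sketch could look like the following. The function name linearize_fasta and the example records are made up for illustration, and it assumes every record starts with a ">" header line.

```python
import io


def linearize_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()              # drops "\n", "\r" and surrounding spaces
        if not line:
            continue                    # skip blank lines between records
        if line.startswith(">"):
            if header is not None:      # a new header closes the previous record
                yield header, "".join(chunks)
            header = line
            chunks = []
        elif header is not None:        # sequence text before any header is ignored
            chunks.append(line)
    if header is not None:              # emit the last record
        yield header, "".join(chunks)


# Hypothetical two-record input; a real script would pass an open file handle
example = io.StringIO(">seq1 demo\nACGT\nTTGA\n\n>seq2 demo\nMAAV\nLALR\n")
for header, seq in linearize_fasta(example):
    print(header)
    print(seq)
```

Collecting the pieces in a list and joining once per record avoids repeated string concatenation, which matters for long sequences.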

3.9 years ago

Just use sed:

cat test.txt ;

fghfgh


fghhf


fh
f
gh
fgh
fgfh
fhfh
fh
hg










fghfgh
f
f

Now strip the empty lines:

sed '/^$/d' test.txt ;
fghfgh
fghhf
fh
f
gh
fgh
fgfh
fhfh
fh
hg
fghfgh
f
f

Use sed -i to edit the file in place (risky, because the original is overwritten).
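Since the thread is about Python, here is a rough sketch of the same in-place cleanup that keeps a backup copy first, which takes some of the risk out of overwriting the file. The function name drop_empty_lines_in_place and the ".bak" suffix are arbitrary choices for illustration.

```python
import os
import shutil
import tempfile


def drop_empty_lines_in_place(path, backup_suffix=".bak"):
    """Remove empty lines from `path`, keeping the original as path + backup_suffix."""
    backup = path + backup_suffix
    shutil.copy2(path, backup)                      # keep a copy before changing anything
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" keeps line endings untouched on both read and write
    with open(backup, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, newline=""
    ) as tmp:
        for line in src:
            if line != "\n":                        # same rule as sed '/^$/d': only truly empty lines
                tmp.write(line)
    shutil.copymode(backup, tmp.name)               # carry over the original permissions
    os.replace(tmp.name, path)                      # swap the cleaned file into place
    return backup


# Example call (assumes test.txt exists in the current directory):
# drop_empty_lines_in_place("test.txt")
```

Writing to a temporary file in the same directory and then renaming it means the original is never left half-written if the script is interrupted.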

Kevin
