Question: How to remove the empty lines from my output using Python
horsedog wrote:

Hi, I'm trying to rename all the sequences in a FASTA file. My goal is to append the taxonomy to each accession number in the header line.
The original ones look like this:

>YP_003612801.1   
MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV
YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA
AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

After I run my script, it looks like this:

>YP_003612801.1  
_Firmicutes_Clostridia_Clostridiales

MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV

YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA

AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

I don't know why there are empty lines between the output lines. I also want the taxonomy appended on the same line as the accession number instead of on a new line, so this is what I want:

>YP_003612801.1_Firmicutes_Clostridia_Clostridiales     
MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV   
YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA 
AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

Does anyone know how to do this in Python?

modified 2.4 years ago by jomo018 • written 2.4 years ago by horsedog

Your script is doing something wonky, and without looking at your script, we can't help you. Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

written 2.4 years ago by RamRS

Thanks, I just formatted it!

written 2.4 years ago by horsedog

jomo018 wrote:

Each line you read includes the end-of-line (EOL) character from the input file, and print() adds its own end-of-line. You therefore end up with two EOLs per line, which shows up as one blank line after each line of output. You can fix this by calling strip() on each line you read. For example, line.strip() discards the EOL (along with any other leading or trailing whitespace) from line.
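The effect described above can be reproduced with a short sketch. The in-memory `io.StringIO` object stands in for the real input file, and the sample lines are made up for illustration:

```python
import io

# In-memory stand-in for the input file (sample content is illustrative).
fasta = io.StringIO(">YP_003612801.1\nMTDYLLLF\n")

raw_lines = list(fasta)                          # each line keeps its trailing "\n"
clean_lines = [line.strip() for line in raw_lines]

for line in raw_lines:
    print(line)      # "\n" from the file plus "\n" from print(): blank lines appear

for line in clean_lines:
    print(line)      # only the "\n" from print(): no blank lines
```

The first loop prints a blank line after every line, and the second does not.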

modified 2.4 years ago • written 2.4 years ago by jomo018

rstrip() should be better than strip() here, because it avoids unwanted trimming at the start of the line.
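A minimal illustration of the difference, using a made-up line with whitespace on both sides:

```python
line = "  indented text  \n"

both_ends = line.strip()     # removes leading and trailing whitespace
right_only = line.rstrip()   # removes trailing whitespace only

print(repr(both_ends))
print(repr(right_only))
```

For FASTA headers and sequence lines the leading characters are normally significant, so trimming only the right side is the more conservative choice.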

written 2.4 years ago by chen

OpenGene
chen wrote:

This is my guess:
1. You use readline() to get lines from the original file, and each line keeps its trailing \n.
2. When you use write() to write lines to the new file, you append another \n to the end of each line.

I can take a look at your code if you post it.

modified 2.4 years ago • written 2.4 years ago by chen

with open("sequence.fasta") as file:
    with open("taxonomy") as name:
        for line in taxonomy.readlines():
            for i in file.readlines():
                if i.startswith(">"):
                    print(i+"_"+line)
                else:
                    print(i)
written 2.4 years ago by horsedog

Use print with end=""

print(i+"_"+line, end="")

In addition, you can combine your with statements:

with open("sequence.fasta") as file, open("taxonomy") as name:

and your for loops:

for line, i in zip(name, file):

Your code has taxonomy.readlines(), but I assume that should be name.readlines(). There is also no reason to call .readlines() since you are simply iterating over the file. You don't need to load it entirely in memory.
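Putting these suggestions together, one possible sketch is shown below. It rests on an assumption the thread does not confirm: that the taxonomy file holds one label per line, in the same order as the FASTA records. Because zip() pairs the n-th line of one file with the n-th line of the other, it only fits when both files are line-aligned, so this sketch instead reads the next label only when it meets a header line. The function name `add_taxonomy` and the sample data are hypothetical.

```python
import io

def add_taxonomy(fasta_lines, taxonomy_lines):
    """Yield FASTA lines with one taxonomy label appended to each header.

    Assumption (not confirmed in the thread): one label per line, in the
    same order as the FASTA records.
    """
    taxonomy = iter(taxonomy_lines)
    for line in fasta_lines:
        line = line.rstrip()                  # drop EOL and trailing spaces
        if line.startswith(">"):
            label = next(taxonomy, None)      # advance only on header lines
            if label is None:
                raise ValueError("fewer taxonomy labels than FASTA headers")
            yield line + "_" + label.strip()
        elif line:                            # skip blank lines
            yield line

# In-memory stand-ins for the two files (sample content is illustrative).
fasta = io.StringIO(">YP_003612801.1   \nMTDYLLLF\nYLRTLAFI\n")
taxonomy = io.StringIO("Firmicutes_Clostridia_Clostridiales\n")

result = list(add_taxonomy(fasta, taxonomy))
for out_line in result:
    print(out_line)
```

With real files, the two open file objects from `with open("sequence.fasta") as file, open("taxonomy") as name:` can be passed directly as `add_taxonomy(file, name)`, since the function only iterates over them.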

modified 2.4 years ago • written 2.4 years ago by WouterDeCoster

Then what should I use instead of readlines()? How do I read lines one by one?

written 2.4 years ago by horsedog

Just iterate over the opened file, as I wrote:

for line in file:
written 2.4 years ago by WouterDeCoster

OK, thank you, but I still get output like this:

>YP_003612801.1  
_Firmicutes_Clostridia_Clostridiales

I have already removed readlines(). What do you think could cause this?

written 2.4 years ago by horsedog

You probably need to strip the newline character off:

print(i.strip('\n')+"_"+line, end="")
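To see why the label still lands on its own line, it helps to look at the strings being joined. The sketch below assumes the header line carries trailing spaces, as the sample in the question appears to (this may be a formatting artifact); the variable names are illustrative:

```python
# Header and label as they would be read from the files, EOL included.
header = ">YP_003612801.1   \n"
label = "Firmicutes_Clostridia_Clostridiales\n"

unstripped = header + "_" + label                 # "\n" stays in the middle
newline_only = header.strip("\n") + "_" + label   # trailing spaces remain
all_trailing = header.rstrip() + "_" + label      # spaces and "\n" removed

print(unstripped, end="")
print(newline_only, end="")
print(all_trailing, end="")
```

If the headers do contain trailing spaces, strip('\n') leaves them between the accession number and the label, whereas rstrip() with no argument removes them as well.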
written 2.4 years ago by WouterDeCoster

print() automatically adds a line break at the end of its output.

written 2.4 years ago by chen