Question

How to delete spaces in rows of a CSV file in Python

18 months ago

Paula ▴ 60

Hi! I have a csv file and I need to format it. The input csv file looks like this:

old_name,new_name
"NODE_1_length_592822_cov_338.586386
", SOL_1_3_cov_338.586386_N_1
"NODE_1_length_592822_cov_338.586386
",SOL_1_3_cov_338.586386_N_2

And the final result should look like this:

old_name,new_name
NODE_1_length_592822_cov_338.586386, SOL_1_3_cov_338.586386_N_1
NODE_1_length_592822_cov_338.586386,SOL_1_3_cov_338.586386_N_2

I have tried multiple strategies but none of them has given the desired result:

with open('file.csv', 'r') as f:
    content = f.readlines()
cleaned = ''
for line in content:
    if line != '\n':
        cleaned += line
print(cleaned.replace(" ", ""))

Another one

with open("file.csv", "r", encoding="utf8") as f:
    text = ''.join([i for i in f]) \
        .replace("  ", "")
x = open("file1.csv", "w", encoding="utf8")
x.writelines(text)
x.close()

And another one

import csv

with open('file.csv', newline='') as in_file:
    with open('new_file.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            if row:
                writer.writerow(row)

Do you have any ideas as to how to solve it?

Thanks!

python csv • 3.8k views

updated 18 months ago by Andrzej Zielezinski 11k • written 18 months ago by Paula ▴ 60
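Even when csv.reader and csv.writer are given the open file objects, the third attempt above would not join the split records. The csv module does read a quoted field that spans a line break as a single value, but it keeps the line break inside that value, and csv.writer then has to quote the field again on output. A small check, assuming the file content is exactly as shown in the question and using io.StringIO as an in-memory stand-in for file.csv:

```python
import csv
import io

# In-memory stand-in for the first record of file.csv (assumed to match
# the sample in the question exactly).
sample = (
    'old_name,new_name\n'
    '"NODE_1_length_592822_cov_338.586386\n'
    '", SOL_1_3_cov_338.586386_N_1\n'
)

# csv.reader treats the quoted, two-line value as one field,
# but the line break stays inside that field.
rows = list(csv.reader(io.StringIO(sample)))
print(rows)

# csv.writer quotes any field containing a line break, so writing the
# rows back unchanged still splits the record across two lines.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```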

It would be easier to read the whole file into memory.

with open('file.csv') as fh:
    content = fh.read()

# drop spaces and quotes, then rejoin lines that were split just before the comma
content = content.replace(' ', '').replace('"', '').replace('\n,', ',')
print(content)

Output:

old_name,new_name
NODE_1_length_592822_cov_338.586386,SOL_1_3_cov_338.586386_N_1
NODE_1_length_592822_cov_338.586386,SOL_1_3_cov_338.586386_N_2

18 months ago by Andrzej Zielezinski 11k
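The replacement chain in the reply above can be wrapped in a function and checked against the sample from the question. This is a minimal sketch; the function name clean_csv_text is hypothetical. The order of the calls matters: in the raw file the line break is followed by a double quote, not a comma, so the quotes have to be removed before the '\n,' join can match.

```python
def clean_csv_text(text):
    # 1. remove every space (this also removes spaces inside values)
    # 2. remove the double quotes, so each line break is now followed by ','
    # 3. delete the line break in front of the comma to rejoin each record
    return text.replace(' ', '').replace('"', '').replace('\n,', ',')


sample = (
    'old_name,new_name\n'
    '"NODE_1_length_592822_cov_338.586386\n'
    '", SOL_1_3_cov_338.586386_N_1\n'
    '"NODE_1_length_592822_cov_338.586386\n'
    '",SOL_1_3_cov_338.586386_N_2\n'
)
print(clean_csv_text(sample))
```

Because replace(' ', '') removes every space, a value such as NODE_1_length_592822_cov_338.586386_4 # 1409 # 3598 collapses to NODE_1_length_592822_cov_338.586386_4#1409#3598.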

Thanks Andrzej! I'd like to ask one additional question, if you don't mind: what if I want to keep the spaces inside the values of the first column? For example:

I want to obtain:

NODE_1_length_592822_cov_338.586386_4 # 1409 # 3598

Instead of:

NODE_1_length_592822_cov_338.586386_4#1409#3598

How can I modify the script?

Thanks a lot!

updated 18 months ago by GenoMax 141k • written 18 months ago by Paula ▴ 60

Could you show me the first few lines of the csv?

18 months ago by Andrzej Zielezinski 11k

Sure!

old_name,new_name
"NODE_1_length_592822_cov_338.586386_1 # 2 # 169 # -1 # 
",SOL_1_3_cov_338.586386_N_1
"NODE_1_length_592822_cov_338.586386_2 # 417 # 695 # 1 # 
",SOL_1_3_cov_338.586386_N_2

Thanks!

18 months ago by Paula ▴ 60

Please do not add " or other extraneous characters to data examples. Use the 101010 button to format the data you want to show as code.

18 months ago by GenoMax 141k

Thanks GenoMax!

18 months ago by Paula ▴ 60

score 3 · Accepted Answer · 2022-10-16

18 months ago

Andrzej Zielezinski 11k

This should do the job:

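# Assumes every record after the header spans exactly two lines:
# the quoted old name, then the closing quote, a comma and the new name.
with open('file.csv') as fh, open('new_file.csv', 'w') as oh:
    oh.write(fh.readline())  # copy the header unchanged
    lst = []
    for i, line in enumerate(fh):
        # drop surrounding whitespace, then a leading quote and a leading comma
        line = line.strip().lstrip('"').lstrip(',')
        lst.append(line)
        if i % 2:  # second line of a record: write the joined pair
            oh.write(f'{lst[0]},{lst[1]}\n')
            lst = []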
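An alternative sketch, not taken from the thread: let csv.reader do the joining (it already reads each quoted value that spans a line break as one field), then strip surrounding whitespace from every field before writing. Unlike the answer above, this does not assume that every record spans exactly two lines, and it keeps the spaces inside values, as the follow-up question asked. The helper name clean_rows is hypothetical, and io.StringIO stands in for the real files. Note that strip() also removes the space before SOL_1_3_... in the first sample.

```python
import csv
import io


def clean_rows(in_file, out_file):
    """Copy CSV rows from in_file to out_file, stripping leading and
    trailing whitespace (including line breaks) from every field."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, lineterminator='\n')
    for row in reader:
        if row:  # csv.reader yields [] for blank lines; skip them
            writer.writerow([field.strip() for field in row])


# Usage with the files from the question:
# with open('file.csv', newline='') as in_file, \
#         open('new_file.csv', 'w', newline='') as out_file:
#     clean_rows(in_file, out_file)

sample = (
    'old_name,new_name\n'
    '"NODE_1_length_592822_cov_338.586386_1 # 2 # 169 # -1 # \n'
    '",SOL_1_3_cov_338.586386_N_1\n'
    '"NODE_1_length_592822_cov_338.586386_2 # 417 # 695 # 1 # \n'
    '",SOL_1_3_cov_338.586386_N_2\n'
)
out = io.StringIO()
clean_rows(io.StringIO(sample), out)
print(out.getvalue())
```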