extract gene id form excel and save into new files
2
0
Entering edit mode
8.3 years ago
Eva_Maria ▴ 190

I have a excel file like this

GCF_000582905.1.fna    GCF_000878815.1.fna    GCF_000951795.1.fna
dnaE_11                dnaE_21                dnaE_3
dtdS_35                dtdS_23                dtdS_4
gyrB_4                 gyrB_15                gyrB_4
pntA_29                pntA_23                pntA_29
pyrC_52                pyrC_21                pyrC_4
recA_16                recA_1                 recA_19
tnaA_22                tnaA_16                tnaA_22

I want output as like as a filename GCF_000974905.1.fna which contains

dnaE_11
dtdS_35
gyrB_4
pntA_29
pyrC_52
recA_16
tnaA_22

like this

Please help me

gene perl • 1.8k views
ADD COMMENT
2
Entering edit mode

It is a very bad way to use Excel for storing gene names. As it is discussed here and here that how gene names can automatically get converted to dates.

ADD REPLY
0
Entering edit mode

^This. Please DO NOT USE Excel to work with gene names. God forbid you stumble across names like MAR1.

ADD REPLY
0
Entering edit mode

I'd say do not use Excel for scientific purposes. You just can't guarantee data integrity with it (gene names are just one example). Unfortunately so many labs (and companies) are addicted to it.

ADD REPLY
1
Entering edit mode

use save as in Excel??

ADD REPLY
0
Entering edit mode

save as text file

ADD REPLY
0
Entering edit mode

This is fine if you only have one or two files but becomes painful when you have dozens of files, especially when they have multiple sheets.

ADD REPLY
3
Entering edit mode
8.3 years ago

The Gnumeric package comes with a neat command line utility called ssconvert that can convert Excel files to plain text formats. Then use your favorite scripting language to reorganize the data. There's also the gnumeric R package (which requires ssconvert) to get the file content directly in R.

ADD COMMENT
1
Entering edit mode
8.3 years ago
Tej Sowpati ▴ 250

Adding to the already good answer by Jean-Karim, you can do the following if you have pandas (Python):

from __future__ import print_function  # Delete this line if using python 3
import pandas as pd

data = pd.read_excel('171477.xlsx')    # Change file name to your Excel file

for column in data:
    with open(column, 'w') as file_out:
        print(data[column].to_csv(file_out, index=False))

Save the above script in a file (for example, myscript.py), and then run it like so:

$ python myscript.py

If you don't have pandas (You will get ImportError), you can install it using pip:

$ pip install pandas

Cheers,
TEJ

ADD COMMENT

Login before adding your answer.

Traffic: 2787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6