Question: Split Data From A Csv Cell Into Smaller Cells
RossCampbell (USA/Frederick), 9.4 years ago, wrote:

I have a CSV file containing matrices of nucleotide frequencies for each position in an alignment. Essentially, I have a PWM (position weight matrix) saved as a CSV file. Unfortunately, my script saved each set of frequencies in a single cell. For example, cell B1 contains {A: 0.25, C: 0.25, G: 0.25, T: 0.25}. What I need is one frequency per cell, more like: B1: {A: 0.25}, B2: {C: 0.25}, B3: {G: 0.25}, B4: {T: 0.25}. Is there a way to split each frequency into its own cell like this?

The Python code that I used to write the CSV:

import csv
from Bio import Motif, SeqIO
from Bio.Alphabet import IUPAC

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)
writer = csv.writer(open('filename.csv', 'wb', buffering=0))
for seq_record in SeqIO.parse("filename.fasta", "fasta", alphabet=alphabet):
    m.add_instance(seq_record.seq)
    PWM = m.pwm()
    # each element of PWM is written as a single cell
    writer.writerows([[seq_record.id], PWM])

Could it be that you opened the resulting CSV file in Excel without specifying the comma as the delimiter? Otherwise "A: 0.25, C: 0.25, G: 0.25, T: 0.25" should be split across different cells, since it contains commas.

written 9.4 years ago by Michael Schubert
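A note on why the commas inside the braces may not separate cells on their own: Python's csv.writer, with its default settings, wraps any field that contains the delimiter in double quotes, so a dictionary written as one field stays one field when the file is read back. The sketch below illustrates this in Python 3 with io.StringIO instead of a file. The record ID and the frequency dictionary are hypothetical stand-ins for seq_record.id and one PWM position, and it is an assumption that the original script's output looked like this.

```python
import csv
import io

# Hypothetical stand-ins for seq_record.id and one PWM position.
record_id = "seq1"
freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma delimiter, minimal quoting
writer.writerow([record_id, freqs])  # the dict is converted with str() into ONE field

line = buffer.getvalue().strip()
print(line)

# Reading the line back shows two fields (ID and dict text), not five.
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))
```

The printed line has the whole dictionary enclosed in double quotes, which is how a CSV-aware reader knows that the commas inside belong to the cell content.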

You can add an extra step here. When you open the file in Excel, select all the data, then on the "Data" tab click "Text to Columns" and choose "Comma" as the delimiter. This should split the data into 4 different cells.

written 9.4 years ago by Gjain

Great idea. I'll try that and see what happens.

written 9.4 years ago by RossCampbell

Oddly enough, specifying a comma as the delimiter didn't affect the format. Changing it to ' ' or '}' (both of which occur frequently in my file) did, though. I'm not really sure why the comma didn't.

written 9.4 years ago by RossCampbell
Istvan Albert (University Park, USA), 9.4 years ago, wrote:

What you need to do is format the data structure that you get from the pwm() method before writing it. You should also use the writerow() method rather than writerows(). Finally, you may want to impose an explicit letter order rather than relying on the default dictionary order.

Your code will then look approximately like this (I don't have similar code at hand, so I can't test it):

row = [ sequenceid ]
for letter in "ATGC":
    row.append( pwmdata[letter] )
writer.writerow( row )

where pwmdata is a dictionary-like object that holds your frequencies.
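To make the idea above concrete, here is a minimal sketch in Python 3. It assumes that pwm() returns a list with one dictionary of frequencies per alignment position. The sample values, the sequence ID, the helper name pwm_rows, and the extra position column and header row are all hypothetical additions for illustration, not part of the original script.

```python
import csv
import io

LETTERS = "ATGC"  # fixed column order, as suggested above


def pwm_rows(sequence_id, pwm_data, letters=LETTERS):
    """Yield one flat row per position: [id, position, one frequency per letter]."""
    for position, freqs in enumerate(pwm_data, start=1):
        yield [sequence_id, position] + [freqs[letter] for letter in letters]


# Hypothetical PWM with two positions (a stand-in for the result of m.pwm()).
pwm_data = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

# For a real file, use open("filename.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "position"] + list(LETTERS))  # header row
for row in pwm_rows("seq1", pwm_data):
    writer.writerow(row)  # one frequency per cell

print(buffer.getvalue())
```

Because every frequency is its own list element, each one lands in its own cell, and looking the values up by letter keeps the columns in the same order for every row.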
