How to use Python to process a .CEL file?
9.6 years ago
zero_hsy ▴ 110

Hello:

I am processing raw data from CMAP, much of which is in the .CEL format. I know I can read the values using R, but I want to process the .CEL data with Python. I have learned that the Biopython package can process CEL data. Does anyone know the details of how to process .CEL data using Python? The following is my code, but something is wrong with it.

from Bio.Affy import CelFile
with open('AGENT_p_NCLE_RNA6_HG-U133_Plus_2_A01_436578.CEL') as handle:
    c = CelFile.read(handle)

print c
print(c.ncols, c.nrows)

The result is as follows:

<Bio.Affy.CelFile.Record object at 0x02534730>
(None, None)

What is wrong with my code? Also, R uses the CDF file, but in Python it is not used. Why?

It would be nice if you could answer my question.

cel affymetrix python R cmap • 10.0k views

Your code looks fine, but are you sure the file exists in the same directory and is not empty?


I am sure it is in the same directory. The .CEL file has contents and can be read in R.


Can you try the same code on the CEL file given in the BioPython repo?

https://github.com/biopython/biopython/tree/master/Tests/Affy

Here is the download link: affy_v3_example.CEL


I downloaded affy_v3_example.CEL and ran:

with open('affy_v3_example.CEL') as handle:
    c = CelFile.read(handle)

print c
print(c.ncols, c.nrows)
print(c.intensities)

The result is as follows:

(5, 5)
[[   234.    170.  22177.    164.  22104.]
 [   188.    188.  21871.    168.  21883.]
 [   188.    193.  21455.    198.  21300.]
 [   188.    182.  21438.    188.  20945.]
 [   193.  20370.    174.  20605.    168.]]

That works fine, but what is the problem with my data?


I have also opened affy_v3_example.CEL. It looks like processed data, I think, because CEL data should be raw probe-level data. My CEL file, in contrast, shows up as garbled characters.
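A file that looks garbled in a text editor is usually a binary CEL file rather than a corrupted one. The sketch below is one way to check this by inspecting the first bytes. It assumes the signatures described in the Affymetrix file format documentation as far as I recall them: version 3 files are plain text starting with `[CEL]`, version 4 files start with the little-endian integers 64 and 4, and Command Console files start with the bytes 59 and 1. The function name `sniff_cel_format` is hypothetical, not part of Biopython.

```python
import struct


def sniff_cel_format(path):
    """Guess the CEL flavour from the first bytes of the file.

    Assumed signatures (not verified against every chip type):
      - version 3: text file beginning with "[CEL]"
      - version 4: two little-endian int32 values, 64 then 4
      - Command Console: unsigned bytes 59 then 1
    """
    # Always open in binary mode so the bytes are not decoded or altered.
    with open(path, "rb") as handle:
        head = handle.read(8)

    if head.startswith(b"[CEL]"):
        return "version 3 (text)"
    if len(head) >= 8:
        magic, version = struct.unpack("<ii", head)
        if magic == 64 and version == 4:
            return "version 4 (binary)"
    if head[:2] == b"\x3b\x01":
        return "Command Console (binary)"
    return "unknown"


# Example call (file name taken from the question):
# sniff_cel_format("AGENT_p_NCLE_RNA6_HG-U133_Plus_2_A01_436578.CEL")
```

If this reports a binary format, a parser that only understands the version 3 text layout would find no header fields, which would be consistent with `ncols` and `nrows` coming back as `None`. Whether your Biopython release can read version 4 files at all is worth checking in its documentation; if it can, the file would likely need to be opened with `open(path, 'rb')` instead of the default text mode.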


I am also confused by the method Python uses. A .CEL file contains a lot of probes, which means a CDF should be needed, and R handles this correctly. In Python, however, the CDF does not seem to matter. How can that work?
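On the CDF question, one way to picture it: a CEL file only stores one intensity per cell of the chip grid, so it can be read without a CDF. The CDF is needed in the next step, when cells are grouped into probe sets and summarized. The sketch below illustrates that split using the 5 x 5 grid printed earlier in this thread. The probe-set names and cell coordinates are made up for illustration (a real layout comes from the chip's CDF file), the `(row, column)` index order is an assumption, and the plain mean is a stand-in for a real summarization method such as those used in R.

```python
from statistics import mean

# Intensity grid copied from the affy_v3_example.CEL output in this thread.
intensities = [
    [234.0, 170.0, 22177.0, 164.0, 22104.0],
    [188.0, 188.0, 21871.0, 168.0, 21883.0],
    [188.0, 193.0, 21455.0, 198.0, 21300.0],
    [188.0, 182.0, 21438.0, 188.0, 20945.0],
    [193.0, 20370.0, 174.0, 20605.0, 168.0],
]

# HYPOTHETICAL probe-set layout: each name maps to (row, column) cells.
# In practice this mapping is what the CDF file provides.
probe_sets = {
    "probeset_A": [(0, 2), (1, 2), (2, 2)],
    "probeset_B": [(0, 4), (1, 4)],
}


def probe_set_intensities(grid, cells):
    """Collect the raw intensities of the cells belonging to one probe set."""
    return [grid[row][col] for row, col in cells]


# Summarize each probe set with a plain mean (illustrative only).
summary = {
    name: mean(probe_set_intensities(intensities, cells))
    for name, cells in probe_sets.items()
}
```

So the Python code in the question stops at the raw grid. Getting expression values per probe set still needs the probe-to-cell mapping from the CDF, plus background correction and normalization, which the R packages do for you.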
