Question: Drop values in expression dataset python
7 days ago, highfillchad wrote:

I have this microarray dataset. I want to work around an issue I ran into in an earlier version of this pipeline (https://geoparse.readthedocs.io/en/latest/Analyse_hsa-miR-124a-3p_transfection_time-course.html). I have created an experiment file and read it in as a dataframe. I want to eliminate every column in my expression table whose name no longer appears as a string value in the Accession column of that dataframe.

# Import tools
import GEOparse
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# download datasets
gse1 = GEOparse.get_GEO(geo="GSE99039", destdir="C:/Users/Highf_000/PycharmProjects/TFTest")
gse2 = GEOparse.get_GEO(geo="GSE6613", destdir="C:/Users/Highf_000/PycharmProjects/TFTest")
gse3 = GEOparse.get_GEO(geo="GSE72267", destdir="C:/Users/Highf_000/PycharmProjects/TFTest")

# import all GSM data for each GSE file
with open("GSE99039_GPL570.csv") as f:
    GSE99039_GPL570 = f.read().splitlines()
with open("GSE6613_GPL96.csv") as f:
    GSE6613_GPL96 = f.read().splitlines()
with open("GSE72267_GPL571.csv") as f:
    GSE72267_GPL571 = f.read().splitlines()

# gse1
gse1.gsm = gse1.phenotype_data
print(gse1.gsm.head())

# gse1
gse1.details = pd.read_csv('GSE99039_MicroarrayDetails.csv', delimiter=',')
print(gse1.details.head())
# Keep only rows labelled CONTROL, IPD or GPD.
# The column name "Disease label" is taken from the printed dataframe below.
gse1.detailsv1 = gse1.details[gse1.details["Disease label"].isin(["CONTROL", "IPD", "GPD"])]
print(gse1.detailsv1.head())

# gse1
pivoted_control_samples = gse1.pivot_samples('VALUE')[GSE99039_GPL570]
print(pivoted_control_samples)


# gse1
# Median expression of each probe across the selected samples
pivoted_control_samples_average = pivoted_control_samples.median(axis=1)
# Print number of probes before filtering
print("Number of probes before filtering: ", len(pivoted_control_samples_average))
# Keep probes whose median is at or above the 25th percentile
expression_threshold = pivoted_control_samples_average.quantile(0.25)
expressed_probes = pivoted_control_samples_average[pivoted_control_samples_average >= expression_threshold].index.tolist()
# Print number of probes at or above the cutoff
print("Number of probes above threshold: ", len(expressed_probes))
# Confirm the filtering worked
samples = gse1.pivot_samples("VALUE").loc[expressed_probes]
print(samples.head())

# print phenotype data
print(gse1.phenotype_data[["title", "source_name_ch1", "Disease_Label", "Sex" ]])

This is what the dataframe I created looks like (named gse1.detailsv1 in the script):

   Accession       Title  Source name  ... Subject_id Disease label     Sex
0  GSM2630758  E7R_039a01  Whole blood  ...      L3012       CONTROL  Female
1  GSM2630759  E7R_039a02  Whole blood  ...      L2838           IPD    Male
2  GSM2630760  E7R_039a03  Whole blood  ...      L2540           IPD  Female
3  GSM2630761  E7R_039a04  Whole blood  ...      L3015       CONTROL  Female
4  GSM2630762  E7R_039a05  Whole blood  ...      L2884           IPD  Female

[5 rows x 7 columns]

This is what my expression table looks like (named samples in the script):

name       GSM2630758  GSM2630759  ...  GSM2631314  GSM2631315
ID_REF                             ...                        
1007_s_at       5.397       4.952  ...       5.567       5.529
1053_at         5.199       5.198  ...       5.706       5.078
117_at          8.327       8.589  ...       8.511       8.458
121_at          7.042       6.935  ...       7.526       7.673
1294_at         7.753       8.210  ...       7.537       7.418

[5 rows x 558 columns]

For example, if GSM2630758 does not exist in the Accession column of the first dataframe, I want to drop the GSM2630758 column from the expression table. I need to loop through the columns and eliminate every one whose accession no longer exists.

rna-seq geoparse python geo • 68 views
5 days ago, zorbax (Mexico) wrote:

Cross-posted here https://stackoverflow.com/questions/61890315/merge-probes-and-gene-ids-with-geoparse

Answered here: https://stackoverflow.com/questions/61907186/drop-values-in-expression-dataset-python
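
For reference, here is a minimal sketch of one way to do this in pandas without an explicit loop. It is not necessarily the approach taken in the linked answer. The two small frames are hypothetical stand-ins for gse1.detailsv1 and samples, with made-up values. The only assumption carried over from the question is that the expression table's column names are GSM accessions and that the details frame has an Accession column.

```python
import pandas as pd

# Toy stand-in for gse1.detailsv1 (illustrative rows only).
details = pd.DataFrame({
    "Accession": ["GSM2630759", "GSM2630760", "GSM2630761"],
    "Disease label": ["IPD", "IPD", "CONTROL"],
})

# Toy stand-in for the expression table: probes as rows, GSM accessions as columns.
samples = pd.DataFrame(
    {
        "GSM2630758": [5.397, 5.199],
        "GSM2630759": [4.952, 5.198],
        "GSM2630760": [5.100, 5.300],
        "GSM2630761": [5.200, 5.400],
    },
    index=pd.Index(["1007_s_at", "1053_at"], name="ID_REF"),
)

# Boolean mask: True for columns whose name appears in details["Accession"].
keep = samples.columns.isin(details["Accession"])

# Keep only the matching columns; the original column order is preserved.
filtered = samples.loc[:, keep]

# Columns that were dropped, useful as a sanity check.
dropped = samples.columns[~keep].tolist()

print(filtered)
print("Dropped:", dropped)
```

With the real objects, the same two lines would read keep = samples.columns.isin(gse1.detailsv1["Accession"]) followed by samples.loc[:, keep]. Using a mask rather than indexing with the accession list directly avoids a KeyError when the details file lists an accession that is missing from the expression table.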
