Python table data extraction
20 months ago

Hi!

I'm working with a table like this:

[image: first table, not included]

and I need to take each of the "GenBank Accessions" numbers and compare it to the second table, "Info".

[image: second table "Info", not included]

Then, for each row in the first table, I want to add the count of each species. For example, for the first row of table one I would have

Enterovirus A: 4 
Enterovirus G: 2

How can I do that?

Thanks a lot!

python • 858 views

What have you tried? Have you Googled anything?


Hi!

I'm trying for-loops, but only the first element of each row gets matched against the second table.

Maybe I need to do the opposite, it might be easier...

import pandas as pd

# Load the main table and the lookup table
file = pd.read_csv('full_DB.csv')
info = pd.read_csv('info.csv')

# Strip the literal 'accn|' prefix and rename the column to match info
file['Sample'] = file['Sample'].str.replace('accn|', '', regex=False)
file = file.rename(columns={'Sample': 'GenBank Accessions'})

info_sub = info[['Species', 'GenBank Accessions']]

# One empty column per species, to be filled with the counts later
lista = info['Species'].unique().tolist()
for i in lista:
    file[i] = ""

# Work on the first five rows while testing
subset = file.iloc[0:5, :]

for row in subset.index:
    t = subset.loc[row, 'GenBank Accessions']
    # Look up each accession of this row in the info table
    for acc in t.split(','):
        print(info_sub[info_sub['GenBank Accessions'] == acc])

I've undeleted your question as it has already received some feedback from a couple of users.

Your description of the data is minimal and confusing, so we can't really help with the code. Can you explain what you wish to achieve in plain terms, such as:

  1. Split column X from table Y using "," as the delimiter
  2. Match each value in the split column to column Z from table YY

etc.
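
For illustration, steps phrased like these map fairly directly onto pandas. The following is only a minimal sketch: the toy tables, the accession IDs and the "Sample" values are made up, and it assumes the accessions sit in one comma-separated string per row.

```python
import pandas as pd

# Hypothetical toy data; the real tables and values will differ.
table = pd.DataFrame({
    "Sample": ["s1", "s2"],
    "GenBank Accessions": ["AB000001, AB000002, AB000003", "AB000004"],
})
info = pd.DataFrame({
    "GenBank Accessions": ["AB000001", "AB000002", "AB000003", "AB000004"],
    "Species": ["Enterovirus A", "Enterovirus A",
                "Enterovirus G", "Enterovirus G"],
})

# Step 1: split on "," and give every accession its own row.
long = table.copy()
long["GenBank Accessions"] = long["GenBank Accessions"].str.split(",")
long = long.explode("GenBank Accessions").reset_index(drop=True)
# Remove whitespace left over from ", " separators so values match exactly.
long["GenBank Accessions"] = long["GenBank Accessions"].str.strip()

# Step 2: match each accession to its row in the lookup table.
matched = long.merge(info, on="GenBank Accessions", how="left")
print(matched)
```

With how="left", an accession that is missing from the lookup table is kept and gets NaN as its species, which makes unmatched IDs easy to spot.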


Are these real tables or CSV data files? You can use the Python pandas library for this data merge. Create a dataframe for each table, loop through them, take the required value from table 1 and find the matching value in table 2, and so on. At the end, build a new CSV file with the values you found. In Machine Learning this is called data preprocessing or Exploratory Data Analysis (EDA). We do that all the time in any Machine Learning project. I hope this clarifies your task.
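
A loop-based version of that idea could look roughly like the sketch below. It is not a tested solution for the real files: the toy tables and accession IDs are invented, and it assumes one comma-separated string of accessions per row in table 1.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy data standing in for the two tables.
table = pd.DataFrame({
    "GenBank Accessions": ["AB000001,AB000002,AB000003", "AB000004"],
})
info = pd.DataFrame({
    "GenBank Accessions": ["AB000001", "AB000002", "AB000003", "AB000004"],
    "Species": ["Enterovirus A", "Enterovirus A",
                "Enterovirus G", "Enterovirus G"],
})

# Accession -> species lookup built from table 2.
species_of = dict(zip(info["GenBank Accessions"], info["Species"]))

# Loop over table 1 and count the species found in each row.
rows = []
for accessions in table["GenBank Accessions"]:
    ids = [a.strip() for a in accessions.split(",")]
    # Accessions that are absent from table 2 are skipped here.
    rows.append(dict(Counter(species_of[a] for a in ids if a in species_of)))

# One column per species; a species not seen in a row counts as 0.
counts = pd.DataFrame(rows, index=table.index).fillna(0).astype(int)
result = table.join(counts)

# to_csv() without a path returns the CSV text; pass a filename to write a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

The dictionary lookup keeps the loop simple; for large tables a vectorised split, explode and merge would likely be faster.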


This does not answer OP's question - it's just a detour into how the genre of OP's question is common among ML projects. As such, I've moved it to a comment.

Data cleaning is common in any data analysis project, not just ML projects. Please stop pushing ML everywhere.

