Matching genes between two lists and writing them to a txt file
2.9 years ago
ran • 0

Hi,

I'm pretty new to programming and I'm trying to find the genes that appear in both the young and the old samples, put their values side by side in columns, and write the result to a txt file. This is what the young gene file looks like:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83
SCYL3   4.2 4.54    5.16    5.1 4.61    4.89    5.03    4.09    4.5
C1orf112    3.24    3.03    3.9 4.29    3.58    4.96    4.03    3.6 3.72
FUCA2   3.83    NA  NA  NA  4.92    3.55    5.76    4.98    5.78
GCLC    5.31    4.66    5.18    3.94    5.25    4.43    5.75    6.56    5.69

The old one:

GENE    O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75
DUFAB1  5.3 5.36    5.29    5.37    5.37    5.53    5.57    5.36    5.39
DVL2    4.47    4.71    4.72    4.95    5.01    4.85    4.61    4.79    4.38
DYRK4   3.2 2.84    3.07    2.4 2.17    1.98    3.23    2.81    3.19

The output should be:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75

This is my attempt:

with open("youngMatrix.txt", 'r+') as young, open("oldMatrix.txt", 'r+') as old:
    with open("CombMatrix.txt", "w") as Comb_file:
        for line_old in old:
            for line_young in young:
                line_young1 = line_young.split("\t")
                line_old1 = line_old.split("\t")
                if line_old1[0] == line_young1[0]:
                    edit_old1 = line_old.rstrip("\n")
                    edit_young1 = line_young.rstrip("\n")
                    Comb_file.writelines(edit_young1 + edit_old1 + "\n")

And my output is this:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9GENE  O1  O2  O3  O4  O5  O6  O7  O8  O9

I'm pretty stuck and would appreciate any help!

Python BioPython • 1.0k views

Get yourself acquainted with the pandas package, and then take a look at how to perform an inner join with two pandas dataframes. That will solve your core problem. Figuring out how to write a dataframe to a text file is only a search engine query away. You got this!!


with tsv-utils:

$ tsv-join -H -f old.txt -k1 new.txt -a "G*,O*"

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  GENE    O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83    DPM1    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75
2.9 years ago

Please follow the suggestions by Dunois. Here is the code.

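#!/usr/bin/env python
import sys
import pandas as pd

# Paths to the two tab-separated matrices, given on the command line
new_path = sys.argv[1]
old_path = sys.argv[2]
new = pd.read_csv(new_path, sep="\t")
old = pd.read_csv(old_path, sep="\t")
# The default inner join on GENE keeps only genes present in both files
print(pd.merge(new, old, on="GENE"))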

Save it as a Python file and run it with Python 3, with the pandas library installed:

$ python test.py new.txt old.txt

   GENE    Y1  Y2  Y3  Y4  Y5    Y6    Y7   Y8    Y9    O1    O2    O3    O4    O5    O6    O7    O8    O9
0  DPM1  4.85 NaN NaN NaN NaN  5.35  5.52  4.6  4.83  3.92  3.84  3.98  4.06  4.16  3.84  3.88  3.96  3.75

You can also make the Python file executable and run it directly.
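The script above prints the merged table, while the question asks for a txt file. A minimal sketch of writing the result out, assuming the same tab-separated layout; the function name `merge_matrices` is made up for illustration, and `na_rep="NA"` is used so that missing values are written back as `NA` rather than as empty fields:

```python
import io
import pandas as pd


def merge_matrices(young_src, old_src, out_path):
    """Inner-join two tab-separated matrices on GENE and write the result."""
    young = pd.read_csv(young_src, sep="\t")
    old = pd.read_csv(old_src, sep="\t")
    merged = pd.merge(young, old, on="GENE", how="inner")
    # index=False drops the 0, 1, 2... row labels; na_rep writes NaN as NA
    merged.to_csv(out_path, sep="\t", index=False, na_rep="NA")
    return merged


if __name__ == "__main__":
    # Small in-memory stand-ins for the two input files
    young_txt = "GENE\tY1\tY2\nDPM1\t4.85\tNA\nSCYL3\t4.2\t4.54\n"
    old_txt = "GENE\tO1\tO2\nDPM1\t3.92\t3.84\nDVL2\t4.47\t4.71\n"
    out = io.StringIO()
    merge_matrices(io.StringIO(young_txt), io.StringIO(old_txt), out)
    print(out.getvalue(), end="")
```

With real files, the three arguments can simply be paths, for example `merge_matrices("youngMatrix.txt", "oldMatrix.txt", "CombMatrix.txt")`.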

2.9 years ago

Sort both files and use join.

All in one command, assuming the separator is a tab:

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt) <(sort -t $'\t' -k1,1 file2.txt)  > join.txt
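Note that sorting the whole file also sorts the header line in with the data rows. The same join can be sketched in plain Python without sorting and without pandas. It also shows why the nested loop in the question only wrote the header line: a file object can be iterated once, so after the first old line the inner loop over the young file has nothing left to read. Loading one table into a dictionary avoids the nested loop. The function name `join_on_gene` is made up for illustration:

```python
import csv
import io


def join_on_gene(young_lines, old_lines):
    """Yield merged rows for keys found in the first column of both inputs."""
    # Read the old table once into a dict: gene name -> remaining columns.
    old_by_gene = {
        row[0]: row[1:]
        for row in csv.reader(old_lines, delimiter="\t")
        if row
    }
    # A single pass over the young table; each row is one dict lookup.
    for row in csv.reader(young_lines, delimiter="\t"):
        if row and row[0] in old_by_gene:
            yield row + old_by_gene[row[0]]


if __name__ == "__main__":
    # Small in-memory stand-ins for youngMatrix.txt and oldMatrix.txt.
    young = io.StringIO("GENE\tY1\tY2\nDPM1\t4.85\tNA\nSCYL3\t4.2\t4.54\n")
    old = io.StringIO("GENE\tO1\tO2\nDPM1\t3.92\t3.84\nDVL2\t4.47\t4.71\n")
    for merged_row in join_on_gene(young, old):
        print("\t".join(merged_row))
```

Because both header lines start with `GENE`, the header is joined like any other row and the gene name appears only once per line. Rows come out in the order of the young file.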