Question

Find matching genes between two lists and write them to a txt file

2.9 years ago

ran • 0

Hi,

I'm pretty new to programming and I'm trying to find the genes that match between young and old samples, put their values side by side in columns, and write the result to a txt file. This is what the young gene file looks like:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83
SCYL3   4.2 4.54    5.16    5.1 4.61    4.89    5.03    4.09    4.5
C1orf112    3.24    3.03    3.9 4.29    3.58    4.96    4.03    3.6 3.72
FUCA2   3.83    NA  NA  NA  4.92    3.55    5.76    4.98    5.78
GCLC    5.31    4.66    5.18    3.94    5.25    4.43    5.75    6.56    5.69

The old one:

GENE    O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75
DUFAB1  5.3 5.36    5.29    5.37    5.37    5.53    5.57    5.36    5.39
DVL2    4.47    4.71    4.72    4.95    5.01    4.85    4.61    4.79    4.38
DYRK4   3.2 2.84    3.07    2.4 2.17    1.98    3.23    2.81    3.19

The output should be:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75

This is my attempt:

with open ("youngMatrix.txt", 'r+') as young, open("oldMatrix.txt", 'r+') as old:
with open ("CombMatrix.txt", "w") as Comb_file:
    for line_old in old:
        for line_young in young:
            line_young1 = line_young.split("\t")
            line_old1 = line_old.split("\t")
            if line_old1[0] == line_young1[0]:
                edit_old1 = line_old.rstrip("\n")
                edit_young1 =line_young.rstrip("\n")
                united_file.writelines(edit_young1 + edit_old1 + "\n")

And this is my output:

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9GENE  O1  O2  O3  O4  O5  O6  O7  O8  O9

I'm pretty stuck and would appreciate any help!

Python BioPython • 1.0k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 2.9 years ago by ran • 0

Get yourself acquainted with the pandas package, and then take a look at how to perform an inner join with two pandas dataframes. That will solve your core problem. Figuring out how to write a dataframe to a text file is only a search engine query away. You got this!!

ADD REPLY • link 2.9 years ago by Dunois ★ 2.5k

with tsv-utils:

$ tsv-join -H -f old.txt -k1 new.txt -a "G*,O*"

GENE    Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  GENE    O1  O2  O3  O4  O5  O6  O7  O8  O9
DPM1    4.85    NA  NA  NA  NA  5.35    5.52    4.6 4.83    DPM1    3.92    3.84    3.98    4.06    4.16    3.84    3.88    3.96    3.75

ADD REPLY • link 2.9 years ago by cpad0112 21k

score 1 · Answer 1 · 2021-05-18

Please follow the suggestions by Dunois. Here is the code.

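#!/usr/bin/env python
import sys
import pandas as pd

# Paths to the two tab-separated matrices, given on the command line
new_path = sys.argv[1]
old_path = sys.argv[2]

# Read both tables; "NA" entries are parsed as NaN by default
new = pd.read_csv(new_path, sep="\t")
old = pd.read_csv(old_path, sep="\t")

# Inner join on the GENE column keeps only genes present in both files
print(pd.merge(new, old, on="GENE"))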

Save it as a Python file (test.py here) and run it with Python 3 and the pandas library installed:

$ python test.py new.txt old.txt

   GENE    Y1  Y2  Y3  Y4  Y5    Y6    Y7   Y8    Y9    O1    O2    O3    O4    O5    O6    O7    O8    O9
0  DPM1  4.85 NaN NaN NaN NaN  5.35  5.52  4.6  4.83  3.92  3.84  3.98  4.06  4.16  3.84  3.88  3.96  3.75

You can also make the Python file executable and run it directly.
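The script above only prints the merged table, while the question asks for a txt file. A minimal sketch of the writing step follows, assuming tab-separated input with a shared GENE column; the function name `merge_gene_tables` and its parameters are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def merge_gene_tables(young_path, old_path, out_path):
    """Inner-join two tab-separated tables on GENE and write the result.

    The function name and arguments are hypothetical, chosen for this sketch.
    """
    young = pd.read_csv(young_path, sep="\t")
    old = pd.read_csv(old_path, sep="\t")

    # how="inner" keeps only genes that occur in both tables
    merged = pd.merge(young, old, on="GENE", how="inner")

    # index=False drops the 0, 1, 2... row labels seen in the printed output;
    # na_rep="NA" writes missing values back as NA instead of an empty field
    merged.to_csv(out_path, sep="\t", index=False, na_rep="NA")
    return merged
```

Calling `merge_gene_tables("youngMatrix.txt", "oldMatrix.txt", "CombMatrix.txt")` with the file names from the question should then produce a tab-separated file with one GENE column followed by the Y and O columns.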

score 0 · Answer 2 · 2021-05-17

2.9 years ago

Pierre Lindenbaum 161k

Sort both files and use join.

All in one, assuming the separator is a tab:

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt) <(sort -t $'\t' -k1,1 file2.txt)  > join.txt
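For readers who want to stay in plain Python, the same join can be sketched without sorting by loading one file into a dictionary first. This also shows why the nested loops in the question produced only the header line: a file object can be iterated only once, so the inner `for line_young in young` loop is already exhausted when the second line of the old file is reached. The function name `join_gene_files` is hypothetical, and the sketch assumes tab-separated files whose first column is the gene name.

```python
def join_gene_files(young_path, old_path, out_path):
    """Write rows whose first column occurs in both files, young columns first.

    Hypothetical helper for illustration; assumes tab-separated input.
    """
    # Index the old table once: gene name -> list of remaining columns
    old_rows = {}
    with open(old_path) as old:
        for line in old:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            old_rows[fields[0]] = fields[1:]

    # Stream the young table and look each gene up in the dictionary.
    # The header line matches too, because both files start with "GENE".
    with open(young_path) as young, open(out_path, "w") as out:
        for line in young:
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if fields[0] in old_rows:
                out.write("\t".join(fields + old_rows[fields[0]]) + "\n")
```

Unlike the `sort` plus `join` pipeline, this keeps the header on the first line and preserves the row order of the young file; with `sort`, the header is sorted like any other line.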
