extracting a file from a bigger file
6.7 years ago
zizigolu ★ 4.3k

Hi,

I have two files

TF      gene    interaction
AATF    BAK1    Unknown
AATF    BAX     Repression
AATF    MYC     Activation

And:

TF      gene
KAT5    KLRC4-KLRK1
EGF     ABCB6
ETS1    CDKN2A

How can I extract from the bigger file (the one that contains the interaction types) the rows that correspond to the small file?

I tried join, intersect, and match, but I can only match on gene or on TF, not on both simultaneously.

With the code below, only the TFs are matched between the two files:

df3 <- merge( dat1, dat2, by.x = "TF", by.y = "gene" )
R software error • 1.4k views

There are no common entities (genes or TFs) between the two data frames (dat1 and dat2) in the example above, so the output would be empty.


But there are common TFs and genes; I checked with a Venn diagram :(


Are there common genes/symbols between "genes" column of dat1 and "TF" of dat2?


I created an example dataset from the lists above (dat1 is list1 and dat2 is list2). Since there are no common entries between list1 and list2, I copied the very first line of list1 (dat1) into list2 (dat2).

Following is the code:

> library(dplyr)

> list1
    TF gene interaction
1 AATF BAK1     Unknown
2 AATF  BAX  Repression
3 AATF  MYC  Activation

> list2
    TF        gene
1  EGF       ABCB6
2 ETS1      CDKN2A
3 KAT5 KLRC4-KLRK1
4 AATF        BAK1

Find out the common entities between the files matching both gene and TF.

> inner_join(list1,list2)
Joining, by = c("TF", "gene")
    TF gene interaction
1 AATF BAK1     Unknown

Yes, there are TFs in common with the genes too.


You are joining dat1 (df1) and dat2 (df2) not by the same columns (i.e. gene with gene, TF with TF) but by different columns (gene from df2 with TF from df1). Is that intentional?
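To illustrate the point, here is a minimal sketch of what by.x = "TF", by.y = "gene" does. The two data frames are made-up toy data (not the poster's files), chosen so that exactly one TF name in dat1 also appears as a gene name in dat2.

```r
# Toy data, invented for illustration only.
dat1 <- data.frame(
  TF = c("AATF", "MYC"),
  gene = c("BAK1", "CDKN2A"),
  interaction = c("Unknown", "Repression"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  TF = c("ETS1", "EGF"),
  gene = c("MYC", "ABCB6"),
  stringsAsFactors = FALSE
)

# Matches the TF column of dat1 against the gene column of dat2.
# A row is kept whenever a TF name in dat1 equals a gene name in dat2,
# regardless of what is in the other columns.
crossed <- merge(dat1, dat2, by.x = "TF", by.y = "gene")
print(crossed)
```

Here the only surviving row is the one where "MYC" is a TF in dat1 and a gene in dat2, which is probably not the comparison the question is after.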


Exactly, that is what I mean. Thank you for your kind efforts; I will try your solutions.

I want to extract from my bigger file the interactions whose gene and TF both exist in my small file.


I think there is a small confusion here. From the command you posted, my understanding was that you wanted to match the genes of one file against the TFs of the other. But from your comment above, what you want is different: there are two files, one small and one big, and both have gene and TF columns. You want to extract rows from the big file (the one with the interaction column) using the entries of the small file, matching on both gene and TF.

Which one is correct? If it is the latter, the code is different, and I have updated it.


Exactly, your second assumption is correct. Ram's solution gave me this:

https://ibb.co/ffiPUv

Thank you both. When I merge first on TFs and then on genes, one column at a time, I get different interactions.
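A possible reason for the difference, sketched with made-up toy data (the `big` and `small` data frames and their interaction labels are hypothetical, and `%in%` filtering is used here as a stand-in for matching one column at a time): checking each column separately keeps any row whose TF occurs somewhere in the small file and whose gene occurs somewhere in the small file, even when that TF and gene never occur together in one row.

```r
# Toy data, invented for illustration only.
big <- data.frame(
  TF = c("AATF", "AATF", "EGF"),
  gene = c("BAK1", "ABCB6", "BAK1"),
  interaction = c("Unknown", "Repression", "Activation"),
  stringsAsFactors = FALSE
)
small <- data.frame(
  TF = c("AATF", "EGF"),
  gene = c("BAK1", "ABCB6"),
  stringsAsFactors = FALSE
)

# One column at a time: each column is checked independently.
one_by_one <- big[big$TF %in% small$TF & big$gene %in% small$gene, ]

# Both columns together: a row is kept only if its (TF, gene) pair
# exists as a row in `small`.
paired <- merge(big, small, by = c("TF", "gene"))

print(nrow(one_by_one))
print(nrow(paired))
```

In this toy case the column-wise filter keeps all three rows of `big`, while the two-column merge keeps only the AATF/BAK1 row.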

6.7 years ago
Ram 43k

You wish to use 2 "columns" to join your dataset, correct? Given that your datasets have identical colnames for those columns, substitute your by.x="TF",by.y="gene" with by=c("TF","gene") (which is a short version of by.x=c("TF","gene"),by.y=c("TF","gene")) - this will merge using matching values from both columns.
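As a minimal sketch of that call, using the rows shown in the question: the last row of dat2 (AATF, BAK1) is an assumed extra row, added only so that one (TF, gene) pair actually matches.

```r
# Rows taken from the question's example tables.
dat1 <- data.frame(
  TF = c("AATF", "AATF", "AATF"),
  gene = c("BAK1", "BAX", "MYC"),
  interaction = c("Unknown", "Repression", "Activation"),
  stringsAsFactors = FALSE
)
# The final row (AATF, BAK1) is an assumption added for illustration.
dat2 <- data.frame(
  TF = c("KAT5", "EGF", "ETS1", "AATF"),
  gene = c("KLRC4-KLRK1", "ABCB6", "CDKN2A", "BAK1"),
  stringsAsFactors = FALSE
)

# Keep only the rows of dat1 whose TF and gene both match a row of dat2.
df3 <- merge(dat1, dat2, by = c("TF", "gene"))
print(df3)
```

Because dat2 has no columns other than the two keys, the result carries just TF, gene, and the interaction column from dat1.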
