Find similar values in two gene lists
4.7 years ago
Nicky • 0

Hi guys, I have two very large gene lists, and I want to extract all the values that appear in both of them.

I haven't been successful so far. This is my code:

    list1 = ("1_10.txt")
    list2 = ("1_10.txt")

    ID <- match(list1,list2)
    result1 <- list2[na.omit(ID)]
    unique(result1)
    write.csv(ID,file="matchedresults1.txt")

    list1
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863
    ENSG00000141337
    ENSG00000154257

    list2
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863
    ENSG00000141338
    ENSG00000154258

So I expect the extracted data to look like this:

    list3
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863

Thanks for reading.

R coding • 1.2k views

intersect()
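
For reference, a minimal sketch of how intersect() could be applied here. It assumes the IDs are already available as character vectors; the values below are typed in from the question, whereas in practice they would be read from the two files.

```r
# IDs copied from the question; in practice read them from the files.
list1 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141337", "ENSG00000154257")
list2 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141338", "ENSG00000154258")

# intersect() returns the values present in both vectors, without duplicates.
common <- intersect(list1, list2)
print(common)
```

With these inputs, common holds the six IDs shared by both lists, which matches the list3 expected in the question.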


Hi there, I have a long list of genes in two files.

Let me try your script


Nicky: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

4.7 years ago
piyushjo ▴ 700

Try this:

    list1 <- read.delim("1_10.txt", stringsAsFactors = FALSE)
    list2 <- read.delim("1_10.txt", stringsAsFactors = FALSE)

Here both of your lists come from exactly the same file. Do you have a long list of genes in two files, or is it a matrix? The following will work for a list with one column. First convert the ID column into a character vector:

    # read.delim returns a data.frame, so pull out the ID column before converting.
    # EnsemblGeneID is the column heading shown in the question; use your own heading if it differs.
    list1 <- as.character(list1$EnsemblGeneID)
    list2 <- as.character(list2$EnsemblGeneID)
    # TRUE for every entry of list1 that also occurs in list2
    keep <- list1 %in% list2
    sel <- list1[keep]
    # drop repeated IDs
    sel <- sel[!duplicated(sel)]
    write.csv(sel, "matchedresults1.txt")

Hi there, I have a long list of genes in two files.

With your script I got the following result:

    "","x"
    "1","1_10.txt"

No idea what it means :(


You first need to have files with those names in the working folder. list1 and list2 are the two lists that you gave in your example.
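
A small sketch of the difference between storing a file name and reading the file, which is probably why the output above contained only "1_10.txt". It assumes a one-column, tab-delimited file with the heading EnsemblGeneID as in the question; the temporary file is a hypothetical stand-in for 1_10.txt.

```r
# Hypothetical stand-in for "1_10.txt": write a small example file to a temp location.
path <- tempfile(fileext = ".txt")
writeLines(c("EnsemblGeneID", "ENSG00000109573", "ENSG00000205003"), path)

# Assigning the name only stores the text of the file name, not the gene IDs.
as_name <- path
print(length(as_name))

# Reading the file gives a data.frame with one row per gene ID.
as_data <- read.delim(path, stringsAsFactors = FALSE)
ids <- as_data$EnsemblGeneID
print(ids)
```

Only the second form gives match(), %in% or intersect() actual gene IDs to compare.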

4.7 years ago
2nelly ▴ 310

You can try this:

    awk 'NR==FNR {end[$1]; next} ($1 in end)' list1 list2

This stores the first column of list1 as array keys, then prints every line of list2 whose first column is among them. To match on different columns, change the first $1 to the column of list1 you want to compare, or the second $1 to the column of list2.
