Question: Find similar values in two gene lists
Nicky0 wrote:

Hi guys, I have two very large gene data sets, and I want to extract all the values that appear in both lists, but I haven't been successful so far.

This is my code:

    list1 = ("1_10.txt")
    list2 = ("1_10.txt")

    ID <- match(list1, list2)
    result1 <- list2[na.omit(ID)]
    unique(result1)
    write.csv(ID, file = "matchedresults1.txt")

   list1 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863
         ENSG00000141337
         ENSG00000154257

     list2 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863
         ENSG00000141338
         ENSG00000154258

So I expect to see the following extracted data:

     list3 
        EnsemblGeneID
         ENSG00000109573
         ENSG00000205003
         ENSG00000124603
         ENSG00000008313
         ENSG00000183043
         ENSG00000179863

Thanks for reading.

coding R • 222 views
modified 9 months ago by 2nelly • written 9 months ago by Nicky

intersect()
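
As a minimal sketch of what this suggestion does, the example below applies `intersect()` to the IDs from the question. The vectors are typed in by hand for illustration; with real data they would first be read from the two files.

```r
# Example IDs copied from the question; in practice these come from files.
list1 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141337", "ENSG00000154257")
list2 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141338", "ENSG00000154258")

# intersect() returns the unique values present in both vectors,
# in the order they appear in the first argument.
common <- intersect(list1, list2)
print(common)
```

Here `common` holds the six IDs shared by both lists, which matches the expected list3 in the question.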

modified 9 months ago • written 9 months ago by ATpoint

Hi there, I have a long list of genes in two files.

Let me try your script

written 9 months ago by Nicky

Nicky: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

modified 9 months ago • written 9 months ago by genomax
piyushjo470 wrote:

Try this

list1<-read.delim("1_10.txt")
list2<-read.delim("1_10.txt")

Here both of your lists are exactly the same, since both are read from the same file. Do you have a long list of genes in two files, or is it a matrix? The following will work for a list with one column. First convert the factors into a character vector:

# read.delim() returns a data.frame, so extract the column before converting.
# EnsemblGeneID is the column heading in your example; with header = FALSE the column is named V1.
list1 <- as.character(list1$EnsemblGeneID)
list2 <- as.character(list2$EnsemblGeneID)
keep <- list1 %in% list2        # TRUE where a list1 ID also occurs in list2
sel <- list1[keep]
sel <- sel[!duplicated(sel)]    # drop repeated IDs
write.csv(sel, "matchedresults1.txt")
modified 9 months ago • written 9 months ago by piyushjo

Hi there, I have a long list of genes in two files.

With your script I got the following result:

    "","x"
    "1","1_10.txt"

No idea what it means :(

written 9 months ago by Nicky

You first need to have files with those names in the working folder. list1 and list2 are the two lists that you have given in your example.
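
The one-row output above is what R writes when list1 still holds the file name itself instead of the file's contents. A small sketch of the difference follows; the temporary file and its two IDs are assumptions standing in for the real "1_10.txt".

```r
# Assumption: a temporary file stands in for the real "1_10.txt".
path <- tempfile(fileext = ".txt")
writeLines(c("EnsemblGeneID", "ENSG00000109573", "ENSG00000205003"), path)

# A quoted file name is only a length-1 character vector; nothing is read.
as_string <- path
length(as_string)   # 1

# read.delim() actually loads the file into a data.frame.
as_table <- read.delim(path, stringsAsFactors = FALSE)
ids <- as_table$EnsemblGeneID
print(ids)
```

Passing `as_string` to `write.csv()` would write just the file name, whereas `ids` contains the gene IDs that `%in%` or `intersect()` can compare.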

written 9 months ago by piyushjo
2nelly180 wrote:

You can try this

awk 'NR==FNR {end[$1]; next} ($1 in end)' list1 list2

To match on different columns, change the first $1 to the column of list1 you want to compare, or the second $1 to the column of list2 you want to match against.
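
For readers staying in R, roughly the same column-wise filter can be sketched with `%in%`. The two tables and the column names `score`, `sample` and `gene` are hypothetical and only illustrate matching on columns in different positions.

```r
# Hypothetical two-column tables; the column names are illustrative only.
tab1 <- data.frame(EnsemblGeneID = c("ENSG00000109573", "ENSG00000205003"),
                   score = c(1.2, 0.4),
                   stringsAsFactors = FALSE)
tab2 <- data.frame(sample = c("a", "b", "c"),
                   gene = c("ENSG00000205003", "ENSG00000141338",
                            "ENSG00000109573"),
                   stringsAsFactors = FALSE)

# Keep the rows of tab2 whose 'gene' value occurs in tab1's 'EnsemblGeneID',
# analogous to storing one column of list1 and testing a column of list2.
hits <- tab2[tab2$gene %in% tab1$EnsemblGeneID, ]
print(hits)
```

As with the awk one-liner, the rows come back in the order of the second table.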

written 9 months ago by 2nelly