Question: Find similar values in two gene lists
Nicky0 wrote:

Hi guys, I have two very large gene data sets, and I want to extract all the values that occur in both lists, but I haven't been successful so far.

This is my code:

    list1 = ("1_10.txt")
    list2 = ("1_10.txt")

    ID <- match(list1,list2)
    result1 <- list2[na.omit(ID)]
    unique(result1)
    write.csv(ID,file="matchedresults1.txt")

    list1
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863
    ENSG00000141337
    ENSG00000154257

    list2
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863
    ENSG00000141338
    ENSG00000154258

So I expected to see the following extracted data:

    list3
    EnsemblGeneID
    ENSG00000109573
    ENSG00000205003
    ENSG00000124603
    ENSG00000008313
    ENSG00000183043
    ENSG00000179863

thanks for reading

coding R • 91 views
modified 7 days ago by 2nelly170 • written 7 days ago by Nicky0

intersect()
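
A minimal sketch of how `intersect()` could be applied here, assuming each file has already been read into a character vector of IDs. The vectors below are typed in from the example in the question, and `common` is just an illustrative name:

```r
# IDs copied from the example in the question
list1 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141337", "ENSG00000154257")
list2 <- c("ENSG00000109573", "ENSG00000205003", "ENSG00000124603",
           "ENSG00000008313", "ENSG00000183043", "ENSG00000179863",
           "ENSG00000141338", "ENSG00000154258")

# intersect() returns the unique values present in both vectors
common <- intersect(list1, list2)
print(common)
```

With the example data this should leave the six IDs shared by both lists, which is the `list3` the question asks for.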

modified 7 days ago • written 7 days ago by ATpoint21k

Hi there, I have a long list of genes in two files.

Let me try your script

written 7 days ago by Nicky0

Nicky: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

modified 7 days ago • written 7 days ago by genomax70k
piyushjo140 wrote:

Try this

list1<-read.delim("1_10.txt")
list2<-read.delim("1_10.txt")

Here both of your lists are read from exactly the same file. Do you have a long list of genes in two files, or is it a matrix? The following will work for a list with one column. First convert the factors into character vectors:

# read.delim() returns a data.frame, so take the ID column as a character vector.
# [[1]] selects the first column; list1$X also works, where X is the column heading.
list1 <- as.character(list1[[1]])
list2 <- as.character(list2[[1]])
keep <- list1 %in% list2        # TRUE for IDs in list1 that also occur in list2
sel <- list1[keep]
sel <- sel[!duplicated(sel)]    # drop repeated IDs
write.csv(sel, "matchedresults1.txt")
modified 7 days ago • written 7 days ago by piyushjo140

Hi there, I have a long list of genes in two files.

With your script I got the following result:

    "","x"
    "1","1_10.txt"

No idea what it means :(

written 7 days ago by Nicky0

You first need to have files with those names in the folder you are working in. list1 and list2 are the two lists that you gave in your example.
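
A small sketch of why the earlier output contained only the file name, and how reading the file changes that. The temporary file and the names `path`, `as_name`, `as_data` and `ids` are assumptions made for illustration:

```r
# Write a tiny example file so the sketch is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("EnsemblGeneID", "ENSG00000109573", "ENSG00000205003"), path)

# Assigning the file name only stores the name itself as a one-element string,
# so any matching is done on the name, not on the gene IDs
as_name <- path
length(as_name)

# read.delim() actually reads the file and returns a data.frame;
# stringsAsFactors = FALSE keeps the IDs as character strings
as_data <- read.delim(path, stringsAsFactors = FALSE)
ids <- as_data$EnsemblGeneID
print(ids)
```

If `read.delim()` reports that it cannot open the file, checking `getwd()` and `file.exists("1_10.txt")` is a quick way to confirm whether R is looking in the right folder.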

written 7 days ago by piyushjo140
2nelly170 wrote:

You can try this

awk 'NR==FNR {end[$1]; next} ($1 in end)' list1 list2

To match on different columns, change the first $1 to the column of list1 you want to compare, or the second $1 to the column of list2 you want to match against.
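
For readers who prefer to stay in R, roughly the same filter can be sketched with `%in%`. This is an R analogue rather than the awk command itself; the two small data frames and the column names `gene`, `score`, `sample` and `id` are made up for illustration:

```r
# Hypothetical two-column tables; only the ID columns are compared
tab1 <- data.frame(gene = c("ENSG00000109573", "ENSG00000205003"),
                   score = c(1.2, 0.4),
                   stringsAsFactors = FALSE)
tab2 <- data.frame(sample = c("a", "b", "c"),
                   id = c("ENSG00000205003", "ENSG00000141338", "ENSG00000109573"),
                   stringsAsFactors = FALSE)

# Keep the rows of tab2 whose 'id' occurs in tab1$gene.
# Like the awk one-liner, this returns rows of the second table in their original order.
matched <- tab2[tab2$id %in% tab1$gene, ]
print(matched)
```

Changing `tab1$gene` or `tab2$id` to another column plays the same role as changing the first or second `$1` in the awk command.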

written 7 days ago by 2nelly170