Question: Extracting a list of genes while keeping their order
Za, 2.3 years ago, wrote:

Hi,

I have these lists of genes

> dim(b)
[1] 4866    2
>

How can I extract the small list (bb) from the big one (b) while keeping the same order of genes? I mean the order of genes in the extracted result should be the same as in the small file (bb).

rna-seq R • 514 views
written 2.3 years ago by Za

Try b[b$gene %in% bb$gene, ]

written 2.3 years ago by cpad0112

Thank you, but I need the order. For example, for gene DDB_G0292120 in bb, I want to know its index (row position) in b. An intersection does not give me indices that follow the order of the genes. I have a heat map of the bb genes and I want to know their indices in b.

written 2.3 years ago by Za
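
The index lookup asked for in this comment could be sketched with base R's match(), which returns, for each gene in bb, its row position in b (NA if the gene is absent). The toy data and the column name "gene" below are assumptions for illustration, not the poster's real files.

```r
# Toy data (hypothetical); the column name "gene" is an assumption
b  <- data.frame(gene = c("g5", "g2", "g9", "g1", "g7"),
                 stringsAsFactors = FALSE)
bb <- data.frame(gene = c("g1", "g9", "g2"),
                 stringsAsFactors = FALSE)

# Row position in b of each gene in bb, reported in bb's order
idx <- match(bb$gene, b$gene)
idx
# [1] 4 3 2

# Keep each gene name next to its row position in b
lookup <- data.frame(gene = bb$gene, row_in_b = idx,
                     stringsAsFactors = FALSE)
lookup
```

Unlike %in%, which only answers whether a gene is present, match() reports where it is, so the positions can be used to label a heat map of the bb genes.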
cpad0112 (India), 2.3 years ago, wrote:

b and bb in the original post do not share any genes. What is being asked is not possible when the values do not exist in both data sets. Look at the following example and see if this is what you need:

> b=data.frame(genes=paste("gene", sample(10), sep="_"), expn=round(rnorm(10,1,4),2))
> bb=data.frame(genes=paste("gene",sample(5), sep="_"))
> b
     genes  expn
1   gene_6 -0.78
2  gene_10  4.65
3   gene_9  1.86
4   gene_3  2.39
5   gene_1  0.34
6   gene_2  2.01
7   gene_8 -4.51
8   gene_7 -7.61
9   gene_5  1.50
10  gene_4  3.12

> bb
   genes
1 gene_2
2 gene_1
3 gene_3
4 gene_4
5 gene_5

> b[match(bb$genes,b$genes),]
    genes expn
6  gene_2 2.01
5  gene_1 0.34
4  gene_3 2.39
10 gene_4 3.12
9  gene_5 1.50
written 2.3 years ago by cpad0112
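
When some genes in bb are missing from b, match() returns NA for them, and indexing b with those NA values produces rows filled with NA. A minimal sketch of one way to handle that, using hypothetical toy data in the same shape as the example above:

```r
# Toy data (hypothetical): "gene_9" is in bb but not in b
b  <- data.frame(genes = c("gene_3", "gene_1", "gene_4", "gene_2"),
                 expn  = c(2.39, 0.34, 3.12, 2.01),
                 stringsAsFactors = FALSE)
bb <- data.frame(genes = c("gene_2", "gene_9", "gene_1"),
                 stringsAsFactors = FALSE)

idx   <- match(bb$genes, b$genes)  # NA marks genes absent from b
found <- !is.na(idx)

# Rows of b for the genes that exist, still in bb's order
res <- b[idx[found], ]
res$row_in_b <- idx[found]         # original row position in b
res

# Genes requested in bb but absent from b
missing_genes <- bb$genes[!found]
missing_genes
# [1] "gene_9"
```

Whether to drop the missing genes or keep them as NA rows depends on the downstream use; this sketch drops them and reports them separately.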

I checked with a Venn diagram; there are many genes in common between the files.

written 2.3 years ago by Za

Thank you, your code actually worked.

written 2.3 years ago by Za

Did you try other solutions posted here?

written 2.3 years ago by cpad0112

Actually not yet; so far I have only tried your code on toy data. Today I am going to try the other solutions.

Thanks a lot to both you and zx8754.

written 2.3 years ago by Za
zx8754 (London), 2.3 years ago, wrote:

Add a row number to b, then merge, and finally reorder the merged data frame by that row number. See this example:

# example input data
b <- read.table(text = "
gene index
DDB_G0295603 0.9922432
DDB_G0295719 0.9917077
DDB_G0292120 0.3333333
DDB_G0282307 0.9876919
DDB_G0269672 0.9862853
DDB_G0269462 0.6666666
DDB_G0284895 0.9853162
DDB_G0274031 0.9803622", header = TRUE, stringsAsFactors = FALSE)

bb <- read.table(text = "
gene
DDB_G0292120
DDB_G0278649
DDB_G0288947
DDB_G0269462
DDB_G0278757
DDB_G0281793", header = TRUE, stringsAsFactors = FALSE)

# add row number
b$myOrder <- seq(nrow(b))

# then merge to get "index" and "myOrder" columns
res <- merge(bb, b, by = "gene")

# and reorder the merged dataframe
res <- res[ order(res$myOrder), ]

res
#           gene     index myOrder
# 2 DDB_G0292120 0.3333333       3
# 1 DDB_G0269462 0.6666666       6
written 2.3 years ago by zx8754
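
The example above returns the shared genes in b's order, because merge() sorts its result by the key column and the rows are then reordered by myOrder. If the order of bb is wanted instead, the same idea might be adapted by also recording the row number in bb before merging. This is a sketch under that assumption: the column names rowInB and rowInBB are hypothetical, and the data is a rearranged subset of the example above.

```r
# Subset of the example data; bb is deliberately in a different order
b <- data.frame(gene  = c("DDB_G0295603", "DDB_G0292120", "DDB_G0269462"),
                index = c(0.9922432, 0.3333333, 0.6666666),
                stringsAsFactors = FALSE)
bb <- data.frame(gene = c("DDB_G0269462", "DDB_G0278649", "DDB_G0292120"),
                 stringsAsFactors = FALSE)

# Record the original row positions in both data frames
b$rowInB   <- seq_len(nrow(b))
bb$rowInBB <- seq_len(nrow(bb))

# merge() keeps only shared genes and sorts by "gene" by default
res <- merge(bb, b, by = "gene")

# Restore bb's order; rowInB still holds each gene's position in b
res <- res[order(res$rowInBB), ]
res
```

Here rowInB answers "where is this gene in b?", while sorting on rowInBB keeps the rows in the order they appear in bb.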