Question

Extracting a list of genes while keeping their order

5.9 years ago

Za ▴ 140

Hi,

I have these lists of genes:

> dim(b)
[1] 4866    2
>

How can I extract the genes of the small list (bb) from the big one (b) while keeping the same gene order? I mean the order of genes in the extracted file should be the same as in the small file (bb).

RNA-Seq R • 1.1k views

Try b[b$gene %in% bb$gene, ]

5.9 years ago by cpad0112 21k
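To see why the %in% filter alone does not keep bb's order, here is a minimal sketch on hypothetical toy data (the gene names g1 to g4 and the value column are made up, not taken from the OP's files): logical indexing returns the matching rows in b's order, while indexing with match() returns them in bb's order.

```r
# Hypothetical toy data; gene names and values are illustrative only
b  <- data.frame(gene  = c("g1", "g2", "g3", "g4"),
                 value = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
bb <- data.frame(gene = c("g3", "g1"), stringsAsFactors = FALSE)

# %in% gives a logical vector over b's rows, so b's order is kept
by_b_order <- b[b$gene %in% bb$gene, ]

# match() gives, for each bb gene, its row in b, so bb's order is kept
by_bb_order <- b[match(bb$gene, b$gene), ]

print(by_b_order$gene)   # [1] "g1" "g3"
print(by_bb_order$gene)  # [1] "g3" "g1"
```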

Thank you, but I need the order. For example, for gene DDB_G0292120 in bb I want to know its index in file b. An intersection does not give me the indices based on the order of the genes. I have a heat map of the bb genes and I want to know their indices in file b.

5.9 years ago by Za ▴ 140
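If what is needed is the position of each bb gene in b, base R's match() returns exactly that: for every element of its first argument, the index of its first match in the second argument, or NA when there is none. A minimal sketch on hypothetical toy data (gene_A to gene_X are made-up names; the column name row_in_b is an illustrative choice):

```r
# Hypothetical toy data standing in for the OP's b and bb
b  <- data.frame(gene = c("gene_A", "gene_B", "gene_C", "gene_D"),
                 stringsAsFactors = FALSE)
bb <- data.frame(gene = c("gene_C", "gene_X", "gene_A"),
                 stringsAsFactors = FALSE)

# Row position in b of each bb gene, in bb's order (NA if absent from b)
idx <- match(bb$gene, b$gene)
print(idx)
# [1]  3 NA  1

# Keep the bb gene names next to their positions in b
positions <- data.frame(gene = bb$gene, row_in_b = idx,
                        stringsAsFactors = FALSE)
print(positions)
```

These positions could then be used to label the heat map rows.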

5.9 years ago

zx8754 11k

Add a row-number column to b, then merge, and finally reorder the merged data frame by that row number. See this example:

# example input data
b <- read.table(text = "
gene index
DDB_G0295603 0.9922432
DDB_G0295719 0.9917077
DDB_G0292120 0.3333333
DDB_G0282307 0.9876919
DDB_G0269672 0.9862853
DDB_G0269462 0.6666666
DDB_G0284895 0.9853162
DDB_G0274031 0.9803622", header = TRUE, stringsAsFactors = FALSE)

bb <- read.table(text = "
gene
DDB_G0292120
DDB_G0278649
DDB_G0288947
DDB_G0269462
DDB_G0278757
DDB_G0281793", header = TRUE, stringsAsFactors = FALSE)

# add row number
b$myOrder <- seq(nrow(b))

# then merge to get "index" and "myOrder" columns
res <- merge(bb, b, by = "gene")

# and reorder the merged dataframe
res <- res[ order(res$myOrder), ]

res
#           gene     index myOrder
# 2 DDB_G0292120 0.3333333       3
# 1 DDB_G0269462 0.6666666       6

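The example above sorts the result by the row number in b. If the result should instead follow the order of the small list bb, as the question asks, one possible variant is to record row numbers in both tables and sort by bb's. A sketch on hypothetical toy data, with illustrative column names rowInB and rowInBB:

```r
# Hypothetical toy data: bb lists genes in a different order than b
b  <- data.frame(gene  = c("g1", "g2", "g3", "g4"),
                 index = c(0.9, 0.8, 0.3, 0.6),
                 stringsAsFactors = FALSE)
bb <- data.frame(gene = c("g4", "g5", "g2"), stringsAsFactors = FALSE)

# Record the original row positions in both tables
b$rowInB   <- seq_len(nrow(b))
bb$rowInBB <- seq_len(nrow(bb))

# merge() keeps only shared genes and, by default, sorts by the key column
res <- merge(bb, b, by = "gene")

# Restore bb's order; rowInB still holds each gene's position in b
res <- res[order(res$rowInBB), ]
print(res)
```

Here g5 is dropped because it is not in b; adding all.x = TRUE to merge() would keep it with NA values instead.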

score 3 · Accepted Answer · 2018-06-04

5.9 years ago

cpad0112 21k

b and bb in the OP don't share any genes. What is being asked is not possible when the values do not exist in both data sets. Look at the following example and see if this is what is required:

> b=data.frame(genes=paste("gene", sample(10), sep="_"), expn=round(rnorm(10,1,4),2))
> bb=data.frame(genes=paste("gene",sample(5), sep="_"))
> b
     genes  expn
1   gene_6 -0.78
2  gene_10  4.65
3   gene_9  1.86
4   gene_3  2.39
5   gene_1  0.34
6   gene_2  2.01
7   gene_8 -4.51
8   gene_7 -7.61
9   gene_5  1.50
10  gene_4  3.12

> bb
   genes
1 gene_2
2 gene_1
3 gene_3
4 gene_4
5 gene_5

> b[match(bb$genes,b$genes),]
    genes expn
6  gene_2 2.01
5  gene_1 0.34
4  gene_3 2.39
10 gene_4 3.12
9  gene_5 1.50

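One detail of the match() approach: when some bb genes are absent from b, match() returns NA for them, and indexing b with NA produces a row filled with NA. A minimal sketch on hypothetical toy data (gene_9 is deliberately missing from b; the name not_in_b is illustrative) that drops those rows and lists the unmatched genes:

```r
# Hypothetical toy data; gene_9 is in bb but not in b
b  <- data.frame(genes = c("gene_1", "gene_2", "gene_3"),
                 expn  = c(0.34, 2.01, 2.39),
                 stringsAsFactors = FALSE)
bb <- data.frame(genes = c("gene_3", "gene_9", "gene_1"),
                 stringsAsFactors = FALSE)

idx <- match(bb$genes, b$genes)   # 3, NA, 1

# Keep only matched positions; bb's order is preserved
res <- b[idx[!is.na(idx)], ]
print(res)

# Genes from bb that were not found in b
not_in_b <- bb$genes[is.na(idx)]
print(not_in_b)
```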

I checked with a Venn diagram; there are many genes in common between the files.

5.9 years ago by Za ▴ 140
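The overlap can also be checked directly in R with base intersect() and setdiff(). A sketch on hypothetical toy data, assuming both tables have a gene column as in the OP's files:

```r
# Hypothetical toy data
b  <- data.frame(gene = c("g1", "g2", "g3", "g4"), stringsAsFactors = FALSE)
bb <- data.frame(gene = c("g2", "g4", "g7"), stringsAsFactors = FALSE)

shared <- intersect(bb$gene, b$gene)  # genes in both tables, in bb's order
print(length(shared))                 # how many genes the tables share
print(setdiff(bb$gene, b$gene))       # bb genes that are absent from b
```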

Thank you, your code actually worked.

5.9 years ago by Za ▴ 140

Did you try other solutions posted here?

5.9 years ago by cpad0112 21k

Actually not yet; so far I have only tried your code on toy data. Today I am going to try these solutions.

Thanks a lot to both you and zx8754.

5.9 years ago by Za ▴ 140