Question: Extracting a list of genes while keeping their order
Za wrote (22 months ago):

Hi,

I have these lists of genes

> dim(b)
[1] 4866    2

How can I extract the small list (bb) from the big one (b) while keeping the same gene order? I mean the order of genes in the extracted result should be the same as in the small file (bb).

Tags: rna-seq, R

try b[b$gene %in% bb$gene,]

Reply by cpad0112 (22 months ago)

Thank you, but I need the order. For example, for gene DDB_G0292120 in bb, I want to know its index in file b. An intersection does not give me the indices based on the order of the genes. I have a heat map of the bb genes and I want to know their indices in file b.

Reply by Za (22 months ago)
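
For readers following the exchange above, the difference between the two approaches can be sketched on toy data. The gene names and values below are made up for illustration and are not the poster's data. `%in%` filters b and keeps b's row order, whereas `match()` returns, for each gene in bb, its row position in b.

```r
# Toy data; gene names and values are hypothetical
b  <- data.frame(gene = c("g4", "g1", "g3", "g2"),
                 index = c(0.9, 0.8, 0.7, 0.6),
                 stringsAsFactors = FALSE)
bb <- data.frame(gene = c("g2", "g4"), stringsAsFactors = FALSE)

# %in% keeps the matching rows in the order of b (g4 first, then g2)
in_b_order <- b[b$gene %in% bb$gene, ]

# match() gives the row number in b of each bb gene, in bb order
idx <- match(bb$gene, b$gene)
in_bb_order <- b[idx, ]
```

Here `idx` holds the positions of the bb genes within b, which is the kind of index the question asks about.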
cpad0112 (India) wrote (22 months ago):

b and bb in the OP don't share any genes. What is being asked is not possible when the values do not exist in both data sets. Look at the following example and see if this is what is required:

> b=data.frame(genes=paste("gene", sample(10), sep="_"), expn=round(rnorm(10,1,4),2))
> bb=data.frame(genes=paste("gene",sample(5), sep="_"))
> b
     genes  expn
1   gene_6 -0.78
2  gene_10  4.65
3   gene_9  1.86
4   gene_3  2.39
5   gene_1  0.34
6   gene_2  2.01
7   gene_8 -4.51
8   gene_7 -7.61
9   gene_5  1.50
10  gene_4  3.12

> bb
   genes
1 gene_2
2 gene_1
3 gene_3
4 gene_4
5 gene_5

> b[match(bb$genes,b$genes),]
    genes expn
6  gene_2 2.01
5  gene_1 0.34
4  gene_3 2.39
10 gene_4 3.12
9  gene_5 1.50
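
If some genes in bb are absent from b, `match()` returns NA for them and indexing with NA produces rows of NA. A minimal sketch of one way to handle that, on hypothetical toy data where "gene_99" is deliberately missing from b:

```r
# Hypothetical toy data; "gene_99" is deliberately absent from b
b  <- data.frame(genes = c("gene_3", "gene_1", "gene_2"),
                 expn = c(2.39, 0.34, 2.01),
                 stringsAsFactors = FALSE)
bb <- data.frame(genes = c("gene_2", "gene_99", "gene_3"),
                 stringsAsFactors = FALSE)

idx <- match(bb$genes, b$genes)        # NA where a bb gene is not found in b
missing_genes <- bb$genes[is.na(idx)]  # genes of bb that b does not contain
res <- b[idx[!is.na(idx)], ]           # matched rows only, still in bb order
```

Dropping the NA positions keeps the remaining rows in bb order; whether to drop or keep the unmatched genes depends on what the downstream plot expects.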

I checked with a Venn diagram; there are many genes in common between the files.

Reply by Za (22 months ago)

Thank you, your code actually worked.

Reply by Za (22 months ago)

Did you try other solutions posted here?

Reply by cpad0112 (22 months ago)

Not yet, actually; so far I have only tried your code on toy data. Today I am going to try the other solutions.

Thanks a lot to both you and zx8754.

Reply by Za (22 months ago)
zx8754 (London) wrote (22 months ago):

Add a row number to b, then merge, and finally re-order the merged data frame by that row number. See this example:

# example input data
b <- read.table(text = "
gene index
DDB_G0295603 0.9922432
DDB_G0295719 0.9917077
DDB_G0292120 0.3333333
DDB_G0282307 0.9876919
DDB_G0269672 0.9862853
DDB_G0269462 0.6666666
DDB_G0284895 0.9853162
DDB_G0274031 0.9803622", header = TRUE, stringsAsFactors = FALSE)

bb <- read.table(text = "
gene
DDB_G0292120
DDB_G0278649
DDB_G0288947
DDB_G0269462
DDB_G0278757
DDB_G0281793", header = TRUE, stringsAsFactors = FALSE)

# add row number
b$myOrder <- seq(nrow(b))

# then merge to get "index" and "myOrder" columns
res <- merge(bb, b, by = "gene")

# and reorder the merged dataframe
res <- res[ order(res$myOrder), ]

res
#           gene     index myOrder
# 2 DDB_G0292120 0.3333333       3
# 1 DDB_G0269462 0.6666666       6
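
The example above sorts the result by position in b. If the order of bb is wanted instead, one possible variation is to number the rows of bb as well and sort on that column. This is only a sketch: the column names `rowInB` and `rowInBB` are hypothetical, and the input is a rearranged subset of the example genes.

```r
# Rearranged subset of the example genes; column names are hypothetical
b <- data.frame(gene = c("DDB_G0295603", "DDB_G0292120", "DDB_G0269462"),
                index = c(0.9922432, 0.3333333, 0.6666666),
                stringsAsFactors = FALSE)
bb <- data.frame(gene = c("DDB_G0269462", "DDB_G0278649", "DDB_G0292120"),
                 stringsAsFactors = FALSE)

b$rowInB   <- seq_len(nrow(b))    # position of each gene in b
bb$rowInBB <- seq_len(nrow(bb))   # position of each gene in bb

# merge() sorts by the key column by default, so restore bb order afterwards
res <- merge(bb, b, by = "gene")
res <- res[order(res$rowInBB), ]
```

After this, `res$rowInB` gives the index in b of each shared gene, listed in bb order.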