Question

Filtering columns based on column headers listed in a file

0

4.3 years ago

Bioinfonext ▴ 470

Hi,

I want to extract multiple columns, selected by column header name, from a large count data file.

Count data file (count.txt):


                                 Soil.15.S8.L001             Soil.16.S9.L001      Soil.2.S1.L001        
d2ec9f3b77975c0f457e4b7413b217ff            84                      63            106                    
3147790f0d5a78316fb9dd64f53b9473            69                      49            95                  
97aecc1f35cc1f50db507ad71dd22367             0                      15            14               
bfad6370d28182cc6304844e9bec7fb6            271                     75            30

...

Column name file (column.txt):

Soil.15.S8.L001,Soil.16.S9.L001,Soil.2.S1.L001.........

Could you please suggest how I can extract the columns named in column.txt from the count data file?

Many thanks

bash linux R • 1.5k views

ADD COMMENT • link updated 4.3 years ago by Sam ★ 4.8k • written 4.3 years ago by Bioinfonext ▴ 470

0

I'm having trouble understanding what you want to do. Can you explain your question in more detail, and possibly provide a small example of the inputs and expected outputs?

ADD REPLY • link 4.3 years ago by rpolicastro 13k

0

Sorry about that; I have now updated the question with more details.

ADD REPLY • link 4.3 years ago by Bioinfonext ▴ 470

0

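Try:

df <- iris
vec <- names(iris)[1:2]  # character vector of the column names to keep
df[, vec]

# with dplyr; all_of() makes it explicit that vec holds column names
dplyr::select(df, dplyr::all_of(vec))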

ADD REPLY • link 4.3 years ago by cpad0112 21k

score 2 · Accepted Answer · 2020-11-06

2

4.3 years ago

Sam ★ 4.8k

Assuming you are trying to extract the columns by column name, you can do the following with data.table:

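library(data.table)
# column.txt holds a single comma-separated line of column names
col <- unlist(fread("column.txt", header=FALSE))
count <- fread("count.txt")
# with=FALSE tells data.table to treat col as a vector of column names
extracted <- count[, col, with=FALSE]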

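If you don't have data.table installed, you can use base R:

# read the names as a plain character vector rather than a one-row data frame
col <- unlist(read.table("column.txt", sep=",", stringsAsFactors=FALSE))
# check.names=FALSE stops R from rewriting headers such as "Soil-27.S33.L001"
count <- read.table("count.txt", header=TRUE, check.names=FALSE)
# drop=FALSE keeps a data frame even when only one column matches
extracted <- count[, colnames(count) %in% col, drop=FALSE]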
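To make the base R route concrete, here is a minimal, self-contained sketch that writes two small stand-in files to a temporary directory and runs the extraction on them. The identifiers (id_a, id_b, id_c), the variable names `wanted` and `missing_cols`, and the values are made up for illustration; it assumes the header line has one field fewer than the data rows, as in the example above, in which case read.table() uses the first column as row names.

```r
# Build small stand-in files in a temporary directory (made-up data).
tmp <- tempdir()
count_file <- file.path(tmp, "count.txt")
column_file <- file.path(tmp, "column.txt")

writeLines(c(
  "Soil.15.S8.L001 Soil.16.S9.L001 Soil-2.S1.L001",
  "id_a 84 63 106",
  "id_b 69 49 95",
  "id_c 0 15 14"
), count_file)
writeLines("Soil.15.S8.L001,Soil-2.S1.L001", column_file)

# Read the wanted names as a plain character vector.
wanted <- unlist(read.table(column_file, sep = ",", stringsAsFactors = FALSE),
                 use.names = FALSE)

# The header has one field fewer than the data rows, so read.table()
# uses the first column as row names; check.names = FALSE keeps "-" intact.
count <- read.table(count_file, header = TRUE, check.names = FALSE)

# Report requested names that are absent instead of silently dropping them.
missing_cols <- setdiff(wanted, colnames(count))
if (length(missing_cols) > 0) {
  warning("Not found in count file: ", paste(missing_cols, collapse = ", "))
}

# Select in the order given in the column file; drop = FALSE keeps a data frame.
extracted <- count[, intersect(wanted, colnames(count)), drop = FALSE]
print(extracted)
```

Because the identifiers end up as row names here, they stay attached to the extracted columns. Selecting with intersect() rather than %in% is a design choice in this sketch: it returns the columns in the order listed in the column file instead of the order in the count file.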

ADD COMMENT • link 4.3 years ago by Sam ★ 4.8k

0

Thanks, the data.table command works well, but I am not getting the identifier as the first column, as in the count data file.

    Soil.9.S42  Soil-27.S33.L001    Soil-45.S54.L001    Soil-6.S22.L001 
1       34        51                      84                    41  
2       28        32                      52                    27  
3       33         0                      7                     12

ADD REPLY • link 4.3 years ago by Bioinfonext ▴ 470

1

I am not sure what your identifier column is called, but assuming it is "ID", you can add it to the vector of column names:

col <- c("ID", col)

and then continue with the extraction as before.
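If the identifier column's name is not known in advance, one option is to look it up rather than hard-code it. The sketch below assumes the identifier is the first column of the table returned by fread(); the file contents, identifiers, and the name `id_col` are made up for illustration.

```r
library(data.table)

# Made-up stand-in files; tab-separated, with an unnamed identifier column.
count_file <- tempfile(fileext = ".txt")
column_file <- tempfile(fileext = ".txt")
writeLines(c(
  "\tSoil.15.S8.L001\tSoil.16.S9.L001\tSoil.2.S1.L001",
  "id_a\t84\t63\t106",
  "id_b\t69\t49\t95"
), count_file)
writeLines("Soil.15.S8.L001,Soil.2.S1.L001", column_file)

# Wanted column names as a plain character vector.
col <- unlist(fread(column_file, header = FALSE), use.names = FALSE)
count <- fread(count_file)

# Assumption: the identifier is the first column, whatever fread named it.
id_col <- names(count)[1]

# Prepend the identifier column to the requested columns and extract.
extracted <- count[, c(id_col, col), with = FALSE]
print(extracted)
```

Running names(count) on your own file shows what the identifier column is actually called, which is worth checking before relying on the first-column assumption.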

ADD REPLY • link 4.3 years ago by Sam ★ 4.8k