Question: Mapping columns based on a list
Za wrote (17 days ago):

Hi,

I have raw count data with barcodes in columns and genes in rows, and a list of correspondences between barcodes and sample numbers.

How can I map barcodes to sample numbers?

rna-seq next-gen • 161 views

What do you mean by:

How can I map barcodes to sample numbers?

Could you give an example of the expected result, please?

Are you using R, Python, Perl...?

Are your raw counts in files, data frames, or matrices?

Reply written 17 days ago by Bastien Hervé

Thank you. I am using R on macOS. The two datasets are in separate matrices. I expect my raw counts to have sample names (h16.sc1, h16.sc2, etc.) as column names instead of barcodes.

Reply written 17 days ago by Za
cpad0112 wrote (14 days ago):

The example data (bar and bartable) are borrowed from shenwei356's answer:

# Read the barcode list and the count table (both tab-separated)
bar = read.csv("bar.txt", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
bartable = read.csv("table.tsv", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
> bartable
    gene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA
1  gene1           0           0           3          0
2 gen1e2           0           0           0          0
> bar
  Sample     Barcode
1    sc1 CCTAGATTAAT
2    sc2 GAAGACTTGGT
3    sc3 GAAGCAGTATG
4    sc4 GGTAACCTGAC
5    sc5 ATAGTTCTCGT

# Rename every column whose name is a known barcode; other columns keep their name
for (i in colnames(bartable)) {
    if (i %in% bar$Barcode) {
        colnames(bartable)[match(i, colnames(bartable))] = bar$Sample[match(i, bar$Barcode)]
    }
}
> bartable
    gene sc5 sc3 sc2 AAAAAAAAAA
1  gene1   0   0   3          0
2 gen1e2   0   0   0          0
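
The same renaming can also be done without a loop. The following is a minimal sketch, assuming the counts are in a matrix (as the asker describes) with barcodes as column names; the helper name rename_barcodes and the toy values are made up for illustration. Columns whose barcode is not in the list keep their original name.

```r
# Hypothetical helper: replace barcode column names with sample names.
rename_barcodes <- function(counts, barcodes, samples) {
    # Position of each column name in the barcode list (NA if absent)
    idx <- match(colnames(counts), barcodes)
    found <- !is.na(idx)
    # Only rename columns with a known barcode; leave the rest untouched
    colnames(counts)[found] <- samples[idx[found]]
    counts
}

# Toy data modelled on the example above
counts <- matrix(c(0, 0, 0, 0, 3, 0, 0, 0), nrow = 2,
                 dimnames = list(c("gene1", "gen1e2"),
                                 c("ATAGTTCTCGT", "GAAGCAGTATG", "GAAGACTTGGT", "AAAAAAAAAA")))
bar <- data.frame(Sample = c("sc1", "sc2", "sc3", "sc4", "sc5"),
                  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG", "GGTAACCTGAC", "ATAGTTCTCGT"),
                  stringsAsFactors = FALSE)

renamed <- rename_barcodes(counts, bar$Barcode, bar$Sample)
print(colnames(renamed))
```

The same function should also work on a data frame such as bartable, since a column like gene is simply not found in the barcode list and is left alone.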

shenwei356 (China) wrote (17 days ago):

Try csvtk, assuming the two files are tab-separated.

Update: with v0.14.0 or later, this is a single command:

./csvtk rename2 -t -f -gene -p '(.+)' -r '{kv}' -k <(./csvtk cut -t -f 2,1 barcodes.tsv) -K counts.tsv > result.tsv

Example:

$ cat barcodes.tsv 
Sample  Barcode
sc1     CCTAGATTAAT
sc2     GAAGACTTGGT
sc3     GAAGCAGTATG
sc4     GGTAACCTGAC
sc5     ATAGTTCTCGT

$ cat table.tsv 
gene    ATAGTTCTCGT     GAAGCAGTATG     GAAGACTTGGT     AAAAAAAAAA
gene1   0       0       3       0
gen1e2  0       0       0       0

# Note: the columns of barcodes.tsv must be arranged in KEY-VALUE order (barcode first, then sample)
$ csvtk cut -t -f 2,1 barcodes.tsv 
Barcode Sample
CCTAGATTAAT     sc1
GAAGACTTGGT     sc2
GAAGCAGTATG     sc3
GGTAACCTGAC     sc4
ATAGTTCTCGT     sc5

# here we go!!!!

$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv 
gene    sc5     sc3     sc2     unknown
gene1   0       0       3       0
gen1e2  0       0       0       0

original answer

$ csvtk transpose -t table.tsv \
    | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
    | csvtk transpose -t \
    > result.tsv

It's a little verbose. I will make csvtk rename2 support {kv} soon so we can avoid using transpose.

Sorry, is there a version for Mac? I only noticed downloads for Windows and Linux.

Reply written 17 days ago by Za

It supports Mac: https://github.com/shenwei356/csvtk/releases/download/v0.13.0/csvtk_darwin_amd64.tar.gz

Reply written 17 days ago by shenwei356

Thank you. I downloaded it, but there is only an executable file named csvtk, and I can't figure out how to use it.

I set the working directory to the folder containing the executable, but it says:

dhcp179185:Downloads $ csvtk transpose -t counts.tsv \
>     | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
> 
-bash: csvtk: command not found

Reply written 17 days ago by Za

Run it as:

./csvtk xxx

Reply written 17 days ago by shenwei356

Answer updated. It's much easier now.

Reply written 17 days ago by shenwei356

dhcp179185:Downloads $ csvtk-0.14.0
-bash: csvtk-0.14.0: command not found
dhcp179185:Downloads$

Sorry, I don't know how to install that.

Reply written 17 days ago by Za

You have to include the ./ before the command, which means "run the executable located in this folder". Without it, the shell only searches the directories listed in PATH, which is why it reports "command not found".

Reply written 17 days ago by finswimmer

Bastien Hervé (Limoges, CBRS, France) wrote (17 days ago):

Adapted from this thread, try in R:

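# Read counts (genes in rows, barcodes in columns) and the barcode-to-sample table
counts <- read.table(file="/path/to/counts.csv", sep="\t", header=TRUE, row.names=1)
samples <- read.table(file="/path/to/samples.csv", sep="\t", header=TRUE)
# Reshape to long format: one row per (gene, barcode) pair
counts$id <- row.names(counts)
mdfa <- reshape2::melt(counts, id.vars = "id", variable.name = "Barcode")
# Attach sample names by barcode (unmatched barcodes are dropped), then reshape back to wide
reshape2::dcast(merge(samples, mdfa, by = "Barcode"), id ~ Sample, value.var = "value", fun.aggregate = sum)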
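
For comparison, here is a minimal base-R sketch of the same idea without reshape2, assuming a numeric count matrix with barcodes as column names. The toy values, including two barcodes mapped to one sample to show the summing, are made up for illustration. Like the merge above, it drops barcodes that are not in the sample list.

```r
# Toy count matrix: genes in rows, barcodes in columns (values are made up)
counts <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2,
                 dimnames = list(c("gene1", "gene2"),
                                 c("AAAC", "CCCG", "GGGT", "TTTA")))
# Hypothetical mapping in which two barcodes belong to the same sample
samples <- data.frame(Sample = c("sc1", "sc1", "sc2"),
                      Barcode = c("AAAC", "CCCG", "GGGT"),
                      stringsAsFactors = FALSE)

# Keep only the columns whose barcode appears in the mapping
keep <- colnames(counts) %in% samples$Barcode
mapped <- counts[, keep, drop = FALSE]
# Sample name for each remaining column
group <- samples$Sample[match(colnames(mapped), samples$Barcode)]
# rowsum() sums rows by group, so transpose, sum per sample, and transpose back
by_sample <- t(rowsum(t(mapped), group))
print(by_sample)
```

If every barcode maps to a distinct sample, the summing step changes nothing and the result is simply the count matrix with sample names as column names.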