Question: Mapping columns based on a list
Za wrote, 6 months ago:

Hi,

I have a raw count table with barcodes in columns and genes in rows, and a list giving the correspondence between barcodes and sample numbers.

How can I map the barcodes to sample numbers?

rna-seq next-gen • 282 views

What do you mean when you say:

How I can map barcodes to sample numbers?

Could you give an example of the expected result, please?

Are you using R, Python, Perl...?

Are your raw counts in files, a data frame, or a matrix?

modified 6 months ago • written 6 months ago by Bastien Hervé

Thank you. I am using R on macOS, and the two data sets are in separate matrices. I would like my raw count table to have sample names (h16.sc1, h16.sc2, etc.) as column names instead of barcodes.

modified 6 months ago • written 6 months ago by Za
cpad0112 (India) wrote, 6 months ago:

The example data in bar and bartable are borrowed from shenwei356's answer:

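# Read both tab-separated files: the barcode-to-sample list and the count table
bar <- read.csv("bar.txt", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
bartable <- read.csv("table.tsv", sep = "\t", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)
> bartable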
    gene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA
1  gene1           0           0           3          0
2 gen1e2           0           0           0          0
> bar
  Sample     Barcode
1    sc1 CCTAGATTAAT
2    sc2 GAAGACTTGGT
3    sc3 GAAGCAGTATG
4    sc4 GGTAACCTGAC
5    sc5 ATAGTTCTCGT

# Rename every column whose name is a known barcode to the matching sample name
for (i in colnames(bartable)) {
    if (i %in% bar$Barcode) {
        colnames(bartable)[match(i, colnames(bartable))] <- bar$Sample[match(i, bar$Barcode)]
    }
}
> bartable
    gene sc5 sc3 sc2 AAAAAAAAAA
1  gene1   0   0   3          0
2 gen1e2   0   0   0          0
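
The same renaming can also be done without a loop. The following is only a minimal sketch, assuming the column names are bar$Sample and bar$Barcode as in the example above; the toy data are typed in by hand so that the snippet runs on its own, and the helper names idx and renamed are made up for illustration.

```r
# Toy data mirroring the example above (typed in by hand, not read from files)
bar <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG", "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)
bartable <- data.frame(
  gene        = c("gene1", "gen1e2"),
  ATAGTTCTCGT = c(0, 0),
  GAAGCAGTATG = c(0, 0),
  GAAGACTTGGT = c(3, 0),
  AAAAAAAAAA  = c(0, 0),
  stringsAsFactors = FALSE
)

# Position of each column name in the barcode list (NA when it is not a known barcode)
idx <- match(colnames(bartable), bar$Barcode)

# Replace only the matched names; unmatched columns keep their original name
renamed <- colnames(bartable)
renamed[!is.na(idx)] <- bar$Sample[idx[!is.na(idx)]]
colnames(bartable) <- renamed

print(colnames(bartable))
```

Since match() works on any character vector, the same three lines should also apply to a matrix via colnames(). If a barcode appeared more than once in bar, match() would use its first occurrence, as the loop does.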
shenwei356 (China) wrote, 6 months ago:

Try csvtk, supposing the two files are tab-separated.

Updated answer (requires csvtk v0.14.0 or later):

./csvtk rename2 -t -f -gene -p '(.+)' -r '{kv}' -k <(./csvtk cut -t -f 2,1 barcodes.tsv) -K counts.tsv > result.tsv

Example:

$ cat barcodes.tsv 
Sample  Barcode
sc1     CCTAGATTAAT
sc2     GAAGACTTGGT
sc3     GAAGCAGTATG
sc4     GGTAACCTGAC
sc5     ATAGTTCTCGT

$ cat table.tsv 
gene    ATAGTTCTCGT     GAAGCAGTATG     GAAGACTTGGT     AAAAAAAAAA
gene1   0       0       3       0
gen1e2  0       0       0       0

# Note: the columns of barcodes.tsv must be arranged in KEY-VALUE order (barcode first, sample second)
$ csvtk cut -t -f 2,1 barcodes.tsv 
Barcode Sample
CCTAGATTAAT     sc1
GAAGACTTGGT     sc2
GAAGCAGTATG     sc3
GGTAACCTGAC     sc4
ATAGTTCTCGT     sc5

# here we go!!!!

$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv 
gene    sc5     sc3     sc2     unknown
gene1   0       0       3       0
gen1e2  0       0       0       0

Original answer:

$ csvtk transpose -t table.tsv \
    | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
    | csvtk transpose -t \
    > result.tsv

It's a little verbose. I will make csvtk rename2 support {kv} soon so that we can avoid using transpose.


Sorry, is there a version for Mac? I only noticed downloads for Windows and Linux.

written 6 months ago by Za

It supports Mac: https://github.com/shenwei356/csvtk/releases/download/v0.13.0/csvtk_darwin_amd64.tar.gz

written 6 months ago by shenwei356

Thank you. I downloaded it, but there is only an executable file named csvtk, and I can't figure out what to do with it.

I set the working directory to the folder containing the executable, but it says:

dhcp179185:Downloads $ csvtk transpose -t counts.tsv \
>     | csvtk replace -t -f gene -p '^(.+)$' -r '{kv}' -k <(csvtk cut -t -f 2,1 barcodes.tsv)  -K \
> 
-bash: csvtk: command not found
modified 6 months ago • written 6 months ago by Za

Run:

./csvtk xxx
written 6 months ago by shenwei356

Answer updated. It's much easier.

written 6 months ago by shenwei356
dhcp179185:Downloads $ csvtk-0.14.0
-bash: csvtk-0.14.0: command not found
dhcp179185:Downloads$

Sorry, I don't know how to install that.

written 6 months ago by Za

You have to include the ./ before the command, which means "run the executable located in the current folder".

modified 6 months ago • written 6 months ago by finswimmer
Bastien Hervé (Limoges, CBRS, France) wrote, 6 months ago:

From this thread, try in R:

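# Read the count table (genes in rows, barcodes in columns) and the
# barcode-to-sample table (expected to have columns Barcode and Sample)
counts <- read.table(file="/path/to/counts.csv", sep="\t", header=TRUE, row.names=1)
samples <- read.table(file="/path/to/samples.csv", sep="\t", header=TRUE)
# Reshape to long format: one row per gene/barcode pair
counts$id <- row.names(counts)
mdfa <- reshape2::melt(counts, id.vars = "id", variable.name = "Barcode")
# Attach sample names by barcode, then reshape back to one column per sample
reshape2::dcast(merge(samples, mdfa, by = "Barcode"), id ~ Sample, fun.aggregate = sum, value.var = "value")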
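
Note that the merge() step keeps only barcodes that appear in both tables, and fun.aggregate = sum adds up columns that map to the same sample. For readers without reshape2, a roughly equivalent base R sketch is shown below. It is not the method from the linked thread; it swaps in rowsum() for melt/dcast, uses hand-typed toy data, and the names keep, mapped, group and summed are made up for illustration.

```r
# Toy count matrix (genes in rows, barcodes in columns), typed in by hand
counts <- matrix(
  c(0, 0,  0, 0,  3, 0,  0, 0),
  nrow = 2,
  dimnames = list(
    c("gene1", "gen1e2"),
    c("ATAGTTCTCGT", "GAAGCAGTATG", "GAAGACTTGGT", "AAAAAAAAAA")
  )
)
samples <- data.frame(
  Sample  = c("sc1", "sc2", "sc3", "sc4", "sc5"),
  Barcode = c("CCTAGATTAAT", "GAAGACTTGGT", "GAAGCAGTATG", "GGTAACCTGAC", "ATAGTTCTCGT"),
  stringsAsFactors = FALSE
)

# Keep only the columns whose barcode is listed in the sample table
keep <- colnames(counts) %in% samples$Barcode
mapped <- counts[, keep, drop = FALSE]

# Sample name for each remaining column
group <- samples$Sample[match(colnames(mapped), samples$Barcode)]

# rowsum() sums rows by group, so transpose, sum per sample, and transpose back
summed <- t(rowsum(t(mapped), group))

print(summed)
```

Unlike the renaming approaches, this drops the unmatched AAAAAAAAAA column, and the sample columns come out sorted by name.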