Question: Restructuring table: rows to column and interleaving columns
1
gravatar for akang
2.3 years ago by
akang90
akang90 wrote:

I want to transpose a file like below where the 2nd column on-words each column header (B4, B3, E0 )can take two values. I want all the values for B4 B3...to be in one row. which means B4 B3 E0 will be seperate rows. hOw can it be done awk sed or python. I can do simple transpose in python but dont understand how to solve this particular problem.Ill appreciate any help.

Input : 2nd and 3rd column have the same column name i.e B4, similarly 4th and 5th column have same column name i.e B3 and so on..when we transpose both the values corresponding to B4 should transpose together as a unit like 12 13 13 14 13 13 12 13 13 13 12 13 ..it should be in one line
EDIT: file consists of over 20 columns and 2000 rows

ID  B4  B4  B3  B3
1   12  13  19  21
2   13  14  19  21
3   13  13  19  21
4   12  13  19  19
5   13  13  18  19
6   12  13  19  21

Desired Output

ID 1 1 2 2 3 3 4 4 5 5 6 6
B4 12  13 13  14 13  13 12  13 13  13 12  13
B3 19  21 19  21 19  21 19  19 18  19 19  21
snp transpose • 1.0k views
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by akang90
1

What is the relation to bioinformatics?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Michael Dondrup46k
1

It is microsatellite data that i am trying to merge with snp data

ADD REPLYlink written 2.3 years ago by akang90

I would:
Loop over the file, create a dictionary with key = marker and value = list containing genotypes. Then write out the dictionary to a file. A defaultdict will be useful (see collections.defaultdict).

ADD REPLYlink written 2.3 years ago by WouterDeCoster40k

Perhaps you could provide more simple examples of input and output or alternatively describe your problem more clearly..

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by 5heikki8.5k

To improve your example data, you should do the following:

  • simplify your input data to contain only the columns to produce the exact output
  • each column should have a column name
ADD REPLYlink written 2.3 years ago by Michael Dondrup46k

I edited the question to make the example more clear. I have also added headers to the columns without header and removed the > from the beginning. Now, the input can be really used for testing.

ADD REPLYlink written 2.3 years ago by Michael Dondrup46k

Thanks! That sure does make it easy to understand.

ADD REPLYlink written 2.3 years ago by akang90

I also can see now that you have incomplete IDs in the output, you need 12 IDs but have 6, you need to replicate the IDs as well.

ADD REPLYlink written 2.3 years ago by Michael Dondrup46k

You are right! I changed that in the desired output section.

ADD REPLYlink written 2.3 years ago by akang90

Thanks everyone! With all your help I got it done!

ADD REPLYlink written 2.3 years ago by akang90

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your post but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

ADD REPLYlink written 2.3 years ago by WouterDeCoster40k
4
gravatar for cpad0112
2.3 years ago by
cpad011211k
India
cpad011211k wrote:

my starting data:

ID  B4  B4  B3  B3
1   12  13  19  21
2   13  14  19  21
3   13  13  19  21
4   12  13  19  19
5   13  13  18  19
6   12  13  19  21

code:

data=read.csv("test.txt", stringsAsFactors = F, sep="")
row.names(data)=data[,1]
data=data[,c(-1)]
odd_columns=seq(1, ncol(data), by=2) 
odd=t(data[,odd_columns])
even_columns=seq(2, ncol(data), by=2) 
even=t(data[,even_columns])
oven=cbind(odd,even)
newoven=oven[,order(colnames(oven))]

results:

> newoven
    1  1  2  2  3  3  4  4  5  5  6  6
B4 12 13 13 14 13 13 12 13 13 13 12 13
B3 19 21 19 21 19 21 19 19 18 19 19 21
ADD COMMENTlink written 2.3 years ago by cpad011211k
1

Read data

 data=read.csv("test.txt", stringsAsFactors = F, sep="")

data

> data
  B4 B4.1 B3 B3.1
1 12   13 19   21
2 13   14 19   21
3 13   13 19   21
4 12   13 19   19
5 13   13 18   19
6 12   13 19   21

data transformation

data=data[-1]
odd=seq(1,ncol(data), by=2)
newdata=cbind(t(data[odd]),t(data[odd+1]))
colnames(newdata)=sort(rep(rownames(data),2))

print new data

> newdata
    1  1  2  2  3  3  4  4  5  5  6  6
B4 12 13 13 12 13 12 13 14 13 13 13 13
B3 19 19 19 19 18 19 21 21 21 19 19 21

what if you want to merge alternate columns (for eg first two columns, followed by next two columns)?

odd=seq(1,ncol(newdata), by=2)
ncol(newdata)
newdata[,odd]+newdata[,odd+1]

output

> newdata[,odd]+newdata[,odd+1]
    1  2  3  4  5  6
B4 25 25 25 27 26 26
B3 38 38 37 42 40 40
ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by cpad011211k

Awesome solution! :)

ADD REPLYlink written 2.3 years ago by Michael Dondrup46k
0
gravatar for Ashley
2.3 years ago by
Ashley50
China
Ashley50 wrote:

using R software mydata<-read.table("input",sep="\t") mydata_t<-t(mydata)

ADD COMMENTlink written 2.3 years ago by Ashley50
1

the one u told is simple transpose..i want values of 2nd and 3rd column to merge and then 4th and 5th and so on

ADD REPLYlink written 2.3 years ago by akang90

Your title suggested also that transposing was what you needed. I've changed that.

ADD REPLYlink written 2.3 years ago by WouterDeCoster40k
0
gravatar for Michael Dondrup
2.3 years ago by
Bergen, Norway
Michael Dondrup46k wrote:
 m <- read.table(file=stdin(), header=T, sep="\t", row.names = 1, fill=F) ## adjust to your needs
 o <- matrix(ncol=nrow(m)*2,nrow=0)
 colindex <- seq(from=1, to=ncol(m)-1, by=2)
 for (i in colindex)  {  
  # this is difficult to do without for loop, because you want to 'zip' together one column with the next one
  tmprow <- as.numeric(unlist(apply(m[c(i,i+1)],1,list)))
  o <- rbind(o, tmprow)
 }
rownames(o) <- colnames(m)[colindex]
colnames(o) <- unlist(lapply(rownames(m), rep, 2))

o 
    1  1  2  2  3  3  4  4  5  5  6  6
B4 12 13 13 14 13 13 12 13 13 13 12 13
B3 19 21 19 21 19 21 19 19 18 19 19 21

This seems to work, including the column and row-names. If someone can write this without using a for loop, you will gain some extra points.

This should work as a quick hack, but there might be a better way to fix this without a loop.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Michael Dondrup46k
0
gravatar for Petr Ponomarenko
2.3 years ago by
United States / Los Angeles / ALAPY.com
Petr Ponomarenko2.6k wrote:

awk 'FNR==1{id="ID";b4="B4";b3="B3"}FNR>1{id=id FS $1 FS $1;b4=b4 FS $2 FS $3;b3=b3 FS $4 FS $5}END{print id;print b4;print b3}' input.txt

ADD COMMENTlink written 2.3 years ago by Petr Ponomarenko2.6k

Thanks for the code! But in real the file consists of over 20 columns and 2000 rows. So the code you wrote might not work.

ADD REPLYlink written 2.3 years ago by akang90

Please use real data in snipped then. Otherwise you and me just wasted a bit of our time. You can use for loop to collect and print the data.

ADD REPLYlink written 2.3 years ago by Petr Ponomarenko2.6k
0
gravatar for Michael Dondrup
2.3 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

So, here is a solution in R without for loops. The problem, using apply is that per default we do not know in which column we are at the moment of processing, and you cannot always rely on the assumption that the columns will be processed in sequence (e.g. using parapply etc.).

Solution:

split the matrix in two, one for the odd and one for the even columns. The add a row with the column number to the first matrix, so we know were we are again. Assuming, m is holding the input data, like before:

my.zipper <- function(x) {
    current.col = x['DELME'] # which column are we processing?
    x <- x[-length(x)] # remove the index from the processing
    as.numeric(unlist(apply(cbind(x,m2[,current.col]),1,list)))
}

m <- read.table(file=stdin(), header=T, sep="\t", row.names = 1, fill=F) ## adjust to your needs  
colindex <- seq(from=1, to=ncol(m)-1, by=2)
m1 <- m[,colindex] # even cols
m2 <- m[,colindex+1] # odd cols
m1 <- rbind(m1, DELME=1:ncol(m1)) # add the column number to the m1 matrix

o <-t(apply(m1, 2, my.zipper))
colnames(o) <- unlist(lapply(rownames(m), rep, 2)) # adjust column names, row.names are correct already.
o
 o

 1  1  2  2  3  3  4  4  5  5  6  6
B4 12 13 13 14 13 13 12 13 13 13 12 13
B3 19 21 19 21 19 21 19 19 18 19 19 21
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Michael Dondrup46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2644 users visited in the last hour