combining rows in a sequence file
9.8 years ago
Tohamy ▴ 80

Dear All, I do not have much experience with R and I need your help.

I have data in a form like this:

1_1  A  B  C  D
1_2  a  b  c  d
2_1  E  F  G  H
2_2  e  f  g  h
3_1  I  J  K  L
3_2  i  j  k  l

I want to combine each pair of rows like this:

1  A a B b C c D d
2  E e F f G g H h
3  I i J j K k L l

How can I do this?

Thanks in advance

sequencing R • 2.2k views

Given the initial 6x4 matrix, do you want a 3x4 or a 3x8 matrix output? That's rather ambiguous from your presentation. I assume you want the latter, but perhaps not.

9.8 years ago

Let's assume that your data.frame is homogeneous in type and can be coerced to a matrix:

m <- as.matrix(df)

Then, the most efficient route is array manipulation:

a <- aperm(array(m, c(2L, nrow(m)/2L, ncol(m))), c(1L, 3L, 2L))
m2 <- matrix(a, dim(a)[3L], byrow=TRUE)
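
For reference, here is how the reshape above behaves on the toy data from the question. This is only a sketch: the inline `read.table(text = ...)` call, the `row.names = 1` and `colClasses = "character"` arguments, and the final row-labelling step are assumptions about how the data is loaded, not part of the answer. It also assumes the two rows of each individual are adjacent. If the ID column is left inside the matrix, the IDs get interleaved along with the alleles, and a column containing only `T` would be read as logical `TRUE` without `colClasses`.

```r
# Toy data from the question; the first column holds the row IDs.
txt <- "1_1 A B C D
1_2 a b c d
2_1 E F G H
2_2 e f g h
3_1 I J K L
3_2 i j k l"

# row.names = 1 keeps the IDs out of the matrix body;
# colClasses = "character" stops read.table from reading T/F as logicals.
df <- read.table(text = txt, row.names = 1, colClasses = "character")
m <- as.matrix(df)

# Split the row dimension into (row within pair, individual), then move the
# column dimension to the middle so the two rows of a pair vary fastest.
a <- aperm(array(m, c(2L, nrow(m) / 2L, ncol(m))), c(1L, 3L, 2L))
m2 <- matrix(a, dim(a)[3L], byrow = TRUE)

# Label each combined row with its ID prefix (assumes IDs look like "1_1").
rownames(m2) <- unique(sub("_[12]$", "", rownames(m)))
m2
```

Each row of `m2` then holds one individual, with the two original rows interleaved column by column.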

Hey Mr Lawrence,

Your script works perfectly with the toy data set, but when I try to use it with my real data it does not work so well. My data consists of 174 individuals. Each individual has two lines, and each line contains 120693 SNPs:

> h <- read.table("imputed_final.stru", sep="\t")
> m <- as.matrix(h)
> a <- aperm(array(m, c(2L, nrow(m)/2L, ncol(m))), c(1L, 3L, 2L))
> m2 <- matrix(a, dim(a)[3L], byrow=TRUE)
> View(m2)

It gives me one line for each individual, but they are separated like this:

Ind1_1  T C G C T   Ind1_2  T C G C T
Ind2_1  T C G C T   Ind2_2  T C G C T
Ind3_1  T C G C T   Ind3_2  T C G C T

not like in the toy example:

> m3 <- matrix(a, dim(a)[3L], byrow=TRUE)
> m3
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "A"  "a"  "B"  "b"  "C"  "c"  "D"  "d"
[2,] "E"  "e"  "F"  "f"  "G"  "g"  "H"  "h"
[3,] "J"  "j"  "K"  "k"  "L"  "l"  "M"  "m"

Note: It runs without any error or warning messages on my real data, but in the end it does not give me the required format.

Thanks, I really appreciate your help.

Best


It's generally more useful if you say how it doesn't match the format you need (and then show the output and what it should be like).

9.8 years ago

Not a bioinformatics question, but use apply.

The way you have formatted the question makes it a bit hard to understand what exactly you want to do.

Assuming your data is symmetric, as in the few lines you showed there.

where dat is your matrix

The code might change with the full matrix, but I leave it to you to implement.
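
The code that accompanied this answer is not shown here, so the following is only a sketch of one way the apply family could be used for this task, not the answerer's own code. It assumes `dat` is a character matrix whose row names follow the `1_1`, `1_2` pattern from the question, and it uses `sapply` over groups of row indices rather than `apply` over single rows. The names `ids`, `groups`, and `combined` are made up for illustration.

```r
# Toy matrix standing in for `dat`; the row names carry the IDs.
dat <- matrix(
  c("A", "B", "C", "D",
    "a", "b", "c", "d",
    "E", "F", "G", "H",
    "e", "f", "g", "h",
    "I", "J", "K", "L",
    "i", "j", "k", "l"),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("1_1", "1_2", "2_1", "2_2", "3_1", "3_2"), NULL)
)

# Individual ID = row name without the trailing _1 / _2 (assumed naming scheme).
ids <- sub("_[12]$", "", rownames(dat))

# Row indices per individual, kept in order of first appearance.
groups <- split(seq_len(nrow(dat)), factor(ids, levels = unique(ids)))

# as.vector() reads each 2 x n block column by column,
# which interleaves the two rows of an individual.
combined <- t(sapply(groups, function(i) as.vector(dat[i, , drop = FALSE])))
combined
```

Grouping by ID rather than by position means the two rows of an individual do not have to be adjacent, at the cost of being slower than a pure array reshape on very wide data.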


First of all, thanks for your help. But

apply( dat[ , colnames(dat) ] , 1 , paste , collapse = " " )

does not work well in my case, or maybe I did not explain my question very well.

I need to combine the two rows for each individual into one row. For example, I need to have 3 rows instead of 6 rows, not one row like this:

1_1       1_2       2_1       2_2       3_1       3_2
"A B C D" "a b c d" "E F G H" "e f g h" "J K L M" "j k l m"

Also, it does not merge the columns the way I need. I need it like this: A a B b C c D d, and so on for the other individuals.

Thank you so much, and sorry for disturbing you. But I need to do this for a large file that contains 174 individuals and 120000 SNPs.

> dat <- read.table("cbin.csv", sep="\t", row.names=1)
> dat
    V2 V3 V4 V5
1_1  A  B  C  D
1_2  a  b  c  d
2_1  E  F  G  H
2_2  e  f  g  h
3_1  J  K  L  M
3_2  j  k  l  m
> l <- apply(dat[, colnames(dat)], 1, paste, collapse = " ")
> l
      1_1       1_2       2_1       2_2       3_1       3_2
"A B C D" "a b c d" "E F G H" "e f g h" "J K L M" "j k l m"

Check the edited answer
