How to make a data table by merging three data sets?
Asked by star, 4.8 years ago

I have three tables containing variants, and I would like to merge all three into a single table like the one below. I would like to use the A coordinate (chr and start) as the basis of the merge: if the same coordinate exists in B or C, put that row beside the A coordinate; otherwise put a '.' value instead.

A data:

chr1    19004   A   G
chr1    19858   C   T
chr1    46633   T   A
chr1    28563   A   G
chr1    66974   TG  T

B data:

chr1    19322   C   T
chr1    19858   C   T
chr1    63527   T   C
chr1    66974   TG  T

C data:

chr1    19004   T   G
chr1    46633   T   A
chr1    64548   A   C
chr1    66974   TG  T

Desired output:

A.chr  A.start  A.1  A.2  B.chr  B.start  B.1  B.2  C.chr  C.start  C.1  C.2
chr1   19004    A    G    .      .        .    .    chr1   19004    A    G
chr1   19858    C    T    chr1   19858    C    T    .      .        .    .
chr1   46633    T    A    .      .        .    .    chr1   46633    T    A
chr1   28563    A    G    .      .        .    .    .      .        .    .
chr1   63527    T    C    .      .        .    .    .      .        .    .
chr1   64548    A    C    .      .        .    .    .      .        .    .
chr1   66974    TG   T    chr1   66974    TG   T    chr1   66974    TG   T
Tags: R, merge, linux, data.frame

You should really try to find answers to these kinds of problems yourself. I remember you asked a similar question before in a different context and got many solutions. Try to understand the logic behind them; there is no point in spoon-feeding, because you cannot then abstract the solution to different problems.

4.6 years ago

You could convert your text files to sorted BED with awk and sort-bed.

$ awk -vFS="\t" -vOFS="\t" '{ print $1, $2, ($2+1), $3, $4 }' A.txt | sort-bed - > A.bed

Repeat for B.txt and C.txt.
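The conversion for all three files can be wrapped in a single loop. This is a minimal sketch. It assumes A.txt, B.txt and C.txt are tab-separated with chr, start, ref and alt columns. The `sort_bed` helper is a hypothetical name. It calls BEDOPS `sort-bed` when that is installed, and otherwise falls back to `LC_ALL=C sort -k1,1 -k2,2n -k3,3n`, which should give the same chr/start/end ordering for files like these. To keep things safe, the sketch runs in a scratch directory on the sample rows from the question. Drop the `cd` and `printf` lines to run it on your own files.

```shell
#!/usr/bin/env bash
# Sketch: convert A.txt, B.txt and C.txt to sorted BED in one loop.
# Assumption: tab-separated input with chr, start, ref, alt columns.
set -euo pipefail

# Hypothetical helper: prefer BEDOPS sort-bed, else plain sort
# (lexicographic chr, then numeric start and end).
sort_bed() {
  if command -v sort-bed > /dev/null 2>&1; then
    sort-bed -
  else
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n
  fi
}

# Scratch directory so the sample files below don't clobber real ones.
cd "$(mktemp -d)"

# Sample input: the A, B and C rows from the question.
printf 'chr1\t19004\tA\tG\nchr1\t19858\tC\tT\nchr1\t46633\tT\tA\nchr1\t28563\tA\tG\nchr1\t66974\tTG\tT\n' > A.txt
printf 'chr1\t19322\tC\tT\nchr1\t19858\tC\tT\nchr1\t63527\tT\tC\nchr1\t66974\tTG\tT\n' > B.txt
printf 'chr1\t19004\tT\tG\nchr1\t46633\tT\tA\nchr1\t64548\tA\tC\nchr1\t66974\tTG\tT\n' > C.txt

for s in A B C; do
  # end = start + 1, so each variant becomes a one-base interval
  awk -vFS="\t" -vOFS="\t" '{ print $1, $2, ($2+1), $3, $4 }' "$s.txt" | sort_bed > "$s.bed"
done
```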

Then you could do:

$ bedops --everything A.bed B.bed C.bed > Union.bed
$ bedmap --echo --echo-map --delim '\t' --unmapped-val '.\t.\t.\t.\t.\t' Union.bed > Answer.bed

Run cut on Answer.bed (or pipe bedmap's output straight into cut) to keep only the fields of interest.
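If you want the exact A/B/C column layout from the question rather than a union, a plain awk join is another option. This swaps BEDOPS for awk and is a sketch, not part of the bedops/bedmap route above. It assumes tab-separated A.txt, B.txt and C.txt with chr, start, ref and alt columns, and at most one row per chr/start in each of B and C. It reads B and C into lookup tables keyed on chr and start. It then prints every A row followed by its B and C matches, using four '.' placeholders when a match is missing. The output name merged.txt is only for illustration, and the sample rows are the ones from the question.

```shell
#!/usr/bin/env bash
# Sketch: left-join B and C onto A by (chr, start), filling misses with '.'.
# Assumptions: tab-separated inputs; at most one row per (chr, start) in B and C.
set -euo pipefail
cd "$(mktemp -d)"   # scratch dir so the sample files don't clobber real ones

# Sample input: the A, B and C rows from the question.
printf 'chr1\t19004\tA\tG\nchr1\t19858\tC\tT\nchr1\t46633\tT\tA\nchr1\t28563\tA\tG\nchr1\t66974\tTG\tT\n' > A.txt
printf 'chr1\t19322\tC\tT\nchr1\t19858\tC\tT\nchr1\t63527\tT\tC\nchr1\t66974\tTG\tT\n' > B.txt
printf 'chr1\t19004\tT\tG\nchr1\t46633\tT\tA\nchr1\t64548\tA\tC\nchr1\t66974\tTG\tT\n' > C.txt

awk -vFS="\t" -vOFS="\t" '
  BEGIN {
    # header row, as in the question
    print "A.chr", "A.start", "A.1", "A.2", "B.chr", "B.start", "B.1", "B.2", "C.chr", "C.start", "C.1", "C.2"
    missing = "." OFS "." OFS "." OFS "."   # filler for an absent B or C row
  }
  # B and C are read first: remember each row under its chr/start key.
  FILENAME == "B.txt" { b[$1 FS $2] = $0; next }
  FILENAME == "C.txt" { c[$1 FS $2] = $0; next }
  # A is read last: print each A row followed by its B and C matches.
  {
    key = $1 FS $2
    print $0, ((key in b) ? b[key] : missing), ((key in c) ? c[key] : missing)
  }
' B.txt C.txt A.txt > merged.txt
```

Because A is the base, positions found only in B or C (63527 and 64548 here) are not printed. The union from bedops --everything above does include them.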
