Question

Combining files by chromosome and position, side by side (column-wise)


4.4 years ago

Gene ▴ 20

I have multiple files, each in the format: chr position value.

I want to combine them into one file with the columns "chr", "position", "samp1", "samp2", "samp3", "samp4", and so on.

For example:

Samp1:

chr position value
1   3774318 1
1   3774319 1
1   3775200 2
1   3775201 7
1   3775202 70
1   3775203 7
1   3775204 270
1   3775205 3
1   3775206 5

Samp2:

chr position value
1   3775200 1
1   3775201 1
1   3775202 10
1   3775203 1
1   3775204 12
1   3775205 1
1   3775206 13
1   3775207 1
1   3775208 1
1   3775209 18

and so on ....

Desired output file (the values here are made up):

chr position value-samp1 value-samp2 value-samp3 value-samp4
1 50204 2 17 5 2
1 50205 2 17 5 2
1 50206 2 18 5 2
1 50207 2 19 5 3
1 50208 3 19 5 3
1 50209 3 19 5 3

Or, for the two example files above (with 0 where a sample has no value at that position):

chr position samp1 samp2
1 3774318 1 0
1 3774319 1 0
1 3775200 2 1
1 3775201 7 1
1 3775202 70 10
1 3775203 7 1
1 3775204 270 12
1 3775205 3 1
1 3775206 5 13
1 3775207 0 1
1 3775208 0 1
1 3775209 0 18

I tried join, merge, and cat, but they did not work as I expected. I am a beginner. Do you have any ideas on how this can be done?

sequencing coverage Assembly • 1.1k views



Sounds like bedtools intersect can help. Duplicate the position column so that each row has a start and an end column, probably use -loj (left outer join), then remove the unnecessary columns.

Reply by Asaf 10k, 4.4 years ago
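The conversion step suggested in this reply could be sketched as follows. This is a minimal sketch, not the commenter's own code: the function name `to_bed` and the file names `samp1.bed` and `samp2.bed` are hypothetical, and the input is assumed to be tab-delimited. Because BED-style coordinates are 0-based and half-open, the sketch writes position p as start p-1 and end p instead of literally duplicating the column.

```shell
# Hypothetical helper: convert "chr<TAB>position<TAB>value" rows into
# BED-like "chr<TAB>start<TAB>end<TAB>value" rows.
# Lines whose second field is not a number (such as a header) are skipped.
to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         $2 ~ /^[0-9]+$/ { print $1, $2 - 1, $2, $3 }' "$1"
}

# Assumed usage, with bedtools installed (not run here):
#   to_bed samp1.txt > samp1.bed
#   to_bed samp2.txt > samp2.bed
#   bedtools intersect -loj -a samp1.bed -b samp2.bed | cut -f 1,3,4,8
```

Note that a left outer join keeps only the positions present in the first file, so positions that occur only in the second sample would be lost, and bedtools reports placeholder values rather than 0 for rows without a match.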


I don't think there is a one-click tool that can do this job. It will probably require some coding in Python, R, etc.

Reply by shoujun.gu ▴ 350, 4.4 years ago

score 2 · Answer 1 · 2019-11-08


4.4 years ago

Pierre Lindenbaum 161k

Assuming tab-delimited files: first create a file of unique keys, then join each file against it.

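# make sort and join agree on the collation order
export LC_ALL=C

# create the unique keys (chr_position)
# assumes chromosome names contain no "_"; header lines are treated as ordinary rows
cat input.*.txt | cut -f 1,2 | tr "\t" "_" | sort -u > keys.txt

# for each file, fill the positions it lacks with NA
for F in input.*.txt
do
    sed 's/\t/_/' "$F" | sort -t $'\t' -k1,1 > "${F}.2"
    join -t $'\t' -e NA -a 1 -1 1 -2 1 -o "1.1,2.2" keys.txt "${F}.2" > "${F}.3"
done

# join all files, adding one value column per input file
for F in input.*.txt
do
    join -t $'\t' -1 1 -2 1  keys.txt "${F}.3" > tmp
    mv tmp keys.txt
done

# dump result, ordered by chromosome and then numerically by position
tr "_" "\t" < keys.txt | sort -t $'\t' -k1,1 -k2,2n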
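
As an alternative to the join-based steps above, the same full outer merge can be sketched in a single awk pass. This is a minimal sketch and not part of the original answer: the function name `merge_columns` is hypothetical, fields may be separated by tabs or spaces, every file is assumed to be non-empty, and all data is held in memory. Missing values are written as 0, as in the question's expected output, instead of NA.

```shell
# Hypothetical helper: merge_columns FILE...
# Prints chr, position and one value column per input file (tab-separated),
# writing 0 where a file has no row for that chr/position pair.
merge_columns() {
    awk '
        BEGIN { OFS = "\t" }
        FNR == 1 { nfiles++ }                 # first line of each input file
        $2 !~ /^[0-9]+$/ { next }             # skip headers and malformed lines
        {
            key = $1 OFS $2
            if (!(key in seen)) { seen[key] = 1; keys[++nkeys] = key }
            value[key, nfiles] = $3           # remember the value for this file
        }
        END {
            for (i = 1; i <= nkeys; i++) {
                line = keys[i]
                for (f = 1; f <= nfiles; f++) {
                    cell = ((keys[i], f) in value) ? value[keys[i], f] : 0
                    line = line OFS cell
                }
                print line
            }
        }
    ' "$@" | sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

A call such as `merge_columns samp1.txt samp2.txt > merged.txt` would then write one row per position, with the columns in the order the files were given.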