Question: Combining files column-wise, matched on chromosome and position
Gene wrote, 9 months ago:

I have multiple files in the format: chr position value.

I want to combine them into the format "chr", "position", "samp1", "samp2", "samp3", "samp4", ...

For example:

Samp1:

chr position value
1   3774318 1
1   3774319 1
1   3775200 2
1   3775201 7
1   3775202 70
1   3775203 7
1   3775204 270
1   3775205 3
1   3775206 5

Samp2:

chr position value
1   3775200 1
1   3775201 1
1   3775202 10
1   3775203 1
1   3775204 12
1   3775205 1
1   3775206 13
1   3775207 1
1   3775208 1
1   3775209 18

and so on ....

Desired output file (the values here are made up):

chr position value-samp1 value-samp2 value-samp3 value-samp4
1 50204 2 17 5 2
1 50205 2 17 5 2
1 50206 2 18 5 2
1 50207 2 19 5 3
1 50208 3 19 5 3
1 50209 3 19 5 3

Or, for the two samples above, with missing positions filled with 0:

chr position samp1 samp2
1 3774318 1 0
1 3774319 1 0
1 3775200 2 1
1 3775201 7 1
1 3775202 70 10
1 3775203 7 1
1 3775204 270 12
1 3775205 3 1
1 3775206 5 13
1 3775207 0 1
1 3775208 0 1
1 3775209 0 18

I tried join, merge and cat, but none of them worked as I expected. I am a beginner. Do you have any ideas how this can be done?

Tags: sequencing, coverage, assembly

Sounds like bedtools intersect could help. Duplicate the position column so that each record has a start and an end column, probably run it with -loj (left outer join), then remove the unnecessary columns.

written 9 months ago by Asaf

I don't think there is a one-click tool for this job. It will probably take some coding in Python, R, etc.

written 9 months ago by shoujun.gu
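
As a rough idea of how little coding this may take, here is a minimal awk sketch that stays in the shell. It is only an illustration under some assumptions: the files are tab-delimited, have no header line (strip it first if they do), are not empty, and fit in memory. The function name merge_columns is made up for this example. Missing positions are filled with 0, as in the question.

```shell
# merge_columns FILE...: print chr, position, then one value column per file.
# Hypothetical helper; assumes tab-delimited, header-free, non-empty files.
merge_columns() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { nfiles++ }            # first line of each file: new column
        {
            key = $1 OFS $2              # chr + position identifies a row
            if (!(key in seen)) { seen[key] = 1; keys[++nkeys] = key }
            val[key, nfiles] = $3        # remember the value for this file
        }
        END {
            for (i = 1; i <= nkeys; i++) {
                line = keys[i]
                for (f = 1; f <= nfiles; f++)
                    line = line OFS (((keys[i], f) in val) ? val[keys[i], f] : 0)
                print line
            }
        }
    ' "$@" | sort -t "$(printf '\t')" -k1,1 -k2,2n   # chr as text, position as number
}
```

It could then be called as merge_columns input.*.txt > merged.txt, where the file names are placeholders. The columns come out in the order the files are given on the command line.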
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9 months ago:

Assuming tab-delimited files: first create a file of unique keys.

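# Collect the unique chr_position keys across all files
cat input.*.txt  | cut -f 1,2 | tr "\t" "_" | sort | uniq > keys.txt

# For each file: merge chr and position into one key column, sort on it,
# then left-join against keys.txt so that missing positions get NA
# (use -e 0 instead to fill with zeros, as in the question)
for F in input.*.txt
do
    sed 's/\t/_/' "$F" | sort -t $'\t' -k1,1 > "${F}.2"
    join -t $'\t' -e NA -a 1 -1 1 -2 1 -o "1.1,2.2" keys.txt "${F}.2" > "${F}.3"
done

# Join all files: each pass appends one value column to keys.txt
for F in input.*.txt
do
    join -t $'\t' -1 1 -2 1  keys.txt "${F}.3" > tmp
    mv tmp keys.txt
done

# Dump the result, turning the key back into chr and position
# (this assumes chromosome names contain no underscore)
tr "_" "\t" < keys.txt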
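
The fill step in the answer above relies on three join options working together: -a 1 keeps keys that have no match in the sample file, -o 1.1,2.2 fixes the output to "key, value", and -e supplies the placeholder for the missing value. Here is a small sketch of just that step, using 0 as the filler instead of NA. The function name fill_missing is made up for this example, and both inputs are assumed to be sorted on the key.

```shell
# fill_missing KEYS FILE: print every key in KEYS with its value from FILE,
# or 0 when FILE has no line for that key.
# Hypothetical helper; both inputs must be tab-delimited and sorted on field 1.
fill_missing() {
    join -t "$(printf '\t')" -a 1 -e 0 -o 1.1,2.2 "$1" "$2"
}
```

For example, with keys 1_100, 1_200 and 1_300 and a sample file that only contains 1_200 with value 7, the two missing keys are printed with a 0 in the value column.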

Thanks a lot. It was really helpful and it works.

written 9 months ago by Gene