Question: Combining files side by side by chromosome and position (column-wise)
Gene wrote, 11 days ago:

I have multiple files in the format: chr position value.

I want to combine them into the format "chr", "position", "samp1", "samp2", "samp3", "samp4", ...

For example:

Samp1:

chr position value
1   3774318 1
1   3774319 1
1   3775200 2
1   3775201 7
1   3775202 70
1   3775203 7
1   3775204 270
1   3775205 3
1   3775206 5

Samp2:

chr position value
1   3775200 1
1   3775201 1
1   3775202 10
1   3775203 1
1   3775204 12
1   3775205 1
1   3775206 13
1   3775207 1
1   3775208 1
1   3775209 18

and so on ....

Desired output file (the values here are arbitrary):

chr position value-samp1 value-samp2 value-samp3 value-samp4
1 50204 2 17 5 2
1 50205 2 17 5 2
1 50206 2 18 5 2
1 50207 2 19 5 3
1 50208 3 19 5 3
1 50209 3 19 5 3

Or, for the two samples above (0 where a sample has no value at that position):

chr position samp1 samp2
1 3774318 1 0
1 3774319 1 0
1 3775200 2 1
1 3775201 7 1
1 3775202 70 10
1 3775203 7 1
1 3775204 270 12
1 3775205 3 1
1 3775206 5 13
1 3775207 0 1
1 3775208 0 1
1 3775209 0 18

I tried join, merge, and cat, but none of them worked as I expected. I am a beginner. Do you have any ideas how this can be done?

sequencing coverage assembly • 119 views
modified 11 days ago by Pierre Lindenbaum • written 11 days ago by Gene

Sounds like bedtools intersect can help. Duplicate the position column so that each row has a start and an end, probably use -loj, then remove the unnecessary columns.
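The column-duplication step might look like the sketch below. It is not from the commenter; the file names samp1.txt and samp1.bed are made up, and it assumes a tab-separated input with one header line. BED intervals are 0-based and half-open, so a 1-based position p is written as start p-1 and end p. Note that -loj keeps every row of the -a file only, so positions present only in the -b file would be lost.

```shell
# Demo input (hypothetical file name), with a header line.
printf 'chr\tposition\tvalue\n1\t3775200\t2\n1\t3775201\t7\n' > samp1.txt

# Skip the header; a 1-based position p becomes the BED interval [p-1, p).
awk -v OFS='\t' 'NR > 1 { print $1, $2 - 1, $2, $3 }' samp1.txt > samp1.bed
cat samp1.bed

# With a second file converted the same way, the suggested call would be:
#   bedtools intersect -loj -a samp1.bed -b samp2.bed
```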

written 11 days ago by Asaf

I don't think there is a one-click tool that can do this job. It will probably require some coding in Python, R, or a similar language.
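As a rough idea of how little code this takes, here is a minimal awk sketch. It is not from the thread: the file names samp1.txt, samp2.txt and merged.txt are made up, and it assumes whitespace-separated files that each start with one header line. Positions missing from a file are filled with 0, as in the question's expected output.

```shell
# Demo inputs (hypothetical file names), each with a header line.
printf 'chr\tposition\tvalue\n1\t3774318\t1\n1\t3775200\t2\n1\t3775201\t7\n' > samp1.txt
printf 'chr\tposition\tvalue\n1\t3775200\t1\n1\t3775201\t1\n1\t3775207\t1\n' > samp2.txt

TAB=$(printf '\t')

# One pass over all files: remember every (chr, position) key and the value
# seen in each file, then print one row per key with 0 where a file had none.
awk -v OFS='\t' '
    FNR == 1 { nfiles++; next }          # new file: count it and skip its header
    {
        key = $1 OFS $2
        if (!(key in seen)) { seen[key] = 1; order[++nkeys] = key }
        value[key, nfiles] = $3
    }
    END {
        for (k = 1; k <= nkeys; k++) {
            row = order[k]
            for (f = 1; f <= nfiles; f++)
                row = row OFS (((order[k], f) in value) ? value[order[k], f] : 0)
            print row
        }
    }
' samp1.txt samp2.txt | sort -t "$TAB" -k1,1 -k2,2n > merged.txt

cat merged.txt
```

Everything is held in memory until the END block, so this suits files of moderate size; for very large inputs a sort/join pipeline scales better.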

modified 11 days ago • written 11 days ago by shoujun.gu
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 11 days ago:

Assuming tab-delimited files, first create a file of unique keys.

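# use the C locale so that sort and join agree on the ordering
export LC_ALL=C

# create the unique keys (chr_position) found across all files
cat input.*.txt | cut -f 1,2 | tr "\t" "_" | sort -u > keys.txt

# for each file, build the same key (GNU sed understands \t) and
# fill positions missing from that file with NA (use -e 0 for zeros)
for F in input.*.txt
do
    sed 's/\t/_/' "$F" | sort -t $'\t' -k1,1 > "${F}.2"
    join -t $'\t' -e NA -a 1 -1 1 -2 1 -o "1.1,2.2" keys.txt "${F}.2" > "${F}.3"
done

# join all files, adding one value column per file
for F in input.*.txt
do
    join -t $'\t' -1 1 -2 1 keys.txt "${F}.3" > tmp
    mv tmp keys.txt
done

# dump result, turning the key back into two columns
# (assumes chromosome names contain no "_")
tr "_" "\t" < keys.txt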
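One possible follow-up, not part of the answer itself: the rows come out in the lexical order of the chr_position keys, so 1_10000000 sorts before 1_9, and missing values are NA rather than the 0 shown in the question. The sketch below re-sorts numerically by position and swaps NA for 0. The file names merged_keys_order.txt and final.txt, and the two demo rows, are made up for illustration.

```shell
TAB=$(printf '\t')

# Hypothetical two-row result in key order, with NA for missing values.
printf '1\t10000000\t5\tNA\n1\t9\tNA\t3\n' > merged_keys_order.txt

# Sort by chromosome, then numerically by position; replace NA with 0
# in the value columns (column 3 onward).
sort -t "$TAB" -k1,1 -k2,2n merged_keys_order.txt |
    awk -F'\t' -v OFS='\t' '{ for (i = 3; i <= NF; i++) if ($i == "NA") $i = 0; print }' > final.txt

cat final.txt
```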
written 11 days ago by Pierre Lindenbaum

Thanks a lot. It was really helpful and it works.

written 8 days ago by Gene