Hello there, I have a big file, list.txt, that contains 2000 samples and 18000 coordinates, in the same format as file 1 below.
Coordinates Sample Values
chr1:110238914-110324454 SampleB 1
chr1:110238914-110324454 SampleC 3
chr1:110238914-110324454 SampleD 1
chr5:65562670-65627908 SampleD 1
chr5:65562670-65627908 SampleA 1
chr5:65562670-65627908 SampleB 4
chr5:65562670-65627908 SampleC 1
chr2:158248715-158335919 SampleB 1
chr2:158248715-158335919 SampleA 0
chr2:158248715-158335919 SampleC 1
I want to build a matrix from the above file, with the coordinates as row names and the samples as column names. If a coordinate has a value for a sample, that value should go in the matrix; if the coordinate has no value for the sample, the cell should be filled with 2. The result should look like the table below.
Coordinates SampleA SampleB SampleC SampleD
chr1:110238914-110324454 2 1 3 1
chr5:65562670-65627908 1 4 1 1
chr2:158248715-158335919 0 1 1 2
I would really appreciate any script for Linux/bash (preferably) or R that produces this result.
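This is neither bash nor R, but as a rough sketch of the long-to-wide reshaping described above, something like the following plain Python might work. The names `long_to_wide`, `fill` and `SAMPLE` are made up for illustration. It assumes the input is whitespace-separated with exactly three fields per line, that the header line starts with `Coordinates`, and that the whole table fits in memory. Rows keep the order in which coordinates first appear, and sample columns are sorted alphabetically.

```python
def long_to_wide(lines, fill="2"):
    """Pivot 'Coordinates Sample Values' lines into a coordinate-by-sample matrix.

    Missing coordinate/sample pairs are filled with `fill`.
    If a pair occurs more than once, the last value seen is kept.
    """
    table = {}       # coordinate -> {sample: value}; dicts keep first-seen order
    samples = set()  # every sample name seen in the input
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and the header line.
        if len(fields) != 3 or fields[0] == "Coordinates":
            continue
        coord, sample, value = fields
        table.setdefault(coord, {})[sample] = value
        samples.add(sample)

    columns = sorted(samples)
    rows = [["Coordinates"] + columns]
    for coord, values in table.items():
        rows.append([coord] + [values.get(s, fill) for s in columns])
    return rows


# Example data copied from the question (hypothetical variable name).
SAMPLE = """Coordinates Sample Values
chr1:110238914-110324454 SampleB 1
chr1:110238914-110324454 SampleC 3
chr1:110238914-110324454 SampleD 1
chr5:65562670-65627908 SampleD 1
chr5:65562670-65627908 SampleA 1
chr5:65562670-65627908 SampleB 4
chr5:65562670-65627908 SampleC 1
chr2:158248715-158335919 SampleB 1
chr2:158248715-158335919 SampleA 0
chr2:158248715-158335919 SampleC 1
"""

for row in long_to_wide(SAMPLE.splitlines()):
    print(" ".join(row))
```

For the real file, the same function can be given an open file handle instead of the example string, e.g. `with open("list.txt") as handle: rows = long_to_wide(handle)`, since it only iterates over lines.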
Relevant post from SO - "reshape long to wide":