Hi, I have a file named gene.tsv in each of several hundred folders, one folder per sample. The file format is:
  `Gene ID`          `Gene Name` Reference Strand Start  End     Coverage FPKM  TPM
  <chr>              <chr>       <chr>     <chr>  <dbl>  <dbl>   <dbl>    <dbl> <dbl>
1 ENSG00000187961.13 KLHL17      chr1      +      960587 965715  4.71     2.22  5.03
2 ENSG00000187583.10 PLEKHN1     chr1      +      966497 975865  3.67     2.60  5.89
3 ENSG00000187642.9  PERM1       chr1      -      975204 982093  1.09     0.445 1.01
4 ENSG00000187634.11 SAMD11      chr1      +      923928 944581  6.73     5.57  12.6
5 ENSG00000188976.10 NOC2L       chr1      -      944204 959290  67.4     26.9  61.0
6 ENSG00000188290.10 HES4        chr1      -      998962 1000172 27.2     13.1  29.6
The last column contains the TPM values. I want to build a matrix from the last column of every sample (i.e. all the different gene.tsv files), which sit in different folders named after the samples.
The problem is that each gene.tsv file contains a different number of rows. For example, the 1st gene.tsv contains 19645 rows and the 2nd contains 19688 rows.
The output should look like this, with TPM values for each gene per sample:
        Sample1      Sample2      Sample3    Sample4    Sample5     Sample6
A1BG    211.653339   91.35832     118.5056   227.7529   60.53333    122.0699
A1CF    0.000000     0.00000      0.0000     0.0000     0.00000     0.0000
A2M     21748.389142 103099.68587 18077.6432 91905.5829 71344.22858 34262.9726
A2ML1   432.546595   3552.04679   0.0000     0.0000     13.67998    2055.6870
A3GALT2 1.413336     0.00000      0.0000     0.0000     0.00000     0.0000
A4GALT  731.331278   691.09973    922.3733   1083.1338  631.42933   488.1566
Could you please let me know how to build a matrix from the last column of every file when the number of rows differs? An answer in R would be ideal. I have tried several approaches, but none of them work when the rows differ. Any answer will be much appreciated! Thank you.
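
One possible approach, as a minimal base-R sketch rather than a tested solution for these exact files: read every gene.tsv, take the union of all genes, and fill a gene-by-sample matrix by matching on the gene key, so differing row counts no longer matter. The function name `build_tpm_matrix` and its arguments are hypothetical. It assumes the files are tab-separated with a header row, that each folder name is the sample name, that genes missing from a sample should get a fill value (0 by default), and that rows sharing the same gene name should have their TPM summed.

```r
# Build a gene-by-sample TPM matrix from gene.tsv files in sample folders.
# Assumptions (not stated in the post): tab-separated files with a header,
# folder name = sample name, duplicate keys within a file are summed.
build_tpm_matrix <- function(root_dir, key_col = "Gene Name", fill = 0) {
  files <- list.files(root_dir, pattern = "^gene\\.tsv$",
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) stop("no gene.tsv files found under ", root_dir)
  names(files) <- basename(dirname(files))  # sample name from folder name

  # One named numeric vector of TPM values per sample
  per_sample <- lapply(files, function(f) {
    # check.names = FALSE keeps headers such as "Gene Name" unchanged
    d <- read.delim(f, check.names = FALSE, stringsAsFactors = FALSE)
    v <- tapply(d[["TPM"]], d[[key_col]], sum)
    setNames(as.vector(v), names(v))
  })

  # Union of all genes seen in any sample defines the rows
  genes <- sort(unique(unlist(lapply(per_sample, names))))
  mat <- matrix(fill, nrow = length(genes), ncol = length(per_sample),
                dimnames = list(genes, names(per_sample)))

  # Place each sample's values in the rows of its own genes
  for (j in seq_along(per_sample)) {
    v <- per_sample[[j]]
    mat[match(names(v), genes), j] <- v
  }
  mat
}
```

It could be called as `tpm <- build_tpm_matrix("path/to/samples")`, where the path is a placeholder for the directory holding the sample folders. Passing `fill = NA` marks genes absent from a sample as missing instead of zero, and `key_col = "Gene ID"` keys the rows on the Ensembl IDs, which avoids merging rows that share a gene name.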