This is a duplicate of a question I have posted on Stack Overflow, but I think this community may be better informed about tools that can solve it.
Question on Stack Overflow:
The issue is:
I am looking for a good way to visualise overlaps of genes found in multiple tests. I would like to check for overlap between multiple files containing lists of genes and output a table that shows which files contain which genes in a specific way (example output further below).
I have multiple text files, each containing a list of genes, one gene per line. The files range from approximately 30 to 100 lines. To be as clear as possible, I will show 4 example files, shortened for space.
NRG3 FOXP3 SHH2 ROBO1 PPP3CA
I would like a way to take these files and create an output table that takes all the genes that are in the files and prints them as the rows of the first column (sorted alphabetically with each gene appearing only once). Then the following columns each represent an input file. If a gene appears in a file there will be an 'x' (or some arbitrary marker) to represent this in the relevant column. This would provide an easy way to visualise which genes appear in multiple files. Like this:
File1 File2 File3 File4
FOXP3 X
NRG3 X X X
PPP3CA X
ROBO1 X X X
SHH2 X X
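To pin down the logic of the table described above, here is a minimal sketch. It is written in Python rather than R, purely to make the steps concrete. The `gene_files` dictionary is a hypothetical stand-in for the input files, and which gene sits in which file is made up for illustration. The function names `presence_table` and `format_table` are also just illustrative.

```python
# Hypothetical stand-ins for the input files (one gene per line in the real files).
# The gene-to-file assignments below are invented for illustration only.
gene_files = {
    "File1": ["NRG3", "FOXP3", "SHH2", "ROBO1", "PPP3CA"],
    "File2": ["NRG3", "ROBO1"],
    "File3": ["ROBO1", "SHH2"],
    "File4": ["NRG3"],
}


def presence_table(gene_files):
    """Return (header, rows): one row per gene, sorted alphabetically,
    with 'X' in the column of every file that contains the gene."""
    names = list(gene_files)
    sets = {name: set(genes) for name, genes in gene_files.items()}
    # Union of all genes, each appearing once, in alphabetical order.
    all_genes = sorted(set().union(*sets.values()))
    header = ["Gene"] + names
    rows = [
        [gene] + ["X" if gene in sets[name] else "" for name in names]
        for gene in all_genes
    ]
    return header, rows


def format_table(header, rows):
    """Render the table as aligned plain text."""
    lines = [header] + rows
    widths = [max(len(line[i]) for line in lines) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in lines
    )


header, rows = presence_table(gene_files)
print(format_table(header, rows))
```

With real input, each list could be read from its file instead, for example with `pathlib.Path(path).read_text().split()`, assuming the files contain nothing but gene names.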
It would be even more useful if, instead of an 'x' marking that a gene appears in a file, the cells were shaded with a heatmap-style color gradient: when a gene is found in only one file the relevant cell is shaded light yellow, whereas if it appears in all files the cells are shaded dark red. Are there R packages that do this? This is not essential; I just think it would be cool and would improve the clarity of the visualisation.
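The shading idea boils down to counting how many files contain each gene and mapping that count onto a gradient. A minimal sketch of that mapping, again in Python and under assumptions: the `gene_files` data is hypothetical, the `shade` helper is illustrative, and the two endpoint colors are arbitrary picks for "light yellow" and "dark red".

```python
from collections import Counter

# Hypothetical stand-ins for the input files; assignments are invented.
gene_files = {
    "File1": ["NRG3", "FOXP3", "SHH2", "ROBO1", "PPP3CA"],
    "File2": ["NRG3", "ROBO1"],
    "File3": ["ROBO1", "SHH2"],
    "File4": ["NRG3"],
}


def shade(count, n_files, light=(255, 255, 204), dark=(139, 0, 0)):
    """Interpolate linearly from light yellow (gene in 1 file)
    to dark red (gene in all files) and return a hex color."""
    t = 0.0 if n_files <= 1 else (count - 1) / (n_files - 1)
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Count the number of files each gene appears in (duplicates within a file ignored).
counts = Counter(gene for genes in gene_files.values() for gene in set(genes))

# One color per gene; every 'X' cell in that gene's row would get this shade.
colors = {gene: shade(n, len(gene_files)) for gene, n in sorted(counts.items())}

for gene, color in colors.items():
    print(gene, counts[gene], color)
```

The resulting hex codes could then be passed to whatever plotting or table-styling tool draws the final figure.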
I would appreciate any advice on how to go about doing this, especially if there are existing R packages I am unaware of that already do this. Let me know how I can explain the problem more clearly.
Thank you for your help