Forum: What to put in my GitHub
g_virdzek, 29 days ago

I'm a PhD student working on a bioinformatics project. I would like to create a GitHub account and add some scripts and analyses to demonstrate my skills to potential future employers. However, I am not allowed to release my project's scripts or data, so I am not sure what data to use. Could I simply analyze publicly available data sets published in papers? I would clearly state that the data are not mine and credit the creators. I don't mean large resources like TCGA, but data from scientific papers published by other groups. Thanks in advance for any input.


Since this is an open-ended question, I am changing it to Forum.

I am not allowed to release my scripts or data

Unless you are doing something super secret, releasing general code (something you wrote de novo) should still be possible, right? You can obfuscate paths, file names, etc. You don't need to release everything you ever wrote, but there must be something that demonstrates your capabilities.
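
One low-effort way to keep private locations out of published code is to never hard-code them. A minimal sketch, assuming a hypothetical environment variable `PROJECT_DATA_DIR` and a hypothetical helper `data_path()` (neither comes from this thread): scripts ask the helper for paths, and each machine sets the variable locally.

```r
# Hypothetical helper: build paths from a base directory read from the
# PROJECT_DATA_DIR environment variable, so no private path appears in
# the committed code. Falls back to "data" when the variable is unset.
data_path <- function(...) {
  base_dir <- Sys.getenv("PROJECT_DATA_DIR", unset = "data")
  file.path(base_dir, ...)
}

# Resolves to "<PROJECT_DATA_DIR>/counts/sample1.tsv"
counts_file <- data_path("counts", "sample1.tsv")
```

The variable can be set in a private `~/.Renviron` file, which R reads at startup, so the real location never enters the repository.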


Depending on contractual obligations (I am writing as an industry employee), sharing code can breach confidentiality agreements. On my public GitHub profile, I must take great care that my toy projects share no code with the projects I work on. While working at a software company, releasing code could lead to legal action from my employer.


OP has said that they are a PhD student, but if they are working on an industry-sponsored project, then the only way to show programming proficiency would be to post code they wrote for unrelated personal projects. Some companies may even restrict what you can work on in your personal time, so posting code on GitHub may not be an option.


Sure, I understand OP is in a very different situation. My intention was rather to point out that such unusual restrictions for a PhD should be stated in writing; otherwise, I suspect they could well be legally void. That is, if you care to fight with your PhD supervisor over such terms.

Granted, that intention wasn't very obvious in my hasty two-liner above.

29 days ago

Yes, I think analyzing publicly available data is perfectly fine. Trying to reproduce the published findings, or to find something the authors may have missed, would both be great. Providing context in a Jupyter or R Markdown (Rmd) notebook, explaining your rationale and findings as you go, would be a good idea. Make sure your code is documented and clear.

Alternatively, if you can generalize your code into simple R or Python (or whatever language) packages that can be run on any data, that may be worth considering, as it showcases a variety of skills.
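
As a rough illustration of code that can be run on whatever data, a function takes its input as an argument, validates it, and returns a result instead of reading a fixed file. The hypothetical `cpm()` below (counts-per-million scaling of a genes-by-samples count matrix) is only an example of that shape, not something proposed in this thread.

```r
# Hypothetical reusable, data-agnostic function:
# counts-per-million (CPM) scaling of a genes x samples count matrix.
cpm <- function(counts) {
  if (!is.matrix(counts) || !is.numeric(counts)) {
    stop("`counts` must be a numeric matrix (genes x samples)")
  }
  lib_sizes <- colSums(counts)
  if (any(lib_sizes == 0)) {
    stop("every sample needs at least one count")
  }
  # sweep() divides each column by that sample's library size
  sweep(counts, 2, lib_sizes, "/") * 1e6
}

# Toy input: each column of the result sums to 1e6
toy <- matrix(c(10, 30, 5, 15), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
cpm(toy)
```

Because the function only depends on its argument, the same code works on any published count matrix, and it is straightforward to move into a package later.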


Seconding that. I strongly encourage everyone to wrap commonly used code and functions into simple packages and put them on GitHub. This not only gives you version control but also records the required dependencies, and it makes it trivial to install your code on any machine you like rather than copying scripts from A to B.
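
Base R can generate the skeleton of such a package. A minimal sketch, assuming a hypothetical package name `mytools` and a throwaway helper `scale_rows()`: `utils::package.skeleton()` writes a `DESCRIPTION` file, a `NAMESPACE`, and `R/` and `man/` folders. The generated title, description and help pages are placeholders that still have to be filled in by hand, and required packages are declared in an `Imports:` field you add to `DESCRIPTION`.

```r
# Hypothetical helper to be packaged (any reusable function works)
scale_rows <- function(m) t(scale(t(m)))

# Create <tempdir>/mytools/ with DESCRIPTION, NAMESPACE, R/ and man/ stubs.
# Writing to tempdir() keeps this demo out of the working directory.
pkg_dir <- file.path(tempdir(), "mytools")
utils::package.skeleton(name = "mytools", list = "scale_rows",
                        path = tempdir(), force = TRUE)

# DESCRIPTION is a plain-text (DCF) file that can be read back with read.dcf()
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
desc[, c("Package", "Version")]
```

Once such a folder is pushed to GitHub, it can typically be installed on another machine with `remotes::install_github("<user>/mytools")` from the remotes package.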


I prefer source('~/MyRScripts/myscript.R') to creating a package and loading it every time. Of course, one should only store reusable code and not data objects in these scripts.
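
A sketch of the `source()` approach from this comment, with one variation that is an assumption rather than the poster's method: sourcing the script into its own environment keeps its helpers out of the global workspace. The script is written to a temporary file only so the example runs anywhere; in practice it would be something like `~/MyRScripts/myscript.R`, and `log2p()` is a hypothetical helper.

```r
# Write a tiny reusable script (stands in for ~/MyRScripts/myscript.R)
script <- tempfile(fileext = ".R")
writeLines(c(
  "# Reusable helpers only: no data objects stored here",
  "log2p <- function(x) log2(x + 1)"
), script)

# Source the script into a dedicated environment instead of the global one
helpers <- new.env()
source(script, local = helpers)

helpers$log2p(c(0, 1, 3))  # returns c(0, 1, 2)
```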