Question

Forum: MS Bioinformatics Project

6.3 years ago

Moneeb Bajwa ▴ 10

Hi, I know others have asked similar questions, but they were not quite what I was looking for. I am a Master's Bioinformatics student with a BA in Biology. I need a project idea that would last 3 months (my Professor did not help). I was thinking of something along the lines of getting sequencing data, trimming it, aligning it, and doing a differential expression analysis, but I don't know where to get the data for this (perhaps NCBI?). If you have other ideas for projects as well, that would be good too. I was also thinking about a project having to do with GWAS. My only programming experience is what I learned in my Master's program.

Thanks for all your help

assembly sequencing alignment R SNP • 2.5k views

ADD COMMENT • link updated 14 months ago by Ram 44k • written 6.3 years ago by Moneeb Bajwa ▴ 10

Do you have any experience with Bash scripting? It might be helpful to specify what programming you learned in your Master's program. It would also help if we had some more info about the goal of your project. Are you supposed to learn something about a specific tool, or contribute something unique to the community (etc.)?

ADD REPLY • link 6.3 years ago by Andrew_MacGregor ▴ 30

Yes, I have experience with Bash scripting, Perl, Python, and R, and wish to use these in the project. I have statistical analysis experience with R as well.

ADD REPLY • link 6.3 years ago by Moneeb Bajwa ▴ 10

I would just do something 'simple' like:

1. Obtain RNA-seq FASTQ cancer cell-line data from the SRA, such as MCF7 breast cancer cell lines with and without treatment
2. Trim the reads; see the thread: illumina quality trimming - FASTQC
3. Then determine read count abundances over your samples with Kallisto
4. Then read your Kallisto counts into DESeq2 by following Michael Love's great tutorial
5. Then do a simple differential expression analysis with DESeq2

Do that and then Bob will be your uncle

Step 1 is done in a web browser; steps 2 and 3 in the shell (Bash); steps 4 and 5 in the R programming language.
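
As a rough illustration of the Kallisto step, here is a minimal Python sketch that only builds and prints one `kallisto quant` command per sample (a dry run; nothing is executed). It assumes paired-end reads that have already been trimmed and an index already built with `kallisto index`. The sample names, file names, index name, output folder, and bootstrap count are all hypothetical placeholders, not values from this thread.

```python
from pathlib import Path
import shlex


def kallisto_commands(samples, index="transcripts.idx", out_root="quant", bootstraps=100):
    """Return one `kallisto quant` command (as an argument list) per paired-end sample."""
    commands = []
    for name, (reads_1, reads_2) in samples.items():
        out_dir = str(Path(out_root) / name)  # one output folder per sample
        commands.append([
            "kallisto", "quant",
            "-i", index,            # index built beforehand with `kallisto index`
            "-o", out_dir,          # where the abundance files are written
            "-b", str(bootstraps),  # number of bootstrap samples
            reads_1, reads_2,       # trimmed paired-end FASTQ files
        ])
    return commands


# Hypothetical sample sheet: sample name -> (trimmed R1, trimmed R2)
samples = {
    "MCF7_control": ("MCF7_control_1.trimmed.fastq.gz", "MCF7_control_2.trimmed.fastq.gz"),
    "MCF7_treated": ("MCF7_treated_1.trimmed.fastq.gz", "MCF7_treated_2.trimmed.fastq.gz"),
}

# Dry run: print the commands so they can be checked before running them in the shell
for cmd in kallisto_commands(samples):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths before committing server time; each list could later be passed to `subprocess.run(cmd, check=True)` once the paths are right.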

Of course, please come back here for help if needed.

Kevin

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

Thank you!! I was thinking of doing the steps you mentioned above multiple times for various datasets to test some sort of hypothesis (because I need this project to last the 3 months mentioned in the question). Is there some sort of hypothesis I could test by comparing multiple datasets in this fashion?

ADD REPLY • link 6.3 years ago by Moneeb Bajwa ▴ 10

Let me think. Is this your major project for your MSc? How would you rate its importance? From my perspective, a Master's student should not necessarily have to add anything new to the literature; thus, merely doing a re-analysis should be sufficient.

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

Yeah it's not an official paper or anything like that. But I just need to figure out how to extend it for an entire semester. This is really just something that I'm doing because I could not find an internship/co-op, so my Professor let me choose this option.

ADD REPLY • link 6.3 years ago by Moneeb Bajwa ▴ 10

Well, you could do something like download multiple cancer datasets and aim to come up with a 'pan-cancer' panel of markers (including non-coding RNAs). After you do your standard differential expression analyses to identify differentially expressed genes (DEGs), you could then do something 'cool' like refining the panel signature using lasso regression. I put some code here, which may help: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
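
To make the first stage of this idea concrete (before any lasso refinement, which is not shown here), the following is a minimal Python sketch of one possible way to build a candidate pan-cancer panel: keep only genes that are significant in every dataset and change in the same direction. The function name, thresholds, dataset names, gene names, and numbers are all made up for illustration; in practice the tables would come from the DESeq2 results of each dataset.

```python
def pan_cancer_panel(deg_tables, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes significant in every dataset with the same direction of change.

    deg_tables: dict mapping dataset name -> {gene: (log2_fold_change, adjusted_p)}
    Returns a dict mapping gene -> +1 (up in all datasets) or -1 (down in all).
    """
    panel = None
    for table in deg_tables.values():
        # Direction (+1 up, -1 down) of each gene passing both thresholds
        hits = {
            gene: (1 if lfc > 0 else -1)
            for gene, (lfc, padj) in table.items()
            if padj < padj_cutoff and abs(lfc) >= lfc_cutoff
        }
        if panel is None:
            panel = hits
        else:
            # Keep a gene only if it is also a hit here, in the same direction
            panel = {g: d for g, d in panel.items() if hits.get(g) == d}
    return panel or {}


# Toy, made-up results for three hypothetical datasets
deg_tables = {
    "breast": {"GENE_A": (2.1, 0.001), "GENE_B": (-1.8, 0.01), "GENE_C": (0.3, 0.5)},
    "lung": {"GENE_A": (1.5, 0.02), "GENE_B": (1.2, 0.03), "GENE_C": (2.0, 0.001)},
    "colon": {"GENE_A": (3.0, 0.0001), "GENE_B": (-2.2, 0.004)},
}

print(pan_cancer_panel(deg_tables))  # {'GENE_A': 1}
```

Requiring a consistent direction is a deliberately strict choice; a looser rule (for example, significant in most datasets) would give a larger starting panel to refine.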

Here are some other posts of mine, which may help to give you further ideas:

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

Thanks!! But considering the read files are so large (500MB or more), wouldn't they be too big for expression analysis in R on my laptop? I have access to my school server for trimming and abundance estimation, but could the R portion (the differential expression analysis) be done entirely on the server, or would I need to do some of it in RStudio?

ADD REPLY • link 6.3 years ago by Moneeb Bajwa ▴ 10

500MB is peanuts these days... we now deal in at least gigabytes. The large projects deal with petabytes (ICGC, TCGA, 1000 Genomes).

If you have a relatively new laptop (?), 500MB should not be a problem. R should also be installed in your compute cluster environment, but that question needs to be directed to your local IT person, or directly to central IT. Installing a version of R for global use is not difficult - just depends on who the System Admin is. I've done it before whilst managing large clusters.

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

Thanks again! I am actually working on my laptop from home, and I connect to the school server through Bash... so would I be able to do everything for the R part in vim while on the server? Maybe I could save the graphs that I need as PDFs? Do I really need to use the R software interface?

ADD REPLY • link 6.3 years ago by Moneeb Bajwa ▴ 10

Are you worried about the plotting window not showing when you use R on the server? To get that part working, you need to set up X Window (X11) forwarding.

Are you using Windows? If 'yes', then install a program called Xming and leave it running in the background. Then, when you log in to the server using [hopefully] PuTTY, go to: Connection > SSH > X11 and check the 'Enable X11 forwarding' checkbox, and also put the following into the text box: localhost:0

That will then transmit plotting windows from the server to your laptop.
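
One quick way to check whether forwarding is actually active is to look at the `DISPLAY` environment variable on the server, which SSH sets for the session when X11 forwarding has been granted. Below is a minimal Python sketch of that check, meant to be run on the server after logging in; the function name is hypothetical, and the same check could just as well be done from the shell or from R.

```python
import os


def x11_display(env=None):
    """Return the X11 display string if one is set, else None."""
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "").strip()
    return display or None


display = x11_display()
if display:
    print(f"X11 display available at {display}; plot windows should be forwarded.")
else:
    print("No DISPLAY set; write plots to files (e.g. PDF) instead.")
```

If no display is available, saving plots to PDF files on the server and copying them to the laptop remains a workable fallback.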

ADD REPLY • link 6.3 years ago by Kevin Blighe 88k

score 5 · Answer 1 · 2018-04-02

Sorry, but if your Professor cannot help you find a project, I recommend you find another lab in which to pursue your Master's thesis. As a Master's student, you should expect to be given full guidance, as it is an essential step before heading towards a PhD program. Asking for project ideas in a forum is not gonna help. Nevertheless, if you decide to stick with the current lab and shape your own project, then the GEO database (https://www.ncbi.nlm.nih.gov/geo/) or ENCODE (https://www.encodeproject.org/) is what you are looking for. Both databases have a huge amount of sequencing data.
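
For searching GEO from a script rather than the website, one option is NCBI's E-utilities `esearch` endpoint with `db=gds` (the GEO DataSets database). The Python sketch below only builds the search URL and does not contact NCBI; the function name and the search term are hypothetical examples, and the URL can be pasted into a browser to see the matching record IDs.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, max_results=20):
    """Build an E-utilities URL that searches GEO DataSets (db=gds) for `term`."""
    params = {
        "db": "gds",            # GEO DataSets
        "term": term,           # free-text query
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Hypothetical query; adjust the term to the cell line or condition of interest
print(geo_search_url("MCF7 RNA-seq"))
```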

Good luck!