Question

Gene Expression for mRNA sequencing tutorial

irritable_phd_syndrome · 8.7 years ago

Hello,

I am a physicist by training, and I am very comfortable programming and working in a UNIX environment. I recently took a programming / algorithm development job related to biology, genetics and bioinformatics, fields that are very new to me. I have a set of mRNA sequencing data (in FASTQ format) produced by a sequencer, and I am developing a pipeline to look at relative gene expression.

To test my pipeline, I am trying to analyze an old dataset. The problem is that the input data is several GB in size and takes 7+ hours to run, which makes it hard to learn how to use TopHat and work with the data.
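
A common way to get a toy-sized dataset is to subsample the FASTQ files before aligning. The sketch below is one way to do that in Python; `fastq_records` and `subsample_fastq` are hypothetical helpers written for illustration, and they assume plain four-line FASTQ records (no wrapped sequence lines).

```python
import random
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines: header, sequence, '+', qualities."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # end of file
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting with {record[0]!r}")
        yield record


def subsample_fastq(src: TextIO, dst: TextIO, fraction: float, seed: int = 42) -> int:
    """Write roughly `fraction` of the records in src to dst; return how many were kept.

    One random draw is consumed per record, so running this on both files of a
    paired-end run with the same seed keeps (or drops) the same record positions.
    """
    rng = random.Random(seed)
    kept = 0
    for record in fastq_records(src):
        if rng.random() < fraction:
            dst.writelines(record)
            kept += 1
    return kept
```

For gzipped input, the files can be opened with `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. For paired-end data, run it once on each mate file with the same `fraction` and `seed`; this only keeps mates together if both files list the reads in the same order, which is the usual layout.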

After aligning the reads with TopHat v1.4.1 against the UCSC mm10 genome (downloaded from https://support.illumina.com/sequencing/sequencing_software/igenome.html) and comparing the result with the previously analyzed data, roughly 15% of the lines in my output differ from the original file. I did the comparison by using samtools v0.1.18 to convert the BAM files output by TopHat to text (SAM) and then running UNIX diff on them. Unfortunately, the person who did the original analysis is incommunicado.
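
A plain line diff likely overstates the differences, since record order, optional tags and the choice among equally good multi-mapping hits can all vary between runs. A minimal sketch, assuming both BAM files were dumped to SAM text with `samtools view` and that secondary alignments carry flag 0x100, compares reads by name and mate instead; `primary_positions` and `compare_alignments` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple


def primary_positions(sam_lines: Iterable[str]) -> Dict[Tuple[str, int], Tuple[str, int]]:
    """Map (read name, mate) -> (reference, 1-based position) of the primary alignment."""
    positions = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        mate = flag & 0xC0  # 0x40 = first mate, 0x80 = second mate, 0 = single-end
        positions[(qname, mate)] = ("*", 0) if flag & 0x4 else (rname, pos)
    return positions


def compare_alignments(old_lines: Iterable[str], new_lines: Iterable[str]) -> Dict[str, int]:
    """Count reads placed differently, or present in only one of the two files."""
    old, new = primary_positions(old_lines), primary_positions(new_lines)
    shared = old.keys() & new.keys()
    return {
        "shared": len(shared),
        "changed": sum(old[k] != new[k] for k in shared),
        "only_old": len(old.keys() - new.keys()),
        "only_new": len(new.keys() - old.keys()),
    }
```

Both arguments can be open file handles, e.g. `compare_alignments(open("original.sam"), open("mine.sam"))` with your own file names. Since TopHat's output normally contains only mapped reads, `only_old` and `only_new` roughly count reads that aligned in one run but not the other.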

QUESTION: Are there any good tutorials on using TopHat for sequence alignment and on analyzing sequencing data for relative gene expression? Toy problems would be great.

tophat RNA-Seq sequencing samtools

Ram · Answer 1 · 2015-08-18

For gene expression analysis, I'd recommend you look at htseq-count, then DESeq2; both are very well documented. (They work after alignment, so you start from your BAM files.)
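
To see what these two steps do on a toy example, here is a simplified sketch in Python. `count_reads` mimics the idea behind htseq-count's default `union` mode on whole-gene intervals, but it ignores exons, spliced alignments and strand (which htseq-count checks by default), so it is not a replacement for the tool. `size_factors` follows the median-of-ratios normalization that DESeq2 uses by default. Both function names are hypothetical.

```python
import math
from statistics import median


def count_reads(genes, reads):
    """Count reads per gene in a simplified 'union'-style scheme.

    genes: list of (gene_id, chrom, start, end), half-open [start, end) intervals.
    reads: list of (chrom, start, end) aligned read spans, also half-open.
    A read overlapping exactly one gene is counted for that gene; reads overlapping
    several genes go to '__ambiguous', reads overlapping none to '__no_feature'.
    """
    counts = {gene_id: 0 for gene_id, *_ in genes}
    counts["__ambiguous"] = 0
    counts["__no_feature"] = 0
    for chrom, rstart, rend in reads:
        hits = {gid for gid, gchrom, gstart, gend in genes
                if gchrom == chrom and rstart < gend and gstart < rend}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


def size_factors(count_matrix):
    """Median-of-ratios size factors, one per sample.

    count_matrix: rows are genes, columns are samples (raw counts).
    Genes with a zero in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(count_matrix[0])
    log_rows = [[math.log(c) for c in row]
                for row in count_matrix if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("every gene has a zero count in some sample")
    factors = []
    for j in range(n_samples):
        # log(count / geometric mean of the gene across samples)
        ratios = [row[j] - sum(row) / n_samples for row in log_rows]
        factors.append(math.exp(median(ratios)))
    return factors
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale, which is what "relative expression" comparisons rely on; the real tools add much more (exon-level overlap rules, dispersion estimation and testing in DESeq2).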

As for alignment with TopHat, I suggest you read the manual: it's full of information on the options, with example commands. The original analysis may have involved read trimming, or a particular mate inner distance and standard deviation worked out for the library; lots of things can affect the alignment, so if you want an exact like-for-like replication it's best to speak to the author.
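
If the library's mate inner distance is not documented, one way to estimate it is from the TLEN column of a pilot paired-end alignment. The sketch below assumes both mates have the same length and that only properly paired records are passed in; `mate_inner_distance` is a hypothetical helper. Because TLEN on spliced alignments can span introns, it drops fragments longer than `max_fragment`, an arbitrary cut-off you may need to adjust.

```python
from statistics import mean, stdev


def mate_inner_distance(template_lengths, read_length, max_fragment=1000):
    """Estimate (mean, standard deviation) of the mate inner distance.

    template_lengths: TLEN values (SAM column 9) of properly paired alignments.
    read_length: length of each mate, assumed equal for both reads.
    Inner distance = fragment length - 2 * read length (negative if mates overlap).
    Only positive TLEN values are used, so each pair is counted once
    (the two mates of a pair carry +TLEN and -TLEN).
    """
    inner = [t - 2 * read_length for t in template_lengths if 0 < t <= max_fragment]
    if len(inner) < 2:
        raise ValueError("need at least two usable read pairs")
    return round(mean(inner)), round(stdev(inner))
```

The two numbers correspond to TopHat's `-r`/`--mate-inner-dist` and `--mate-std-dev` options.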

If only 15% of the lines differ, I wouldn't worry too much; continue your analysis from your own alignment. There is also this tutorial from the EBI.

Ram · Answer 2 · 2015-08-18

GouthamAtla

Informatics of RNA-Seq.

Ram · Answer 3 · 2015-08-19

tharveshliyakat

I feel this would be helpful.