Question: Gene Expression for mRNA sequencing tutorial
4.5 years ago, irritable_phd_syndrom (70) wrote:


I am a physicist by training, and I am very comfortable programming and working in a UNIX environment. I recently took a programming / algorithm development job related to biology / genetics / bioinformatics, fields that I am very new to. I have a set of mRNA data (in *.fastq format) produced by a sequencer. I am developing a pipeline to look at relative gene expression.

To test my pipeline, I am trying to re-analyze an old data set. The problem is that the input data is several GB in size and takes 7+ hours to run. This makes it challenging to learn how to use tophat and work with the data.

After aligning the reads using tophat v1.4.1 against UCSC mm10 (downloaded from, and comparing the result to the already analyzed data, it appears that roughly 15% of the lines in my output differ from the originally analyzed data file. I did this comparison by using samtools v0.1.18 to write the BAM files (output from tophat) to text files and then comparing them with UNIX diff. Unfortunately, the person who did the original analysis is incommunicado.

QUESTION: Are there any good tutorials on using tophat for sequence alignment, as well as on analyzing sequencing data for relative gene expression? Toy problems would be great.

4.5 years ago, andrew.j.skelton73 (5.9k) wrote:

For gene expression analysis, I'd recommend you look at htseq-count (part of HTSeq), then DESeq2. Both are very well documented. (That's post-alignment, so starting from BAM files.)
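To get a quick feel for what comes out of the counting step before moving on to DESeq2, something like the sketch below can help. It assumes the usual htseq-count output of one `feature<TAB>count` pair per line, and it skips the summary rows (newer HTSeq versions prefix them with `__`, older ones do not). Note that it computes plain counts per million, which is a simple sanity check and not the normalization DESeq2 applies; the function names are made up for illustration.

```python
# Summary rows written by htseq-count (older versions, no "__" prefix).
SPECIAL_ROWS = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def read_htseq_counts(handle):
    """Parse htseq-count output lines into a {feature: count} dict."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, value = line.split("\t")
        # Skip the summary rows; they are not genes.
        if feature.startswith("__") or feature in SPECIAL_ROWS:
            continue
        counts[feature] = int(value)
    return counts


def counts_per_million(counts):
    """Scale raw counts so that they sum to one million per sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any feature")
    return {gene: n * 1e6 / total for gene, n in counts.items()}
```

Calling `counts_per_million(read_htseq_counts(open("sample.counts")))` (file name hypothetical) gives per-gene values that are roughly comparable between samples of different sequencing depth. For real differential expression, feed the raw counts to DESeq2 instead.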

As for alignment using tophat, I suggest you have a read of the manual, as it's full of information on the switches and example commands. The original analysis may have included read trimming, or working out the mate inner distance and its standard deviation. Lots of things can affect the alignment, so it's probably best to speak to the author if you want an exact like-for-like replication.
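Since a full run takes 7+ hours, one way to experiment with the switches is to align a small subset of the reads first. The sketch below copies the first N records of a FASTQ file, assuming the common layout of exactly four lines per record (no wrapped sequence lines). The function name is hypothetical, and the first reads in a file may not be representative of the whole run, so treat the result as a toy input only.

```python
import itertools


def head_fastq(src, dst, n_reads):
    """Copy the first n_reads FASTQ records from src to dst.

    src and dst are open text file objects. Assumes four lines per
    record. Returns the number of complete records written.
    """
    lines_written = 0
    for line in itertools.islice(src, 4 * n_reads):
        dst.write(line)
        lines_written += 1
    return lines_written // 4


# Example usage (file names are placeholders):
#
# with open("reads_1.fastq") as src, open("toy_1.fastq", "w") as dst:
#     head_fastq(src, dst, 100000)
```

For paired-end data, run it on both files with the same `n_reads` so the mates stay in the same order.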

If 15% of the lines are different, then I wouldn't worry too much; continue with your analysis from there. There is also this tutorial from the EBI.
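A plain `diff` of the `samtools view` text can overstate the difference, because it also reacts to header lines, record order, and optional tags. A rough alternative, sketched below, is to compare only the core alignment fields of each record. Which fields to compare (here read name, flag, reference, position, and CIGAR) is my assumption about what matters for a like-for-like check, and the function names are made up.

```python
def alignment_keys(handle):
    """Collect (QNAME, FLAG, RNAME, POS, CIGAR) for each SAM record."""
    keys = set()
    for line in handle:
        if line.startswith("@"):
            continue  # header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        # SAM columns: 0=QNAME 1=FLAG 2=RNAME 3=POS 4=MAPQ 5=CIGAR
        keys.add((fields[0], fields[1], fields[2], fields[3], fields[5]))
    return keys


def compare_alignments(handle_a, handle_b):
    """Count alignments shared by, or unique to, two SAM text files."""
    a = alignment_keys(handle_a)
    b = alignment_keys(handle_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
```

If most records end up in `shared`, the two runs agree on where the reads were placed, and the remaining `diff` noise comes from ordering or tags rather than from the alignments themselves.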

4.5 years ago by
geek_y10k wrote:

Informatics of RNA-Seq.

4.5 years ago by
tharveshliyakat60 wrote:

I feel this would be helpful.

