Gene Expression for mRNA sequencing tutorial
8.7 years ago

Hello,

I am a physicist by training, and I am very comfortable programming and working in a UNIX environment. I recently took a programming / algorithm development job related to biology / genetics / bioinformatics, fields that are very new to me. I have a set of mRNA data (in fastq format) that came from a sequencer. I am developing a pipeline to look at relative gene expression.

To test my pipeline, I am trying to reanalyze an old dataset. The problem is that the input data is several GB in size and a run takes 7+ hours. This makes it challenging to learn how to use tophat and work with the data.

After aligning the reads with tophat v1.4.1 against UCSC mm10 (downloaded from https://support.illumina.com/sequencing/sequencing_software/igenome.html) and comparing the result to the previously analyzed data, it appears that roughly 15% of the lines in my output differ from the originally analyzed data file. I made this comparison by using samtools v0.1.18 to write the BAM files (output from tophat) to text files and then comparing them with UNIX diff. Unfortunately, the person who did the original analysis is incommunicado.

QUESTION: Are there any good tutorials on using tophat for sequence alignment and on analyzing sequencing data for relative gene expression? Toy problems would be great.

tophat RNA-Seq sequencing samtools • 1.9k views
8.7 years ago

For gene expression analysis, I'd recommend you look at htseq-count (part of HTSeq), then DESeq2. Both of these are very well documented. (That's post-alignment, so starting from BAM files.)
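To get a feel for what comes out of the counting step, here is a minimal Python sketch that reads a per-gene count table and converts it to counts per million (CPM). It assumes the two-column, tab-separated `gene_id<TAB>count` layout that htseq-count writes, and that summary rows are prefixed with `__` (as in recent HTSeq versions; older ones may differ). The function names and the toy data are made up for illustration. Note that DESeq2 expects the raw integer counts, so this is only for a quick look, not a replacement for it.

```python
import io


def read_htseq_counts(handle):
    """Parse 'gene_id<TAB>count' lines into a dict of raw counts."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        # Assumption: summary rows such as "__no_feature" start with "__".
        # Skip them so they are not treated as genes.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def counts_per_million(counts):
    """Scale raw counts so that they sum to one million per sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any gene")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Toy input standing in for a real count file (hypothetical gene names).
example = io.StringIO("GeneA\t150\nGeneB\t50\n__no_feature\t800\n")
raw = read_htseq_counts(example)
cpm = counts_per_million(raw)
print(cpm)
```

With a real file you would pass `open("sample.counts.txt")` (a hypothetical path) instead of the `StringIO` object.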

As for alignment using tophat, I suggest you have a read of the manual, as it's full of information on the switches and example commands. The original analysis may have involved read trimming, or working out the mate inner distance / standard deviation; lots of things can affect the alignment, and it's probably best to speak to the original author if you want an exact like-for-like replication.
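Since a full run takes hours, one way to experiment with the switches is to align a small subset of the reads first. The sketch below copies the first N records of a FASTQ file into a new file. It assumes the usual four-lines-per-record FASTQ layout (no wrapped sequence lines); the function names and file names are illustrative only. For paired-end data you would need to take the same number of records from both mate files so the pairs stay in sync.

```python
import gzip
import itertools
import os
import tempfile


def open_maybe_gzip(path, mode="rt"):
    """Open plain or gzip-compressed text files based on the extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def take_first_reads(in_path, out_path, n_reads):
    """Copy the first n_reads FASTQ records (4 lines each) to out_path.

    Returns the number of complete records written.
    """
    with open_maybe_gzip(in_path) as src, open_maybe_gzip(out_path, "wt") as dst:
        lines = list(itertools.islice(src, 4 * n_reads))
        dst.writelines(lines)
    return len(lines) // 4


# Small self-contained demo with a made-up five-read FASTQ file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "reads.fastq")
    dst_path = os.path.join(tmp, "subset.fastq")
    with open(src_path, "w") as fh:
        for i in range(5):
            fh.write(f"@read{i}\nACGT\n+\nIIII\n")
    n_written = take_first_reads(src_path, dst_path, 2)
    with open(dst_path) as fh:
        subset_text = fh.read()

print(n_written)
```

The subset file can then be given to tophat in place of the full input, which should make each trial run much shorter.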

If 15% of the lines are different, then I wouldn't worry too much; continue with your analysis from there. There is also this tutorial from the EBI.
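A plain diff of the text dumps also counts lines that differ only in their optional tags or in their order, so it may overstate how different the two alignments really are. As a rough sketch (an assumption about what may be going on, not a diagnosis), you could compare only the core fields of each alignment: read name, flag, reference name, position and CIGAR string. The function names and the toy records below are invented for illustration.

```python
def core_alignments(sam_lines):
    """Collect (QNAME, FLAG, RNAME, POS, CIGAR) tuples from SAM text lines."""
    records = set()
    for line in sam_lines:
        # Header lines start with "@"; skip them and any blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        records.add((fields[0], fields[1], fields[2], fields[3], fields[5]))
    return records


def compare_alignments(a_lines, b_lines):
    """Count alignments shared by both inputs and unique to each one."""
    a = core_alignments(a_lines)
    b = core_alignments(b_lines)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Toy data: r1 differs only by an extra optional tag, r2 by its position.
old = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
    "r2\t16\tchr2\t200\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n",
]
new = [
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:1\n",
    "r2\t16\tchr2\t250\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n",
]
summary = compare_alignments(old, new)
print(summary)
```

On real data you would pass open file handles for the two text dumps instead of the lists. Holding everything in sets uses a lot of memory for large files, so this is best tried on a subset first.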

8.7 years ago

I feel this would be helpful.
