Question: optimizing the -r/--mate-inner-dist in Tophat2
15 months ago, candida.vaz (Singapore) wrote:

Hello everyone,

I have Illumina paired end (2*150bp) reads for 30 samples that I mapped to the reference genome using Tophat2. The sequencing providers gave me the following information:

  • The fragments used for sequencing range from 200 to 400bp
  • The average size is around 303 to 330bp. This includes the adapters.
  • The paired end reads (2*150bp) include adapter sequences

I did the first round of mapping (after trimming the adapters with Trimmomatic) using --mate-inner-dist 100 --mate-std-dev 40, and got a good concordant pair alignment rate, ranging from 75 to 80%.
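For reference, the arithmetic behind an expected inner distance can be sketched as below. This is only a rough illustration: the function name and the `adapter_total` parameter are hypothetical, the adapter length is not known from the provider's information, and the read length is taken as the untrimmed 150 bp (trimmed reads may be shorter).

```python
def expected_inner_distance(fragment_len, read_len, adapter_total=0):
    """Inner distance = insert size minus the length of both mates.

    fragment_len  : library fragment length in bp
    read_len      : length of each mate in bp
    adapter_total : bp of adapter counted inside fragment_len
                    (hypothetical parameter; 0 if the size excludes adapters)
    """
    insert_len = fragment_len - adapter_total
    return insert_len - 2 * read_len


READ_LEN = 150  # assumption: untrimmed 2*150bp reads

# Fragment sizes quoted by the sequencing provider (range and average)
for frag in (200, 303, 330, 400):
    print(frag, expected_inner_distance(frag, READ_LEN))
```

By this definition, a negative value means the insert is shorter than the two reads combined, so the mates would overlap.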

I now want to optimize the Tophat2 results, so I ran RSeQC on the accepted_hits.bam files and got these results (for Sample1):

inner_distance.py -i Sample1-accepted_hits.bam -o Sample1-output -r genes.bed

Get exon regions from genes.bed
Load BAM file ... Total read pairs used 1000000

Name            Mean               Median  sd
Sample1-output  -98.6345633476147  -113    51.2257769893221

null device

I get very similar values for all the samples.

My questions:

  1. Is it fine to use the "accepted_hits.bam" from my first tophat2 run as input for RSeQC?
  2. The mean values are negative. Does this mean that the reads are overlapping? And should I use these mean and std dev values for my next tophat2 run (--mate-inner-dist -99 --mate-std-dev 51)?
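The step from the RSeQC summary to the two tophat2 options in question 2 can be sketched as follows. This is a minimal illustration, assuming the options take whole numbers; the helper name `tophat2_mate_flags` is hypothetical and not part of either tool.

```python
def tophat2_mate_flags(mean_inner, sd_inner):
    """Round RSeQC's mean and sd to integers and format them as
    the two tophat2 options quoted in the question."""
    return [
        "--mate-inner-dist", str(round(mean_inner)),
        "--mate-std-dev", str(round(sd_inner)),
    ]


# Mean and sd reported by inner_distance.py for Sample1
flags = tophat2_mate_flags(-98.6345633476147, 51.2257769893221)
print(" ".join(flags))  # --mate-inner-dist -99 --mate-std-dev 51
```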

I tried running tophat2 on a few samples using the parameters generated by RSeQC and did not find any significant difference in the "overall read mapping rate" or the "concordant pair alignment rate", though there was an increase in the number of aligned pairs.

Initially I got 25721951 aligned pairs; after running Tophat2 with the new parameters, this increased to 25722849.
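To put that increase in proportion, here is a quick calculation using only the two counts above (the variable names are just for illustration):

```python
before = 25721951  # aligned pairs with --mate-inner-dist 100 --mate-std-dev 40
after = 25722849   # aligned pairs with --mate-inner-dist -99 --mate-std-dev 51

gained = after - before
relative = gained / before * 100  # gain as a percentage of the first run

print(f"{gained} extra pairs ({relative:.4f}% of the original count)")
```

That is 898 additional pairs, roughly 0.0035% of the first run's total.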

Thanks in advance!

Best regards,

Candida

rna-seq tophat2 • 518 views

Not the answer you are looking for, but you should know that the old 'Tuxedo' pipeline of Tophat and Cufflinks is no longer the "advisable" choice for RNA-seq analysis. The software is deprecated or in low maintenance, and should be replaced by HISAT2, StringTie and Ballgown. See this paper: "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or bbmap, or pseudo-alignment using salmon.

written 15 months ago by WouterDeCoster

I have also used the STAR aligner for mapping. Thanks for the information; I will look up the publication you suggested.

written 15 months ago by candida.vaz