Question: Tophat2 and Bowtie2 : determining appropriate values for mate-inner-dist and mate-std-dev
gravatar for irritable_phd_syndrom
2.3 years ago by
irritable_phd_syndrom60 wrote:

I am trying to tune my tophat2 runs following the example of this blog. They suggest trying different values for the --mate-inner-dist and --mate-std-dev.

To get an idea as to the numbers I should use, I ran tophat using :

tophat2 -p 20 -o out/ -a 8 --library-type fr-firststrand --no-coverage-search /path/to/Bowtie2Index/Mus_musculus.GRCm38 R1.fastq.gz R2.fastq.gz

Looking at the bam alignment file, I selected all the Read1 reads (SAM FLAG bit 0x40 set) which were properly aligned (bit 0x2 set) and were not secondary alignments (bit 0x100 _not_ set). The mean of abs(TLEN) is 3400 and the standard deviation of abs(TLEN) is 369034. Double checking the file, there indeed large values for TLEN (e.g. there are values in the 100,000s and millions).

Picking these as my values for the mate-inner-dist and mate-std-dev seems like a very poor decision. Clearly there is a tail of reads with very large template lengths that are skewing the distribution (which is _not_ Gaussian). So I have a few of questions:

  1. Is the template length (TLEN) from tophat computed from the genome or transcriptome?
  2. What would the best way to select the mate-inner-dist?
  3. Looking at the run.log it appears that bowtie2 uses the default value for --maxins=500 (ie. it is not explicitly changed). Does this matter since I have values of TLEN > 500?
rna-seq alignment • 943 views
ADD COMMENTlink modified 2.2 years ago by mra818720 • written 2.3 years ago by irritable_phd_syndrom60

It may be better to switch to HISAT2 (if you want to stay in the same family) at this time.

You can estimate the insert size using BBMap as described here: Estimating Mean Inner Distance

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax64k

The "best" way would be to look at the Bioanalyzer trace, or whatever your sequencing people used, when checking library size distribution after library prep. The fragment length mean and SD can be estimated from that curve, after subtracting the adapter lengths.

ADD REPLYlink written 2.2 years ago by apa@stowers400
gravatar for mra8187
2.2 years ago by
mra818720 wrote:

you can find out insert size and STD with using BWA :

./bwa index -a bwtsw /path to reference genome

./bwa index /path to reference genome

./bwa aln -o 0 -e 0 /path to bwa indexed reference genome /path to trimmed.file1.fastq.gz > trimmed.file1.fastq.gz"_to_"reference genome.sai

./bwa aln -o 0 -e 0 /path to bwa indexed reference genome /path to trimmed.file2.fastq.gz > trimmed.file2.fastq.gz"_to_"reference genome.sai

./bwa sampe /path to bwa indexed reference genome /path to trimmed.file1.fastq.g "_to_"reference genome.sai /path to trimmed.file2.fastq.gz"_to_"reference genome.sai /path to trimmed.file1.fastq.gz /path to trimmed.file2.fastq.gz > reads_pair_to_ref.sam

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by mra818720
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 774 users visited in the last hour