Question: How to interpret RSeQC result for -r/--mate-inner-dist in TopHat2?
CandiceChuDVM (United States/College Station/Texas A&M University) wrote 3.6 years ago:

Hi all,

I have been experimenting with the options of TopHat2 on my paired-end RNA-seq data. While working out how to use the TopHat2 option -r/--mate-inner-dist to optimize my alignment, I came across this Q&A on the official website:

"The SAM output of Bowtie2 for paired reads is especially helpful as the 9th field in the SAM alignment lines should show the estimated fragment length, from which you should subtract twice the read length to get the value of the "inner distance" that can be used with the -r parameter"

So I did a Bowtie2 test run on a subset of the data and got my test.sam back. However, the numbers in column 9 are not consistent. Here is a portion of my data after filtering out unmapped reads and reads with unmapped mates:

0 -338 338 310 -310 150 -150 -159 159 -3786 3786 -379 379 -173 173 248 -248 260 -260 0 -128 128 164 -164 -246 246 0

How can I deduce the value for -r/--mate-inner-dist from such widely varying numbers? Could anybody shed some light on this output?

Thanks!!!

Tags: rseqc, rna-seq, tophat2, bowtie2
Devon Ryan (Freiburg, Germany) wrote 3.6 years ago:

Have a look at either Picard's CollectInsertSizeMetrics or RSeQC. Both can give you a mean and standard deviation that you can then plug into tophat2. I expect that RSeQC is simpler, since its output can be used directly, whereas with Picard's output you'll need to subtract 2*(average read length).
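For anyone who wants to see the arithmetic behind that subtraction, here is a minimal Python sketch that turns column-9 (TLEN) values into inner distances. It is only an illustration, not what Picard or RSeQC do internally. The function name inner_distances, the read length of 100, and the 1000 bp cutoff are all assumptions made for the example; the TLEN values are the ones quoted in the question.

```python
from statistics import mean, median, stdev


def inner_distances(tlens, read_length, max_fragment=1000):
    """Convert SAM TLEN values (column 9) into inner distances.

    Hypothetical helper for illustration; max_fragment is an assumed
    cutoff, not a value taken from any tool's documentation.
    """
    # Each aligned pair shows up twice, once as +TLEN and once as -TLEN.
    # Keeping only positive values counts each pair once and drops the 0
    # entries; the cutoff drops implausibly long fragments such as 3786.
    fragments = [t for t in tlens if 0 < t <= max_fragment]
    # inner distance = fragment length - 2 * read length
    return [t - 2 * read_length for t in fragments]


# TLEN values quoted in the question.
tlens = [0, -338, 338, 310, -310, 150, -150, -159, 159, -3786, 3786,
         -379, 379, -173, 173, 248, -248, 260, -260, 0, -128, 128,
         164, -164, -246, 246, 0]

READ_LENGTH = 100  # assumption: the thread does not state the read length

inner = inner_distances(tlens, READ_LENGTH)
print(f"pairs used: {len(inner)}")
print(f"mean: {mean(inner):.1f}  median: {median(inner)}  sd: {stdev(inner):.1f}")
```

A fragment shorter than twice the read length gives a negative inner distance, which just means the two mates overlap. With real data you would read column 9 from the whole SAM file rather than a handful of values, and set READ_LENGTH to your actual read length.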

CandiceChuDVM (United States/College Station/Texas A&M University) wrote 3.6 years ago:

Thanks!

I ran inner_distance.py -k 35069833 -i test.sam -o output -r Genome/CanFam3.1.bed and got this back:

Total read pairs used 1000000
Name    Mean               Median  sd
output  -67.108743152891   -83     55.5160777011065
null device 
      1

Should a negative result ring alarm bells? Or does it simply mean I should use 67 as the value for -r/--mate-inner-dist?

[Attached plot: output.inner_distance_plot]


Devon Ryan replied 3.6 years ago:

Try -67; if TopHat2 complains about the negative value, use 0 instead.
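As a rough illustration of plugging the RSeQC numbers into TopHat2, the Python sketch below rounds the reported mean and standard deviation to integers and assembles a command line using -r/--mate-inner-dist and --mate-std-dev. The helper name tophat2_command, the index name, the FASTQ file names and the output directory are hypothetical placeholders, and whether TopHat2 accepts a negative -r is not confirmed in this thread.

```python
def tophat2_command(mean_inner, sd_inner, index, reads_1, reads_2,
                    out_dir="tophat_out"):
    """Assemble a TopHat2 argument list from inner-distance statistics.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    return [
        "tophat2",
        "-r", str(round(mean_inner)),            # -r/--mate-inner-dist, as an integer
        "--mate-std-dev", str(round(sd_inner)),  # spread of the inner distance
        "-o", out_dir,
        index, reads_1, reads_2,
    ]


# Mean and sd as reported by inner_distance.py above; the index and
# FASTQ names are placeholders, not files mentioned in the thread.
cmd = tophat2_command(-67.108743152891, 55.5160777011065,
                      "CanFam3.1_index", "reads_1.fastq", "reads_2.fastq")
print(" ".join(cmd))
```

If the negative value is rejected, passing 0 for mean_inner gives the fallback suggested above.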