I have an internal dataset for NCI-H660 with 8M mapped pairs (HiSeq, 50 bp paired-end) and an external dataset for the same cell line with 4M mapped pairs (50 bp paired-end, generated on a GAII).
Questions: I observe the TMPRSS2-ERG fusion in the external dataset, but not in the internal HiSeq data. What could be the reasons? I ran the tophat2 fusion search with the same parameters for both datasets.
How can I inspect the FASTQ files directly to see whether reads spanning this fusion are present? The sequence of the TMPRSS2-ERG fusion is shown here: http://info.gersteinlab.org/images/c/cc/FusionSeq_results.jpg
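To make that question concrete, this is the kind of direct check I have in mind, as a minimal Python sketch rather than a validated method. It scans a FASTQ file for reads that contain a short junction sequence or its reverse complement. The function names are my own (hypothetical), the file name in the usage comment is a placeholder, and the junction string has to be transcribed by hand from the figure linked above; I have not included it here. It assumes standard 4-line FASTQ records and exact matching only.

```python
import gzip

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (result is uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_junction_reads(fastq_path, junction):
    """Find reads containing `junction` or its reverse complement.

    Assumes 4-line FASTQ records and exact matches only.
    Returns (number_of_reads_scanned, list_of_matching_read_ids).
    """
    fwd = junction.upper()
    rev = reverse_complement(fwd)
    scanned = 0
    hits = []
    with open_fastq(fastq_path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            seq = handle.readline().strip().upper()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            scanned += 1
            if fwd in seq or rev in seq:
                hits.append(header.strip()[1:])  # drop the leading '@'
    return scanned, hits


# Hypothetical usage (placeholder file name and junction string):
#   scanned, hits = count_junction_reads("internal_R1.fastq.gz", JUNCTION)
#   print(scanned, len(hits))
```

With 50 bp reads the junction string would need to be well under 50 bases and centred on the breakpoint, otherwise no read can contain it entirely. Exact matching also misses reads with a sequencing error inside the junction, so a count of zero from this check would not prove the fusion is absent.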
Does this mean we need to generate more data internally to find the same fusion? I am already using the lowest possible thresholds in the post-processing step:
tophat-fusion-post -p $np --skip-read-dist --num-fusion-reads 1 --num-fusion-pairs 1 --num-fusion-both 2 $index
Any help will be greatly appreciated!! Thanks.