Question: Toy Tophat Example, finding a single junction from synthetic data
gravatar for irritable_phd_syndrom
4.9 years ago by
irritable_phd_syndrom80 wrote:

I am trying to get a better understanding of how tophat works.  To do this I have created some toy examples.  I am looking at a splice junction between two exons in chr9 of hg38.  Exon 1 and exon 2 are at base positions 133261900 - 133262400 and 133274600 - 133276150 respectively.  Below is my synthetic data

Read 1 (R1.fastq) :  

@exon 2, line 04...




@near exon 1...





Read 2 (R2.fastq) : 

@exon 1, line 05-06...         * starts here




@near exon 1





The gene is on the opposite strand with respect to the reference and I am trying to emulate paired end reads.  The first entry of R1.fastq is part of exon 2 and first entry of R2.fastq is part of exon 1.  The result of running 

tophat2 -p 2 -o . --library-type fr-unstranded /path/to/hg38/Bowtie2Index/genome R1.fastq R2.fastq

yields 2 discordant pair alignments, but no junctions.  Replacing the first line of R2.fastq with its reverse complement yields 2 concordant pair alignments, but no junctions.  

Question : How would I modify my data set to get tophat to recognize a junction?

rna-seq tophat • 1.5k views
ADD COMMENTlink modified 4.9 years ago by Devon Ryan96k • written 4.9 years ago by irritable_phd_syndrom80
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

Tophat (and similar aligners) identify junctions by finding individual reads that are spliced. That is, the splicing needs to occur somewhere in the middle of the read. So, just modify one of the reads to be longer and overlap both exon 1 and 2.

ADD COMMENTlink written 4.9 years ago by Devon Ryan96k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1683 users visited in the last hour