Question

Coverage-Based Search In Tophat

6

11.8 years ago

jeremy ▴ 80

In TopHat, there is an option: --coverage-search. When it is on, the run takes a long time and a lot of memory. But I couldn't find any documentation explaining what "coverage-based search" means. Could anyone who knows about this explain it or point me to the right place? Thank you in advance.

rna-seq splicing • 9.4k views

ADD COMMENT • link updated 11.8 years ago by JC 13k • written 11.8 years ago by jeremy ▴ 80

Ram · Answer 1 · 2012-07-17

3

11.8 years ago

JC 13k

The --coverage-search option enables a step that defines possible junctions between exons. After the initial mapping, TopHat searches for "islands". From the manual:

TopHat generates its database of possible splice junctions from three sources of evidence. The first source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron.

This step takes a lot of time and memory because TopHat is looking for peaks of coverage in a sparse space.
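To make the "islands" idea concrete, here is a toy Python sketch. It is not TopHat's code: `find_islands`, `candidate_introns`, the `min_depth` threshold and the `max_intron` cutoff are all names and values made up for illustration. It takes a per-base coverage vector, collects the contiguous covered stretches, and treats the gap between each pair of neighboring islands as a candidate intron.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), 0-based


def find_islands(coverage: List[int], min_depth: int = 1) -> List[Interval]:
    """Return contiguous stretches where depth >= min_depth (hypothetical threshold)."""
    islands: List[Interval] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                      # an island opens here
        elif depth < min_depth and start is not None:
            islands.append((start, pos))     # the island closed at the previous base
            start = None
    if start is not None:                    # island runs to the end of the sequence
        islands.append((start, len(coverage)))
    return islands


def candidate_introns(islands: List[Interval], max_intron: int = 500000) -> List[Interval]:
    """Pair each island with its right-hand neighbor; the gap is a candidate intron."""
    gaps: List[Interval] = []
    for (_, left_end), (right_start, _) in zip(islands, islands[1:]):
        if 0 < right_start - left_end <= max_intron:
            gaps.append((left_end, right_start))
    return gaps


coverage = [0, 3, 5, 4, 0, 0, 0, 2, 6, 0]
islands = find_islands(coverage)
print(islands)                     # [(1, 4), (7, 9)]
print(candidate_introns(islands))  # [(4, 7)]
```

On a real genome almost every position has zero coverage and each candidate gap still has to be checked against the reads, which gives a feel for why the step is expensive.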

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 11.8 years ago by JC 13k

0

Thanks for the answer. I understand that TopHat tries to identify coverage islands and then looks for possible junctions between them. However, if this is what --coverage-search means, how should I understand the fact that the option can be turned off? That is why I was confused: I thought TopHat uses the coverage search to identify junctions. If we can turn it off, then this step is not necessary for identifying junctions. So the question is: what does TopHat use to identify junctions when --no-coverage-search is specified? Could you comment on that? Thanks.

ADD REPLY • link updated 4.6 years ago by Ram 43k • written 11.8 years ago by jeremy ▴ 80

1

Identifying new regions from the coverage is relevant only if you want to detect new splice sites in alternative transcripts, or even new genes. If you only want the expression profile of the annotated genes, you can skip this step for speed.

ADD REPLY • link 11.8 years ago by JC 13k

3

Hi! The identification of new splice sites in different genes/transcripts is still possible without coverage search!

Coverage search is, according to the manual, only useful when you've got very short reads, since in that case the probability that a read will "hit" a splice junction exactly may be very low for lowly expressed transcripts. Hence you need another way of detecting splice sites, which is where coverage search comes in. To make things easier for the algorithm, coverage search only allows the most canonical splice junctions, GT-AG. This means that if you've got longer reads and enable this option, you lose the other types of splice junctions (GC-AG and AT-AC), so enabling it for long reads or deep libraries doesn't make sense, as outlined in the manual.
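As a rough illustration of what "only GT-AG" means, here is a small Python sketch that reads the donor and acceptor dinucleotides of a candidate intron and keeps only the GT-AG case. The function names are hypothetical, it only looks at the forward strand, and it is a simplification for the sake of the example, not how TopHat implements the filter.

```python
def intron_motif(genome: str, intron_start: int, intron_end: int) -> str:
    """Return 'donor-acceptor' dinucleotides for the half-open intron [start, end)."""
    if intron_end - intron_start < 4:
        raise ValueError("intron too short to carry both dinucleotides")
    donor = genome[intron_start:intron_start + 2].upper()   # first two intronic bases
    acceptor = genome[intron_end - 2:intron_end].upper()     # last two intronic bases
    return f"{donor}-{acceptor}"


def kept_by_gt_ag_filter(motif: str) -> bool:
    """Toy stand-in for a GT-AG-only restriction: GC-AG and AT-AC are dropped."""
    return motif == "GT-AG"


genome = "AAAGTCCCCAGTTT"          # made-up sequence, intron at positions 3..10
motif = intron_motif(genome, 3, 11)
print(motif)                        # GT-AG
print(kept_by_gt_ag_filter(motif))  # True
print(kept_by_gt_ag_filter("GC-AG"))  # False
```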

ADD REPLY • link 11.3 years ago by galka8 ▴ 580