Can Tophat2 Align a Book?
8.9 years ago
tiago211287 ★ 1.4k

If I made a fake FASTA with 10x the material of a book, randomly cut and spliced, and used it to align a book, could TopHat2 reconstruct the book? Or does it only work with ATCG letters?

Just a question that came to mind today.

Tophat2 • 2.3k views

This problem seems better suited to a de novo assembly program like Velvet, SGA, ALLPATHS-LG, ABySS, SOAPdenovo, etc. As Brian suggested, you may want to re-encode the book as ACGT.

8.9 years ago

You would need to re-encode the book as ACGT. For example, 1 ASCII character is stored as 8 bits, which corresponds to 4 nucleotides at 2 bits per nucleotide if you use the simplest possible encoding (rather than trying to pack into 7 or 6.5 bits, or whatever). Thus, for an ASCII-formatted text file of the book, the encoded book would be 4x as long, but the mapping would work fine.

You MIGHT be able to map to the raw book using some protein aligners, as those allow more symbols.

As Istvan said, though, you'd need to use an assembler to reconstruct the book, not an aligner.


Good tip. Could you explain how I can re-encode something? Could I actually turn a string of text into just ATCG and revert the process after the alignment? Or is the re-encoding irreversible?


The encoding is reversible.

https://en.wikipedia.org/wiki/ASCII

For example, "Hi" -> 01001000 01101001 -> CAGA CGGC

where 00 -> A, 01 -> C, 10 -> G, 11 -> T
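
To make the round trip concrete, here is a minimal Python sketch of that 2-bit scheme. The function names `text_to_dna` and `dna_to_text` are just made up for illustration, and it assumes the text is plain ASCII, read most significant bits first as in the "Hi" example.

```python
# 2-bit code from the reply above: 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def text_to_dna(text: str) -> str:
    """Encode ASCII text as DNA, 4 bases per byte, high bits first."""
    out = []
    for byte in text.encode("ascii"):  # raises on non-ASCII input
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_text(dna: str) -> str:
    """Reverse text_to_dna: every 4 bases become one byte again."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises on non-ACGT
        data.append(byte)
    return data.decode("ascii")


encoded = text_to_dna("Hi")
print(encoded)               # CAGACGGC
print(dna_to_text(encoded))  # Hi
```

Note that decoding only lines up if you know where each group of 4 bases starts, so a fragment mapped at an offset that is not a multiple of 4 would need to be shifted before decoding.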

8.9 years ago

In principle yes, but in practice it might not work out that well. TopHat is built to recognize the splicing dinucleotides that mark the most likely splice locations. Beyond that, everything depends on the content and the length of the pieces.

But of course, to do that you would need a book to align against, so "reconstructing" the book does not make sense here. You already need to have the book in order to align with TopHat.
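
A toy Python sketch of that point, under loud assumptions: the "reads" are random exact substrings of the book at roughly 10x coverage, and the "aligner" is just `str.find`, which is nothing like what TopHat actually does (no mismatches, no splicing). The names `simulate_reads` and `map_read` are hypothetical. It only shows that mapping takes the book itself as input.

```python
import random


def simulate_reads(book: str, read_len: int, coverage: float, seed: int = 0) -> list[str]:
    """Cut random fixed-length fragments until the requested coverage is reached."""
    rng = random.Random(seed)
    n_reads = int(len(book) * coverage / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(book) - read_len + 1)
        reads.append(book[start:start + read_len])
    return reads


def map_read(book: str, read: str) -> int:
    """Toy 'alignment': leftmost exact match in the reference, or -1 if absent."""
    return book.find(read)


# The reference (the book) has to exist before any read can be mapped.
book = "It was the best of times, it was the worst of times."
reads = simulate_reads(book, read_len=12, coverage=10)
positions = [map_read(book, read) for read in reads]

for read, pos in list(zip(reads, positions))[:3]:
    print(repr(read), "maps to position", pos)
```

Rebuilding the text from the reads alone, without `book`, is the assembly problem mentioned elsewhere in this thread.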

8.9 years ago
SES 8.6k

This is an interesting question. Instead of using a tool like Tophat, I would suggest trying Vmatch because it allows you to define any alphabet you like (not just DNA/RNA or protein). You would define the alphabet with the mkvtree program when you create the index (suffix tree) and then you could map your words or sentences to the book with vmatch. I imagine this approach would require less work than recoding your data or modifying an existing DNA aligner. It would be easy enough to try this, and I'm sure someone has, but I can't say I've done this myself.


Very interesting tool. I will try this just for fun and for educational purposes.
