Question

Is there a way to separate highly similar strains from metatranscriptomic data using HISAT2?

1

6.3 years ago

sheinsch ▴ 10

I have metatranscriptomic data for several communities. When I use HISAT2 to align the reads from some of the pairings to a single genome, I typically get close to 50% of the reads aligning. In a couple of cases, close to 100% of the reads align to a single genome. In these cases I am assuming that the genomes are very similar. Unfortunately, I only have the whole genome sequence for one of the strains.

Is there a way to limit the number of reads aligning to the wrong genome in HISAT2? My instinct is to increase the stringency of the alignment using the --score-min option. However, I would like to hear from the community in case there is a more commonly used solution.
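To make the --score-min idea concrete, here is a minimal Python sketch of how a function string of that form turns into a per-read threshold. It assumes the Bowtie2-style function syntax that the HISAT2 manual describes (C = constant, L = linear, S = square root, G = natural log) and the documented HISAT2 default of L,0,-0.2. The helper name `min_score` is made up for illustration and is not part of HISAT2.

```python
import math


def min_score(func: str, read_len: int) -> float:
    """Evaluate a --score-min style function string for one read length.

    Hypothetical helper for illustration only; assumes the
    '<type>,<constant>,<coefficient>' syntax from the HISAT2 manual.
    """
    parts = func.split(",")
    kind = parts[0].strip().upper()
    nums = [float(p) for p in parts[1:]]
    const = nums[0] if nums else 0.0
    coef = nums[1] if len(nums) > 1 else 0.0

    if kind == "C":   # constant: threshold does not depend on read length
        return const
    if kind == "L":   # linear in read length
        return const + coef * read_len
    if kind == "S":   # square root of read length
        return const + coef * math.sqrt(read_len)
    if kind == "G":   # natural log of read length
        return const + coef * math.log(read_len)
    raise ValueError(f"unknown function type: {kind!r}")


# Compare the documented default with a stricter (assumed) setting
# for a 100 bp read; scores are <= 0, so closer to 0 is stricter.
for setting in ("L,0,-0.2", "L,0,-0.1"):
    print(setting, round(min_score(setting, 100), 3))
```

Under these assumptions, a 100 bp read must score at least -20 with the default and at least -10 with L,0,-0.1, so the second setting tolerates roughly half the total penalty. Whether that is enough to separate near-identical strains depends on how many differences actually fall within each read.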

RNA-Seq HISAT2 metatranscriptomics • 1.3k views

updated 6.3 years ago by colindaven 6.4k • written 6.3 years ago by sheinsch ▴ 10

0

highly similar strains

You have noted the main problem yourself. There are no magic bullets or programs here, especially since you are looking at relatively short snippets of sequence that could be very similar between strains.

You can try BBSplit from the BBMap suite, which allows fine-grained control over how reads are binned/classified, but at the end of the day the limitations imposed by the technology are going to prevail.

6.3 years ago by GenoMax 141k

score 0 · Answer 1 · 2018-01-03

First, why use HISAT2? Are these prokaryotes or pico-eukaryotes? If they are prokaryotes, I would not use a spliced aligner.

Always align to all genomes at once, including contaminants (human?). Maybe you can find a draft assembly of the new genome as well.
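As a rough illustration of aligning to all genomes at once, the sketch below merges several FASTA files into one combined reference and prefixes every sequence header with the name of its source file, so each alignment can be traced back to a genome. The function name `combine_fastas`, the `|` separator, and the file names in the usage comment are assumptions made for this example, not something from the thread.

```python
from pathlib import Path


def combine_fastas(fasta_paths, out_path):
    """Concatenate FASTA files into one reference.

    Each header is rewritten as '>{file_stem}|{original header}' so the
    genome of origin stays visible in the alignment output.
    Hypothetical helper for illustration only.
    """
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            label = path.stem  # e.g. 'strainA' for 'strainA.fa'
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # skip blank lines between records
                    if line.startswith(">"):
                        out.write(f">{label}|{line[1:]}\n")
                    else:
                        out.write(line + "\n")


# Example usage (file names are placeholders):
# combine_fastas(["strainA.fa", "draft_strainB.fa", "human.fa"], "combined.fa")
```

The combined file can then be indexed once (for HISAT2, with hisat2-build) and used as the single reference, so that reads from a contaminant or a related strain have somewhere better to align than the one genome of interest.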

I once created a combined reference file like this, which was useful to some users (and still is to us): http://genomics1.mh-hannover.de/genometa/

https://docs.google.com/open?id=0B-ZVOKUcgOHRakRrb0hqSWlvT3M