Question

Identifying shared exact sequence variants between two datasets

0

5.3 years ago

adam.sorbie • 0

I have two microbiome datasets that were sequenced from different sites within the environment. I would like to identify which amplicon sequence variants (ASVs), inferred using DADA2, are shared between the two datasets and which are unique to each. The problem is that I'm not 100% sure of the best way to do this. So far I have tried megablast and simply looking for exact matches between the two datasets. Perhaps there is a better way, for example another alignment tool or an alignment-free method, but I don't have the experience to decide on the best method. Does anyone with more experience of working with microbiome data know of a better way?

alignment dada2 16s microbiome NGS • 1.5k views

ADD COMMENT • link updated 5.3 years ago by Charles Yin ▴ 180 • written 5.3 years ago by adam.sorbie • 0

0

"Amplicon sequence variants": do you mean the variants that are common to, and those that differ between, the two datasets? Does your organism have a reference genome available?

ADD REPLY • link 5.3 years ago by GouthamAtla 12k

0

The term is just what DADA2 uses to refer to each individual 16S rRNA sequence representing a "species" of bacteria; they are not variants in the genetic sense. Essentially yes, I wish to find the common and differing sequences between the two datasets.

ADD REPLY • link 5.3 years ago by adam.sorbie • 0

0

5.3 years ago

Charles Yin ▴ 180

You could use a sliding-window approach, aligning windowed sequences from one genome onto the other genome.

ADD COMMENT • link 5.3 years ago by Charles Yin ▴ 180

1

The OP's data is amplicon data, probably 16S.

ADD REPLY • link 5.3 years ago by h.mon 35k

score 2 · Accepted Answer · 2019-01-12

2

5.3 years ago

h.mon 35k

You can use cd-hit-est-2d for this task. The VSEARCH wiki also has an example of how to use VSEARCH to (among other things) compare datasets.
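
For the exact-match comparison mentioned in the question, a plain set comparison of the ASV sequences may already be enough. The following is only a minimal sketch, not part of cd-hit or VSEARCH. It assumes both datasets were processed to cover the same amplicon region with the same trimming and orientation, since ASVs of different lengths will never match exactly. The function names `read_fasta` and `compare_asvs`, the FASTA headers, and the toy sequences are all hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into {UPPERCASE sequence: [header, ...]}."""
    seqs = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                seqs.setdefault("".join(chunks).upper(), []).append(name)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    # Store the last record in the file.
    if name is not None:
        seqs.setdefault("".join(chunks).upper(), []).append(name)
    return seqs


def compare_asvs(seqs_a, seqs_b):
    """Return (shared, only_in_a, only_in_b) as sets of sequences."""
    a, b = set(seqs_a), set(seqs_b)
    return a & b, a - b, b - a


# Toy data standing in for the two ASV FASTA files (hypothetical sequences).
site_a = StringIO(">ASV1\nACGTACGT\n>ASV2\nTTTTGGGG\n")
site_b = StringIO(">ASV1\nacgtacgt\n>ASV9\nCCCCAAAA\n")

shared, only_a, only_b = compare_asvs(read_fasta(site_a), read_fasta(site_b))
print("shared:", sorted(shared))
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

With real data, the `StringIO` objects would be replaced by `open("site_a_asvs.fasta")` and similar (file names hypothetical). If the two runs were trimmed differently, a similarity-based tool such as the ones named above is probably the safer choice.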

ADD COMMENT • link 5.3 years ago by h.mon 35k