Question

Identify informative markers from two in silico digests?

9.2 years ago

J.R. • 0

Hi all,

I'm trying to analyze fragments generated from in silico digests of two draft genomes to determine how many of the fragments differ between the two species. As input, I have two unordered lists of ~5,000 fragments of ~300 bp each (one list per species). I would like to determine how many of them are identical, how many differ by 1, 2, 3, etc. bases, and how many have no corresponding fragment in the other genome.

Is this possible? What approach should I take? I can't figure out whether pairwise aligners can handle unordered lists like this, whether multiple alignment is what I need, or whether I should just map both sets back to one of the reference genomes. I'm not very experienced at this, but I'd like to learn.
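One way to get the requested breakdown directly, without a clustering tool, is a brute-force nearest-match search. The sketch below (plain Python, standard library only) assumes each species' fragments are already loaded as a list of sequence strings; the function names are hypothetical. It uses Levenshtein edit distance, so an insertion or deletion counts as one difference, the same as a substitution, and any fragment whose closest partner is more than `max_dist` edits away is reported as having no match.

```python
from collections import Counter


def edit_distance(a: str, b: str, max_dist: int) -> int:
    """Levenshtein distance between a and b, capped at max_dist + 1.

    The minimum of each DP row never decreases from one row to the next,
    so once it exceeds max_dist we can stop early and report 'too far'.
    """
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1  # the length difference alone exceeds the cap
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion from a
                cur[j - 1] + 1,            # insertion into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        if min(cur) > max_dist:
            return max_dist + 1
        prev = cur
    return min(prev[-1], max_dist + 1)


def mismatch_histogram(frags_a, frags_b, max_dist=3):
    """For each fragment in frags_a, record its smallest edit distance to any
    fragment in frags_b; distances above max_dist are pooled as 'no match'."""
    hist = Counter()
    exact_b = set(frags_b)  # fast path for identical fragments
    for a in frags_a:
        if a in exact_b:
            hist[0] += 1
            continue
        best = max_dist + 1
        for b in frags_b:
            # Only distances smaller than the current best can change the result.
            best = min(best, edit_distance(a, b, best - 1))
            if best == 1:
                break  # exact matches were handled above, so 1 is the minimum
        hist[best if best <= max_dist else "no match"] += 1
    return hist


species_a = ["ACGTACGT", "TTTTGGGG", "CCCCAAAA"]
species_b = ["ACGTACGT", "TTTAGGGG", "GGGGGGGG"]
print(mismatch_histogram(species_a, species_b))
```

The comparison is one-directional: calling `mismatch_histogram(species_b, species_a)` as well tells you which fragments of the second species lack a partner in the first. With ~5,000 fragments per side this is up to 25 million pairwise comparisons, which will be slow in pure Python even with the length filter and early exit, so for the full dataset a dedicated clustering tool or a k-mer prefilter is likely more practical.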

Thanks,

Joanna

RAD-seq in silico alignment SNP DNA • 1.9k views

score 1 · Answer 1 · 2015-02-11

9.2 years ago

Biomonika (Noolean) 3.2k

I would try clustering them as a quick first pass. First, make sure the header names are distinct between the two groups; then use, for example, CD-HIT (http://weizhongli-lab.org/cd-hit/) and parse the clusters you get (the proportion of sequences from group 1 versus group 2 in each cluster). The sizes of the clusters would also be interesting to look at. (If the clustering worked well, I guess I would sort the clusters by length and then plot the proportions as very narrow bar plots, with two colors representing the groups. But that's a very personal suggestion :D)
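The clustering workflow can be sketched in Python as follows. This is an illustrative sketch, not a tested pipeline: the species tags `spA_`/`spB_` and the example cluster names are hypothetical, and the nucleotide run is assumed to be something like `cd-hit-est -i combined.fa -o clusters -c 0.95`, which writes the cluster listing to `clusters.clstr` (the 0.95 identity threshold is only an example). The parser assumes the usual `.clstr` layout, a `>Cluster N` line followed by member lines such as `1<TAB>300nt, >name... at +/99.67%`, so check it against your own output. CD-HIT can truncate names in the `.clstr` file (see its `-d` option), which is why the tag goes at the start of each header.

```python
import re
from collections import Counter

# Hypothetical species tags; any prefixes work as long as they differ.
TAGS = ("spA_", "spB_")

# Captures the sequence name from a member line like "1\t300nt, >spB_frag9... at +/99.67%"
MEMBER = re.compile(r">(\S+?)\.\.\.")


def tag_fasta(lines, prefix):
    """Prefix every FASTA header with a species tag so the two sets stay distinct."""
    for line in lines:
        yield ">" + prefix + line[1:] if line.startswith(">") else line


def parse_clstr(text):
    """Parse CD-HIT .clstr text into a list of clusters (lists of member names)."""
    clusters = []
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])
        else:
            match = MEMBER.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def group_counts(clusters, tags=TAGS):
    """Return (cluster size, members from group 1, members from group 2) per cluster."""
    rows = []
    for members in clusters:
        counts = Counter(tag for name in members for tag in tags if name.startswith(tag))
        rows.append((len(members), counts[tags[0]], counts[tags[1]]))
    return rows


def classify(rows):
    """Count clusters shared by both groups versus clusters found in only one."""
    kinds = Counter()
    for _, n_a, n_b in rows:
        if n_a and n_b:
            kinds["shared"] += 1
        elif n_a:
            kinds["group 1 only"] += 1
        elif n_b:
            kinds["group 2 only"] += 1
    return kinds


example = (
    ">Cluster 0\n"
    "0\t300nt, >spA_frag1... *\n"
    "1\t300nt, >spB_frag9... at +/99.67%\n"
    ">Cluster 1\n"
    "0\t298nt, >spA_frag2... *\n"
    ">Cluster 2\n"
    "0\t301nt, >spB_frag4... *\n"
)
rows = group_counts(parse_clstr(example))
# Largest clusters first, with the group proportions one would draw as bars.
for size, n_a, n_b in sorted(rows, reverse=True):
    print(size, n_a / size, n_b / size)
print(classify(rows))
```

`tag_fasta` would be applied to each species' FASTA file before concatenating them into the clustering input. Clusters containing only one group's tag correspond to fragments with no counterpart in the other genome at the chosen identity threshold; finer distinctions (identical versus 1, 2, 3 differences) would need either several runs at different thresholds or the per-member identity values on the member lines.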

Thank you! That ended up working really well, and the output was super-easy to parse. Thanks so much!

J.R. • 9.2 years ago

Glad to hear that! Please remember to upvote my (and any other) answers that you find helpful ;-)