Question

Finding Repeat Genes In Unused Reads

12.4 years ago

Lee Katz ★ 3.1k

Hi, I would like to find the sequence of a repeat gene in my WGS reads. I have raw reads from both 454 and Illumina, and I have a fasta file of several alleles/variants of a given gene. I know that this gene exists in the genome from a PCR reaction/gel.

Is there a standard strategy to uncover the repeat sequence in a pseudocontig? As in, a consensus of the repeat? Has someone already done this in a software suite?

Thank you for any and all help!

repeats assembly • 2.0k views

I put my tentative strategy as an answer but I am still wondering what others have done.

12.4 years ago by Lee Katz ★ 3.1k

score 1 · Answer 1 · 2011-12-20

12.4 years ago

Lee Katz ★ 3.1k

I have thought it through a little, and I think my best option right now is to take the Newbler output, pull out the reads labeled as repeats, and assemble them on their own. I might also include the reads labeled as singletons. However, I am not sure yet how I would use the Illumina reads, if at all.
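
A minimal sketch of picking out those reads, assuming the Newbler read status report (454ReadStatus.txt) is tab-delimited with the read accession in the first column and a status such as Repeat or Singleton in the second. The function name and the example rows are made up for illustration, so check the column layout against your own file.

```python
import csv
import io


def reads_with_status(handle, wanted=("Repeat", "Singleton")):
    """Return accessions whose status (second column) is in `wanted`."""
    wanted = set(wanted)
    accessions = []
    for row in csv.reader(handle, delimiter="\t"):
        # Blank or short rows are skipped; a header row is dropped
        # naturally because its second field is not a wanted status.
        if len(row) < 2:
            continue
        if row[1] in wanted:
            accessions.append(row[0])
    return accessions


# Made-up rows standing in for a real read status file.
example = io.StringIO(
    "Accno\tRead Status\n"
    "READ0001\tAssembled\n"
    "READ0002\tRepeat\n"
    "READ0003\tSingleton\n"
    "READ0004\tOutlier\n"
)
selected = reads_with_status(example)
print("\n".join(selected))
```

With a real file, `open("454ReadStatus.txt", newline="")` would take the place of the `io.StringIO` object.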

So, my tentative strategy:

1. Extract repeat reads (sfffile/sffinfo)
2. BLAST my alleles against the reads to pick out relevant reads, using a small word size and a liberal E-value
3. Extract those reads from the SFF file (sfffile)
4. Assemble the relevant reads (using Minimo? Or Newbler again?)
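
The BLAST filtering step above could be sketched roughly as follows. This assumes the alleles were the query, the reads were the database, and the output is the standard 12-column tabular format (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall), so the read ID is in column 2 and the E-value in column 11. The function name, the cutoff of 10, and the example rows are illustrative only.

```python
def hit_read_ids(blast_lines, max_evalue=10.0):
    """Collect unique subject (read) IDs with E-value <= max_evalue."""
    ids = set()
    for line in blast_lines:
        line = line.strip()
        # Skip blank lines and comment lines from commented tabular output.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        read_id = fields[1]          # sseqid: the read that was hit
        evalue = float(fields[10])   # evalue column
        if evalue <= max_evalue:
            ids.add(read_id)
    return sorted(ids)


# Made-up tabular hits, not real results.
example_hits = [
    "alleleA\tREAD0002\t98.5\t200\t3\t0\t1\t200\t15\t214\t1e-90\t350",
    "alleleB\tREAD0002\t91.0\t180\t16\t0\t5\t184\t20\t199\t2e-60\t250",
    "alleleA\tREAD0007\t85.0\t60\t9\t0\t10\t69\t1\t60\t0.5\t30.1",
    "alleleB\tREAD0009\t80.0\t25\t5\t0\t1\t25\t3\t27\t50\t18.3",
]
hits = hit_read_ids(example_hits)

# One accession per line, suitable for saving as a read list.
accession_list = "\n".join(hits) + "\n"
print(accession_list, end="")
```

A list like this, saved to a file, is the kind of input sfffile can use to pull a subset of reads out of an SFF file; I believe that is done with an include-list option, but check the sfffile help for the exact flag.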

Seems sound to me. You could try the above, and also try adding your alleles to the assembly itself (step 4).

12.3 years ago by Larry_Parnell 16k