How to find a tandem duplication pattern in a DNA sequence
3
0
3 months ago
kumajis • 0

Hi,

I have a DNA sequence that contains a tandem duplication. Both the repeat unit and the number of copies are unknown.

Each copy carries some mutations. Suppose the repeat unit is A (length 1 to 8 kb); then the sequence is A1A2A3...An, where A1 and An may be incomplete.

How can I identify the repeat unit A?

Thanks.

Repeat • 909 views
0

If you are interested in finding the exact number or sequence of the repeats, a long-read sequencing technology would be the ideal approach.

0

Yes, I have some rolling-amplified nanopore sequencing data. I want to separate the repeated units and then generate a single consensus read for better accuracy.

0

some rolling amplified nanopore sequencing data

You should have included this information in the original post.

Even if ReDTandem is no longer supported, you may still be able to use it, unless it was never meant to be used with long reads.

Are you using a custom protocol to do this sequencing? What do you mean by "rolling amplified" data? AFAIK, Nanopore has no rolling-circle-style sequencing like PacBio does.

0

Thanks,

ReDTandem can no longer be downloaded because the author closed his website.

“Rolling amplified” means the target DNA is amplified several times into a tandem sequence. If I can separate these tandem copies after nanopore sequencing, I can take a consensus across them to increase the accuracy of the target DNA sequence.
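
The split-then-consensus step can be sketched in Python as below. This is only a minimal illustration under strong assumptions that do not come from this thread: the unit length (`period`) is already known, the read starts at a unit boundary, and the copies differ only by substitutions. The helper names `split_copies` and `majority_consensus` and the toy read are hypothetical.

```python
from collections import Counter


def split_copies(read, period):
    """Cut a read into consecutive full-length chunks of `period` bases.

    A trailing partial copy (shorter than `period`) is dropped.
    """
    return [read[i:i + period] for i in range(0, len(read) - period + 1, period)]


def majority_consensus(copies):
    """Column-wise majority vote over equal-length copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Toy read: four copies of an 8 bp unit, each with at most one substitution,
# followed by an incomplete fifth copy.
read = (
    "ACGTACGA"
    "ACCTACGA"
    "ACGTACGA"
    "ACGTTCGA"
    "ACG"
)
copies = split_copies(read, 8)
consensus = majority_consensus(copies)
print(len(copies), consensus)
```

Real nanopore reads also contain insertions and deletions, so the copies would need to be aligned to each other before a per-column vote like this makes sense.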

0

Hi,

Actually, what I want to analyze is a large set of sequencing reads from rolling amplification. Are there any bioinformatics tools for this purpose?

I found one named ReDTandem, but the author no longer supports it.

Thanks

3
3 months ago

Tandem Repeats Finder may be of use:

1. Click on "Submit a Sequence for Analysis" and select "Basic" (if you want to adjust search parameters, pick a different option).
2. Click on the "Cut and paste sequence" radio button and paste in your sequence of interest.
3. Click "Submit sequence" and wait a few moments (~10-15s).
4. Click on the "Tandem Repeat Report" link to open a summary table of those repeats discovered within your window of interest.
5. Click on the items in the "Indices" column to get more details on sequences, periodicity, content, etc.
0

Hi,

Tandem Repeats Finder is good at finding STRs, but can it pick out much longer repeat patterns, such as tandemly duplicated genes?

Thanks

0

For repeats longer than 2 kb, you'll probably want to investigate other answers/tools.

2
3 months ago
cmdcolin ★ 1.5k

I don't have much experience in this area, but you could probably look at tools ranging from a CNV finder (e.g. finding a DUP overlapping a gene) to something narrower like https://github.com/delehef/asgart, or see the thread "How to detect segmental duplications?"

Finding DUP CNVs (e.g. increased read coverage overlapping a gene) is likely a great first step, as it would be one of the more obvious signals that you can pick up, but subtler patterns could be revealed by a dedicated segmental duplication finder.
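
To illustrate the coverage signal, here is a rough Python sketch that flags windows whose mean depth is well above the median window depth. The function name `flag_high_coverage`, the 100 bp window, the 1.5x threshold, and the toy depth values are all assumptions for illustration. Dedicated CNV callers do considerably more than this.

```python
from statistics import median


def flag_high_coverage(depths, window=100, ratio=1.5):
    """Return (start, end, mean_depth) for windows above ratio x the median.

    `depths` is a list of per-base read depths along one reference sequence.
    """
    windows = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))

    if not windows:
        return []
    baseline = median(mean for _, _, mean in windows)
    if baseline <= 0:
        return []
    return [w for w in windows if w[2] > ratio * baseline]


# Toy depth profile: roughly 2x coverage between positions 500 and 700.
depths = [30] * 500 + [62] * 200 + [30] * 300
flagged = flag_high_coverage(depths)
for start, end, mean in flagged:
    print(start, end, mean)
```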

0

Thanks, but DUP or CNV analysis tools seem too complex for my goal.

1
3 months ago
colindaven ★ 3.0k

If it's just contig analysis (by that I mean, you're looking at single sequences, and don't want to massively scale), the simplest approach is probably doing a dotplot or even blast2sequences on NCBI Blast with graphical output.

Compare the alignment patterns to find hints of repetition or duplication. You can adjust the parameters to get, e.g., 80% or 95% identity alignments.
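
In a self-versus-self dotplot, a tandem repeat shows up as diagonals parallel to the main one, offset by multiples of the unit length. The same idea can be sketched in Python without the plot: record where each k-mer occurs and take the most common distance between consecutive occurrences as a guess at the unit length. The names `estimate_period` and `mutate`, the choice of k=12, and the simulated read are hypothetical, and the toy model only introduces substitutions.

```python
import random
from collections import Counter, defaultdict


def estimate_period(seq, k=12):
    """Guess the tandem repeat unit length from k-mer self-matches.

    Returns the most common distance between consecutive occurrences of the
    same k-mer, or None if no k-mer occurs twice.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    offsets = Counter()
    for hits in positions.values():
        for first, second in zip(hits, hits[1:]):
            offsets[second - first] += 1

    if not offsets:
        return None
    return offsets.most_common(1)[0][0]


def mutate(unit, rate, rng):
    """Copy `unit`, redrawing each base at random with probability `rate`."""
    return "".join(
        rng.choice("ACGT") if rng.random() < rate else base for base in unit
    )


# Simulated read: six mutated copies of a random 300 bp unit.
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(300))
read = "".join(mutate(unit, 0.02, rng) for _ in range(6))
period = estimate_period(read)
print(period)
```

With insertions and deletions, as in real nanopore reads, the distances scatter around the true unit length, so the offsets would need to be binned or smoothed before picking a peak.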

0

This is probably a better answer than mine if it is just a sequence-vs-sequence comparison!
