Question: Finding Contig Repeat Counts By Mapping Contigs To The Reference Genome
6.1 years ago, misaghb (United States) wrote:

Hi guys, I have a set of contigs for genome G (from a de novo assembly with Velvet), and I also have the complete reference sequence of genome G. I want to know the repeat count (an integer) of each contig in reality, which I plan to get by mapping the contigs to the reference genome and counting the exact matches.

Which tools are easiest to use? At the moment I'm only interested in a two-column result: one column with the contig names and the other with an integer giving the repeat count of that contig in the reference genome. Everything else is just a bonus. Could you please let me know which tool is better, or how I can easily produce this result from MUMmer or BLAST output?



This question is unclear to me. What is your research question? What exactly are you trying to do? What kind of data do you have: genomic, transcriptomic, etc.? Are you just trying to determine read depth at a given locus? What does "repeat count of each contig in the reality" mean -- are you identifying repeat regions within contigs? Is there a strain difference in genome "G" -- why not map sequence reads onto the reference instead of contigs?

Please edit your question above. Thanks.

modified 6.1 years ago • written 6.1 years ago by Josh Herr

Thanks for replying, Josh. As I said, the data are genomic sequences. You can completely forget about sequence reads, read mapping, and read depth. By repeat count I mean the number of times contig Ci is observed in the reference genome G (100% match, or some threshold, e.g. 98%).

written 6.1 years ago by misaghb
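
For the two-column result asked for above, one possible approach is to run BLAST with the contigs as queries against the reference, save the default 12-column tabular output (`-outfmt 6`), and then count the hits per contig that meet the identity threshold. The sketch below is only an illustration under some assumptions: the function name `count_contig_hits` and the `contig_lengths` dictionary are hypothetical, the contig lengths have to be supplied separately (for example, read from the contig FASTA file), and a hit is counted only when it spans the whole contig. Overlapping or duplicate alignments at the same locus are not merged, so they may need extra filtering on real data.

```python
def count_contig_hits(blast_lines, contig_lengths, min_identity=98.0, min_coverage=1.0):
    """Count hits per contig from BLAST tabular lines (default 12-column -outfmt 6).

    contig_lengths is a hypothetical dict {contig name: length in bases}.
    A hit counts when its percent identity is >= min_identity and the aligned
    part of the contig covers >= min_coverage of the contig length.
    """
    # Start every contig at zero so contigs without hits still appear in the result.
    counts = {name: 0 for name in contig_lengths}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig = fields[0]              # qseqid
        if contig not in contig_lengths:
            continue                    # ignore queries with unknown length
        pident = float(fields[2])       # percent identity
        qstart, qend = int(fields[6]), int(fields[7])
        covered = abs(qend - qstart) + 1  # aligned span on the contig
        if pident >= min_identity and covered >= min_coverage * contig_lengths[contig]:
            counts[contig] += 1
    return counts


# Made-up example rows, only to show the expected input layout.
example_lines = [
    "contig1\tchr1\t100.00\t500\t0\t0\t1\t500\t1000\t1499\t0.0\t924",
    "contig1\tchr1\t98.60\t500\t7\t0\t1\t500\t8000\t8499\t0.0\t880",
    "contig1\tchr1\t99.00\t200\t2\t0\t1\t200\t3000\t3199\t1e-90\t350",
    "contig2\tchr1\t95.00\t300\t15\t0\t1\t300\t5000\t5299\t1e-120\t450",
]
example_lengths = {"contig1": 500, "contig2": 300}

# Print the two-column result: contig name, repeat count.
for name, n in sorted(count_contig_hits(example_lines, example_lengths).items()):
    print(f"{name}\t{n}")
```

Setting `min_identity=100.0` restricts the count to exact matches, while the default of 98.0 corresponds to the 98% threshold mentioned above.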