Question: De Novo Assembly With 1X Coverage?
9.0 years ago by
State College, PA, USA
Biomonika (Noolean)3.1k wrote:

Hi guys,

I am working with 454 reads at 1x coverage. To find centromeric repeats, I was thinking about assembling the reads and then trying to recognize repetitive regions. However, I am not sure how much sense it makes to assemble reads at such low coverage.

Any help will be appreciated. Thanks.

EDIT: Thank you very much for the answers. I decided to reformulate the problem. I have 454 reads at 1x coverage from Cardamine rivularis and my aim is to find the centromeric repeats. There are two basic approaches: looking for repeats in the raw reads, and looking for repeats in an assembly (whose usefulness is doubtful at this coverage).

I have also had the idea of using centromeric repeats from other species as reference sequences and trying to map the reads to them. If the repeat from Cardamine were similar (which is not very probable :), this could work.

I will postpone accepting an answer until I have tried these :) Comments and discussion are still more than welcome, and thanks a lot for the answers and comments so far.

assembly clustering • 2.6k views
modified 9.0 years ago by lexnederbragt (1.2k) • written 9.0 years ago by Biomonika (Noolean) (3.1k)

At 1X coverage, working with raw reads is probably better than assembly. At least for human, centromeres are mostly imperfect satellite repeats. I do not think at 1X you can get more from assembly.

written 9.0 years ago by lh3 (32k)

De novo assembly is not the right method for detecting highly repetitive regions. Try clustering the sequence reads instead, e.g. with cd-hit or cd-hit 454 (see the link below).

written 9.0 years ago by Michael Dondrup (48k)

modified 14 months ago by Ram (32k) • written 9.0 years ago by Michael Dondrup (48k)
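To make the clustering suggestion above more concrete, here is a minimal Python sketch of greedy incremental clustering (longest read first; each read joins the first cluster whose representative it resembles). This is only an illustration of the general idea, not cd-hit's actual algorithm: similarity is approximated by the fraction of shared k-mers rather than alignment identity, only the forward strand is considered, and the names `kmers`, `greedy_cluster` and the parameters `k` and `min_shared` are hypothetical.

```python
def kmers(seq, k=8):
    """Return the set of k-mers in a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads, k=8, min_shared=0.5):
    """Greedy incremental clustering, longest read first.

    A read joins the first cluster whose representative contains at least
    `min_shared` of the read's k-mers; otherwise it founds a new cluster.
    Assumption: shared k-mer fraction stands in for sequence identity.
    """
    clusters = []  # list of (representative k-mer set, member reads)
    for read in sorted(reads, key=len, reverse=True):
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to compare
        for rep_kmers, members in clusters:
            shared = len(read_kmers & rep_kmers) / len(read_kmers)
            if shared >= min_shared:
                members.append(read)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((read_kmers, [read]))
    # Largest clusters first: at low coverage these are the repeat candidates.
    return sorted((members for _, members in clusters), key=len, reverse=True)
```

At 1x coverage most reads from single-copy sequence should end up in small clusters, so unusually large clusters are the ones worth inspecting by hand. The comparison is quadratic in the worst case, so for a full 454 run a dedicated tool is the realistic choice.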

Here is a relevant paper where k-mer frequency spectra from 454 reads were used to characterize centromeric regions in rice.

written 9.0 years ago by SES (8.4k)
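As a rough illustration of what a k-mer frequency spectrum is, here is a small Python sketch that counts k-mers directly in raw reads. It is not the method of the paper mentioned above; the function names and the choice of `k=12` are assumptions for illustration, and the reads are assumed to be already loaded as plain strings.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(reads, k=12):
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def frequency_spectrum(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

At 1x coverage, k-mers from single-copy sequence should mostly appear once or twice, so the k-mers returned first by `counts.most_common()` are candidates for high-copy repeats such as satellites. For a whole data set, a dedicated k-mer counter would be much faster than this pure-Python loop.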

Which species do you have? Did you consider getting the centromeric repeats directly from the reads?

written 9.0 years ago by Christof Winter (990)

It is from Cardamine rivularis. The thing is that centromeric repeats tend to be quite long (180 bp in Arabidopsis) and are therefore hard to find in 454 data.

written 9.0 years ago by Biomonika (Noolean) (3.1k)

Have you tried running RepeatMasker on your reads?

written 9.0 years ago by Jeremy Leipzig (19k)

454 reads have been reaching 300 bp for several years now. Your (alpha?) satellite unit should be contained in a single read.

written 9.0 years ago by lh3 (32k)
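If a read spans more than one satellite unit, the read should match a copy of itself shifted by the monomer length. The Python sketch below illustrates that idea under simplifying assumptions: the comparison is ungapped, so the indels typical of 454 homopolymer errors will lower the score, and the function name `tandem_period` and its default thresholds are hypothetical.

```python
def tandem_period(read, min_period=20, min_overlap=50, min_identity=0.8):
    """Find the smallest shift at which a read matches itself.

    Compares read[i] with read[i + p] for every shift p, ungapped, and returns
    (p, identity) for the first p whose identity reaches `min_identity`.
    Returns None if no shift qualifies. Shifts that would leave fewer than
    `min_overlap` compared positions are not tried.
    """
    read = read.upper()
    n = len(read)
    largest_shift = n - max(min_overlap, 1)
    for p in range(min_period, largest_shift + 1):
        compared = n - p
        matches = sum(read[i] == read[i + p] for i in range(compared))
        identity = matches / compared
        if identity >= min_identity:
            return p, identity
    return None
```

With the defaults, a 300 bp read containing a 180 bp monomer would still be compared over 120 positions at the correct shift. Reads for which this returns a period can then be grouped by period length; a shared, frequently occurring period is a hint of a satellite family.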
9.0 years ago by
Vancouver, BC
SES8.4k wrote:

I'm guessing you study maize (by your picture), and the centromeric tandem repeats in this species consist of monomers of ~156 bp, which is much shorter than your 454 reads. I agree with others that searching the reads would be a better approach. Also, active maize centromeres contain a specific clade of retrotransposons (appropriately called centromeric retrotransposons in maize, or CRM elements) which you can easily find in repeat databases or in GenBank for searching your reads.

written 9.0 years ago by SES (8.4k)

It looks like I guessed wrong about your study system, but Cardamine is a cool genus for studying chromosome evolution. RepeatMasker may be slow if you have a lot of reads, but it is worth trying for identifying known repeats, as Jeremy suggested. Also, give TRF (Tandem Repeats Finder) a try, as you might be more likely to identify centromeric repeats specific to your species.

written 9.0 years ago by SES (8.4k)
9.0 years ago by
Oslo, Norway
lexnederbragt1.2k wrote:

It could actually be worth your while trying to do an assembly with these reads. At that low coverage, any alignment between reads the assembler can find indicates that the reads are from repeats. So, any of the (probably few) contigs produced by the assembler is a candidate repeat, thereby reducing the search space drastically relative to searching all the reads. If you intend to use newbler from 454, I recommend increased stringency settings for the overlaps to prevent newbler from spending a lot of time looking for spurious overlaps (minimum overlap length -ml 60, or 80, or even higher, minimum overlap identity 98%, for example).

written 9.0 years ago by lexnederbragt (1.2k)