Scaffolding contigs assembled with HiFi reads
3 months ago
stjerne ▴ 10

What is the current standard practice for scaffolding contigs of a de novo assembly generated with only HiFi reads? Is it possible to scaffold without Hi-C data? Is it even necessary? I have short reads that I can use, but it sounds like the gain would be marginal, if any.

genome scaffolding hifi • 422 views
3 months ago
dthorbur ★ 1.9k

Just to clarify, you have generated a new contig-level assembly using only PacBio reads, and you have some short reads available too that you haven't used yet? What's the contiguity and size of your assembly at the moment? Tens, hundreds, or thousands of contigs?

You can scaffold with other sources of data too. Optical mapping, linkage maps, and, since you've used long reads already, highly optimized ultra-long read sequencing could be used in place of Hi-C. However, they all come with different problems in generating appropriate samples. Here is a review with some methodologies and considerations.

The question of whether scaffolding is necessary really depends on what you want to use the assembly for. For most types of analysis, I would say a highly contiguous assembly is helpful but not a requirement, though it certainly affects the quality and robustness of your findings. Take QTL mapping, for example: finding that the QTLs related to a specific factor all cluster on a single chromosome is an interesting result, but with only a contig-level assembly you would not be able to identify it.


Thanks for your response. Yes, I have a draft genome that was assembled using only PacBio reads, using hifiasm. I also have short reads, but they are not from the same individual as the PacBio reads. I have no other genomic data. The current assembly is ~250 Mb, with an N50 of about 2 Mb and 350 contigs/scaffolds after testing out several scaffolding tools.
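
For anyone wanting to report the same contiguity figures (sequence count, total size, N50) for their own assembly, here is a minimal Python sketch. It assumes a plain, uncompressed FASTA file; the function names `fasta_lengths`, `n50`, and `assembly_stats` are made up for illustration and do not come from any particular tool. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # running length of the record being read
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length at which the longest sequences reach half of the total size."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Summarise an assembly by sequence count, total size, and N50."""
    return {
        "n_sequences": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
    }
```

To run it on a real file, something like `with open("assembly.fa") as fh: print(assembly_stats(fasta_lengths(fh)))` should work, where `assembly.fa` is a placeholder path.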

I posted this question because the docs for long-read scaffolding tools all mention scaffolding short-read assemblies with long reads, so I wasn't sure whether scaffolding a long-read assembly with long reads was appropriate. (Newbie here.)

The assembly would ideally be complete and contiguous enough to look at regulatory regions of certain sets of genes.
