Why does the number of sequences increase after using Hi-C to scaffold chromosomes?
5 weeks ago
xinguok794 • 0

I assembled a human genome with hifiasm using HiFi and ONT data. The initial assembly contained 159 sequences. However, after scaffolding with YaHS using Hi-C data, the resulting FASTA file contains 179 sequences. This seems unusual: shouldn't the number of sequences decrease after Hi-C scaffolding? I'd like to understand where the issue lies and would be grateful for any advice.

YAHS hifiasm assembly Hi-C Gene • 548 views
5 weeks ago
Corentin ▴ 660

Hi,

It is not unusual for assemblies to contain unplaced scaffolds (sequences that could not be assigned to a chromosome). Check not only the number of sequences but also their lengths (e.g. N50, length of the largest sequence). You can use QUAST to compute QC stats for your assembly.
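
For a quick look before running a full QC tool, the basic numbers can be computed directly from the FASTA. The following is only a minimal sketch, not a replacement for QUAST: the function names (`fasta_lengths`, `n50`, `assembly_stats`) are made up for this illustration, and it assumes a plain, uncompressed FASTA. Running it on both the hifiasm contigs and the YaHS scaffolds shows whether N50 and the largest sequence grew even though the sequence count went up.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA given as text lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Sequence count, total length, largest sequence and N50 for one FASTA."""
    lengths = fasta_lengths(lines)
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

To use it on a file, pass an open file handle, e.g. `assembly_stats(open("scaffolds.fa"))`, where `scaffolds.fa` is a placeholder for your own file name.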

With Hi-C data you can also plot the contact map and check whether it shows "large squares" corresponding to your chromosomes; this can give you an idea of what is happening with your assembly. You can use Juicebox for this.

Since you are working with human data, you have access to a reference genome. You can align your assembly against it to check for any discrepancies (using MUMmer, for example).

5 weeks ago
shelkmike ★ 1.7k

YaHS not only scaffolds contigs but also splits them at positions where they contradict the Hi-C contacts. That may be the reason. However, in my experience, YaHS has always reduced the number of sequences.
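
One way to test this is to count breaks and joins in the AGP file that YaHS writes next to the scaffold FASTA. Every break adds one sequence and every join removes one, so the count can only rise from 159 to 179 if breaks outnumber joins by 20. The sketch below assumes a standard nine-column, tab-separated AGP in which a split contig shows up as several component lines with the same contig ID; `summarize_agp` is a hypothetical helper written for this post, not part of YaHS.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_agp(lines: Iterable[str]) -> Dict[str, object]:
    """Count scaffolds, contigs, contig breaks and joins in an AGP file."""
    scaffolds = set()
    pieces = defaultdict(int)  # contig ID -> number of pieces placed in scaffolds
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed AGP line
        scaffolds.add(fields[0])
        if fields[4] in ("N", "U"):
            continue  # gap line, no contig on it
        pieces[fields[5]] += 1
    total_pieces = sum(pieces.values())
    return {
        "scaffolds": len(scaffolds),
        "contigs": len(pieces),
        # assumption: a contig listed in more than one piece was split
        "split_contigs": sorted(c for c, n in pieces.items() if n > 1),
        "breaks": total_pieces - len(pieces),
        "joins": total_pieces - len(scaffolds),
    }
```

In the result, `scaffolds == contigs + breaks - joins`. If `breaks` is large, the contact map around the listed `split_contigs` is the place to look.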
