Question

Using 10X Chromium linked reads for genome gap-filling

2

4.9 years ago

daren.card ▴ 20

I have about 75x coverage of 10X Genomics Chromium data for a non-model reptile species. I've used this to produce a genome assembly using Supernova and am currently using similar coverage of Hi-C data to scaffold. I expect this scaffolding process to result in assembly gaps of various lengths beyond what is already probably present in my Supernova assembly.

I know there is lots of software for filling gaps using short Illumina reads, and dedicated software for long-read data like PacBio, but is there any software that leverages the linked Illumina reads provided by 10X Genomics to perform gap-filling? Alternatively, is there a way to extract the assemblies of linked reads from Supernova (or produce them de novo)? In theory these should provide contig sequences up to the length of the input molecules, which could then be used with a custom mapping pipeline to fill gaps.

Assembly assembly genome • 1.5k views

ADD COMMENT • link updated 4.9 years ago by harish ▴ 450 • written 4.9 years ago by daren.card ▴ 20

score 1 · Answer 1 · 2019-05-16

1

4.9 years ago

harish ▴ 450

I don't think there is a dedicated tool for this, but essentially all a gap filler does is map the reads, compute a consensus, and patch the gap, assuming adequate unique flanking regions are available.
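To make the "unique flanking regions" condition concrete, here is a minimal Python sketch of the patching step only. It is not how any particular gap filler is implemented: the function name `patch_gap` and the `flank` parameter are made up for illustration, matching is exact and forward-strand only, and the filler sequence stands in for whatever consensus the mapping step produced.

```python
import re

def patch_gap(scaffold, filler, flank=20):
    """Fill the first run of N in `scaffold` using `filler` (hypothetical helper).

    The gap is patched only if the `flank` bases on each side of the gap
    occur exactly once in `filler`, in the expected order.
    """
    gap = re.search(r"N+", scaffold)
    if gap is None:
        return scaffold  # nothing to fill
    if gap.start() < flank or len(scaffold) - gap.end() < flank:
        return scaffold  # not enough flanking sequence to anchor on
    left = scaffold[gap.start() - flank:gap.start()]
    right = scaffold[gap.end():gap.end() + flank]
    # Require unique anchors so the patch is unambiguous.
    if filler.count(left) != 1 or filler.count(right) != 1:
        return scaffold
    left_end = filler.index(left) + flank
    right_start = filler.index(right)
    if right_start < left_end:
        return scaffold  # flanks overlap or are in the wrong order
    # Replace the N run with the filler bases lying between the two anchors.
    return scaffold[:gap.start()] + filler[left_end:right_start] + scaffold[gap.end():]
```

A real tool would also tolerate mismatches, check the reverse complement, and compare the filled length against the estimated gap size.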

Since 10X reads are just standard paired-end libraries with 10X barcodes embedded, you can remove the barcodes from the reads using scaff10X (scaff_reads) and then use GapFiller, Cobbler, RAILS, etc. to fill those regions.
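For illustration, barcode removal on a single R1 record might look like the sketch below. It assumes the usual Chromium Genome layout of a 16 bp barcode followed by a 7 bp spacer at the start of read 1; check this against your library before relying on it. The function name `strip_10x_barcode` and the way the barcode is appended to the read name are invented here and are not the output format of scaff_reads. No barcode whitelist correction is attempted.

```python
def strip_10x_barcode(header, seq, qual, barcode_len=16, spacer_len=7):
    """Trim the barcode and spacer from one R1 FASTQ record (hypothetical helper).

    Assumes R1 starts with `barcode_len` barcode bases followed by
    `spacer_len` spacer bases. Returns (header, seq, qual) with the barcode
    appended to the read name, or None if the read is too short.
    """
    trim = barcode_len + spacer_len
    if len(seq) <= trim:
        return None  # nothing would be left after trimming
    barcode = seq[:barcode_len]
    # Keep the barcode in the read name so the linkage information is not lost.
    name, _, comment = header.partition(" ")
    new_header = f"{name}_{barcode}"
    if comment:
        new_header += f" {comment}"
    return new_header, seq[trim:], qual[trim:]
```

R2 carries no barcode in this layout, so only its read name would need updating to keep the pair in sync.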

ADD COMMENT • link 4.9 years ago by harish ▴ 450

0

Thanks for the reply. Makes sense. I was just hoping it would be possible to somehow leverage the local assemblies from the linked reads. Seems like it would work better to map and extend using >10 kb "reads" vs. just 150 bp ones.

ADD REPLY • link 4.9 years ago by daren.card ▴ 20

0

I don't know about the fidelity of what I'm about to suggest, but breaking the scaffold sequences at runs of a specified number of "N"s and using the resulting pseudo-contigs might help.
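A rough Python sketch of that splitting step, assuming the scaffold is already loaded as a string. The function name `split_at_gaps`, the `min_gap` threshold of 10, and the `_partN` naming scheme are arbitrary choices for illustration.

```python
import re

def split_at_gaps(name, seq, min_gap=10):
    """Split a scaffold into pseudo-contigs at runs of >= min_gap N's.

    Returns a list of (contig_name, start_offset, sequence) tuples, where
    start_offset is the 0-based position of the piece in the scaffold.
    Shorter N runs are left inside the pieces.
    """
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for gap in gap_pattern.finditer(seq):
        if gap.start() > start:  # skip empty pieces (e.g. leading gap)
            pieces.append((f"{name}_part{len(pieces) + 1}", start, seq[start:gap.start()]))
        start = gap.end()
    if start < len(seq):  # trailing sequence after the last gap
        pieces.append((f"{name}_part{len(pieces) + 1}", start, seq[start:]))
    return pieces
```

Keeping the start offsets makes it possible to place the pseudo-contigs back on the original scaffold coordinates afterwards.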

ADD REPLY • link 4.9 years ago by harish ▴ 450