Question: Is the amosvalidate pipeline suited to long-read assemblies from PacBio?
11 months ago, Roxane Boyer wrote:

Hello everyone!

I'm looking for a way to assess the quality of my long-read PacBio assembly. I already tried QUAST, which gives all the classical metrics for genome assembly assessment. I also tried BUSCO v2, which gives an overview of the duplication level within a particular gene set (for example arthropoda or insecta), but I need something more specific.

In my long-read assembly, because the species studied is highly polymorphic, I have already spotted a local duplication within one particular gene. For example, a gene with the exon structure 1 2 3 4 in the closest species, D. melanogaster, is duplicated in the assembly with a false exon structure: 1 3 4 2 3 4.
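To make the example concrete, here is a minimal Python sketch of the check described above. It assumes the reference exons have already been located on the contig by some alignment step and listed in contig order; the function name `check_exon_order` and its report keys are hypothetical, chosen only for illustration.

```python
from collections import Counter

def check_exon_order(observed, expected):
    """Compare the exon order seen along a contig with the reference order.

    observed: exon identifiers in the order they occur on the contig.
    expected: exon identifiers in reference order (assumed, e.g. from D. melanogaster).
    """
    counts = Counter(observed)
    # Exons that occur more than once on the contig.
    duplicated = sorted(e for e, n in counts.items() if n > 1)
    # Reference exons that were not found at all.
    missing = [e for e in expected if e not in counts]
    # Rank of each exon in the reference; unknown exons get rank -1.
    rank = {e: i for i, e in enumerate(expected)}
    # Adjacent pairs where the second exon does not come later in the reference.
    order_breaks = [
        (a, b)
        for a, b in zip(observed, observed[1:])
        if rank.get(b, -1) <= rank.get(a, -1)
    ]
    return {
        "duplicated": duplicated,
        "missing": missing,
        "order_breaks": order_breaks,
        "suspicious": bool(duplicated or missing or order_breaks),
    }

# The structure from the question: 1 3 4 2 3 4 against the reference 1 2 3 4.
report = check_exon_order([1, 3, 4, 2, 3, 4], [1, 2, 3, 4])
```

A flagged gene is only a candidate: a real duplication in the sequenced species would produce the same report as an assembly error, so each hit still needs to be verified.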

I've heard about the amosvalidate pipeline, which identifies "suspicious" regions, but I'm not sure it is well suited to my data.

Do you have any idea what tool or method I could use to detect this particular kind of event?

Thanks for your advice!



written 11 months ago by Roxane Boyer

I don't quite understand what you're looking for, but if you want to find polymorphic regions you can start by aligning your reads to the assembly (Bowtie2) and looking for low-coverage regions within the contigs (IGV). You can also predict genes on your assembly and load them into IGV (as a GTF file) for visual inspection, and/or count the reads mapped to each gene with HTSeq.
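As a rough illustration of the low-coverage search, here is a Python sketch that scans per-base depth in fixed windows. It assumes the depth was exported as three tab-separated columns (contig, 1-based position, depth) with every position present, which is what `samtools depth -a` writes; the function name, window size and threshold are hypothetical and would need tuning to the data.

```python
from collections import defaultdict

def low_coverage_windows(depth_lines, window=1000, min_mean_depth=10.0):
    """Return (contig, start, end, mean_depth) for windows below the threshold.

    depth_lines: iterable of 'contig<TAB>position<TAB>depth' strings.
    Assumes every position is listed; the last window of a contig may be
    shorter than `window`, so its mean is underestimated.
    """
    sums = defaultdict(int)  # (contig, window index) -> summed depth
    for line in depth_lines:
        if not line.strip():
            continue
        contig, pos, depth = line.rstrip("\n").split("\t")[:3]
        sums[(contig, (int(pos) - 1) // window)] += int(depth)

    flagged = []
    for (contig, idx), total in sorted(sums.items()):
        mean = total / window
        if mean < min_mean_depth:
            flagged.append((contig, idx * window + 1, (idx + 1) * window, mean))
    return flagged

# Tiny made-up example: two windows of two bases each.
example = ["c1\t1\t10", "c1\t2\t10", "c1\t3\t1", "c1\t4\t1"]
flagged = low_coverage_windows(example, window=2, min_mean_depth=5.0)
```

The flagged intervals can then be opened in IGV for visual inspection.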

modified 11 months ago • written 11 months ago by Buffo

Hi Calejas,

As I'm doing a long-read assembly from PacBio data, there are no low-coverage regions (or at least fewer than with Illumina sequencing). My problem is that I want to detect small-scale chimeric rearrangements within my assembly. The specific example I gave was verified: in the assembly, the exon structure of a gene was false because duplicated regions had been inserted close to each other. This is the kind of error I want to detect in my assembly.

I thought that amosvalidate would detect this kind of issue, but I'm not sure it is appropriate for PacBio long reads, as it was originally designed for Illumina reads.

written 11 months ago by Roxane Boyer

Oh, I'm sorry, that's true. In my opinion (I'm not an expert), you need to identify the origin of the "complexity" that causes the chimeric assemblies: GC enrichment? Overrepresented k-mers? Also, what did you sequence: a whole genome, a metagenome, or amplicons? If you sequenced amplicons, I think the analysis could be easier. Have you tried find_motif from biopieces (on the assembly) or FastQC (on the reads; it includes some interesting graphs)?
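For a quick first look at GC content and overrepresented k-mers in one suspicious region, something like the following Python sketch could be enough. It is not what find_motif or FastQC do internally; the function names and the default k are hypothetical, and the input is assumed to be the region's sequence as a plain string.

```python
from collections import Counter

def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def top_kmers(seq, k=4, n=3):
    """The n most frequent overlapping k-mers in the sequence, with counts."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(n)

# Made-up sequences, only to show the calls.
gc = gc_fraction("GGCCAATT")
kmers = top_kmers("ACACACAC", k=2, n=1)
```

Comparing these values between the suspicious regions and the rest of the contig would show whether composition is a plausible cause.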

written 11 months ago by Buffo

Yes, it's true that it would be useful to know why these regions in particular were duplicated. I sequenced the whole genome; my project is a de novo assembly of a highly dimorphic Drosophila, D. suzukii.

Maybe I can indeed try analyzing with FastQC some particular pieces of my assembly that I think are weird; maybe we can extract information from them... I hope so!

written 11 months ago by Roxane Boyer

Oh! What about comparing your assembly to a very close reference genome? Use nucmer; I think it will be useful.
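Once the alignment blocks between one reference sequence and one contig are available (for instance parsed from the coordinates that MUMmer reports), a sweep like the Python sketch below can list reference intervals that align to the contig more than once, which is the signature of the duplicated exons in the question. The tuple layout `(ref_start, ref_end, qry_start, qry_end)` and the function name are assumptions made for this sketch, not a MUMmer format.

```python
def reference_regions_hit_twice(blocks):
    """Return reference intervals covered by two or more alignment blocks.

    blocks: iterable of (ref_start, ref_end, qry_start, qry_end) tuples,
    1-based inclusive, all for the same reference sequence and contig.
    """
    events = []
    for ref_start, ref_end, _qry_start, _qry_end in blocks:
        lo, hi = sorted((ref_start, ref_end))
        events.append((lo, 1))       # block opens at lo
        events.append((hi + 1, -1))  # block closes just after hi
    events.sort()  # at equal positions, closings sort before openings

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < 2 <= depth:
            region_start = pos
        elif previous >= 2 > depth:
            regions.append((region_start, pos - 1))
    return regions

# Made-up blocks: reference positions 50-100 align to two places on the contig.
hits = reference_regions_hit_twice([(1, 100, 1, 100), (50, 150, 400, 500)])
```

Intervals reported this way are candidates only, since a genuine duplication in the assembled species looks the same as a misassembly in this comparison.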

written 11 months ago by Buffo

Maybe I can try that too! My main problem is that the reference I would use, Drosophila melanogaster, has a smaller genome than D. suzukii...

I'll see where it goes! Thanks for the advice.

written 11 months ago by Roxane Boyer