Question: Is the amosvalidate pipeline suited to long-read assemblies from PacBio?
Written 14 months ago by Roxane Boyer:

Hello everyone !

I'm looking for a way to assess the quality of my long-read PacBio assembly. I have already tried QUAST, which gives all the classical metrics for genome assembly assessment. I also tried BUSCO v2, which gives an overview of the duplication level within a lineage-specific gene set (for example Arthropoda or Insecta), but I need something more specific.

In my long-read assembly, as the species studied is highly polymorphic, I have already spotted local duplications in particular genes. For example, a gene with the exon structure 1 2 3 4 in the closest species, D. melanogaster, is duplicated in the assembly with a false exon structure: 1 3 4 2 3 4.

I've heard about the amosvalidate pipeline, which identifies "suspicious" regions, but I'm not sure that it is well suited to my data.

Do you have any idea what tool or method I could use to detect this particular kind of event?

Thanks for your advice!



written 14 months ago by Roxane Boyer

I don't fully understand what you are looking for, but if you want to find polymorphic regions you can start by aligning your reads to the assembly (Bowtie2) and looking for low-coverage regions within the contigs (IGV). You can also predict genes on your assembly and load them into IGV (as a GTF file) for visual inspection, and/or count the reads mapped to each gene with HTSeq.
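
To make the low-coverage search less manual than scrolling in IGV, here is a minimal sketch. It assumes the per-base depth has already been exported in the three-column, tab-separated layout printed by `samtools depth -a` (contig, 1-based position, depth). The function name `low_coverage_regions` and the `min_depth` threshold are made up for illustration and would need tuning to the actual coverage.

```python
def low_coverage_regions(depth_lines, min_depth=5):
    """Return (contig, start, end) runs where depth < min_depth.

    depth_lines: iterable of "contig<TAB>position<TAB>depth" strings.
    Coordinates in the result are 1-based and inclusive.
    """
    regions = []
    current = None  # [contig, start, end] of the run being extended
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            # Extend the open run only if this base is adjacent on the same contig.
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos
            else:
                if current:
                    regions.append(tuple(current))
                current = [contig, pos, pos]
        elif current:
            # Depth is back above the threshold: close the open run.
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions


example = [
    "contig1\t1\t10",
    "contig1\t2\t2",
    "contig1\t3\t1",
    "contig1\t4\t12",
    "contig2\t1\t0",
]
print(low_coverage_regions(example))
```

The resulting intervals can then be checked one by one in IGV instead of browsing whole contigs.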

modified 14 months ago • written 14 months ago by Buffo

Hi Calejas,

As I'm doing a long-read assembly from PacBio data, there are no low-coverage regions (or at least fewer than with Illumina sequencing). My problem is that I want to detect small-scale chimeric rearrangements within my assembly. The specific example I gave was verified: in the assembly, the exon structure of a gene was false, because duplicated regions had been inserted close to each other. It's this kind of error within my assembly that I want to detect.

I thought that amosvalidate would detect this kind of issue, but I'm not sure it is appropriate for PacBio long reads, as it was originally designed for Illumina reads.
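
The exon-order error described above (1 2 3 4 expected, 1 3 4 2 3 4 observed) can be screened for gene by gene. This is only a sketch under an assumption: that the exons of the reference gene have already been located on the assembly by some alignment step, so that the order in which they appear along the contig is available as a list. The function `check_exon_order` and its output keys are hypothetical names, not part of any existing tool.

```python
from collections import Counter


def check_exon_order(observed, expected):
    """Compare the exon order seen along a contig with the reference order.

    observed: exon labels in the order they appear on the assembly.
    expected: exon labels in the reference gene order.
    Returns duplicated exons, missing exons, and adjacent pairs that
    break the reference order.
    """
    rank = {exon: i for i, exon in enumerate(expected)}
    counts = Counter(observed)
    duplicated = [e for e in expected if counts[e] > 1]
    missing = [e for e in expected if counts[e] == 0]
    known = [e for e in observed if e in rank]
    # An order break is any adjacent pair that does not move forward in the gene.
    order_breaks = [
        (a, b) for a, b in zip(known, known[1:]) if rank[b] <= rank[a]
    ]
    return {
        "duplicated": duplicated,
        "missing": missing,
        "order_breaks": order_breaks,
    }


report = check_exon_order(observed=[1, 3, 4, 2, 3, 4], expected=[1, 2, 3, 4])
print(report)
```

A gene with any duplicated exon or order break would be a candidate for closer inspection; a real duplication in the species and an assembly artifact look the same at this level, so the flagged genes still need to be verified against the raw reads.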

written 14 months ago by Roxane Boyer

Oh, I'm sorry, that's true. In my opinion (I'm not an expert) you have to work out the origin of the "complexity" that causes the chimeric assemblies: GC enrichment? Overrepresented k-mers? Also, what did you sequence: an entire genome, a metagenome, or amplicons? If you sequenced amplicons, I think the analysis could be easier. Have you tried find_motif from Biopieces (on the assembly), or FastQC (on the reads; it includes some interesting graphs)?

written 14 months ago by Buffo

Yes, it's true that it would be useful to know why these regions in particular were duplicated. I've sequenced the whole genome; my project is a de novo assembly of a highly dimorphic Drosophila: D. suzukii.

Maybe I can indeed try to analyze, with FastQC, some particular pieces of my assembly that I think are weird; maybe we can extract information from them... I hope so!

written 14 months ago by Roxane Boyer

Oh! What about comparing your assembly to a very close reference genome? Use nucmer; I think it will be useful.
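
Once alignments between the reference and the assembly exist, one thing to look for is a reference stretch that aligns to more than one place on the same contig, which is what a local duplication would produce. The sketch below does not parse any nucmer output; it assumes the alignment blocks for one reference sequence and one contig have already been loaded as plain tuples `(ref_start, ref_end, qry_start, qry_end)`. That tuple layout and the function name are assumptions made for illustration.

```python
def duplicated_reference_spans(blocks, min_overlap=1):
    """Find reference intervals covered by more than one alignment block.

    blocks: list of (ref_start, ref_end, qry_start, qry_end) tuples with
    inclusive coordinates, all from the same reference sequence and contig.
    Returns a list of (start, end) reference intervals that overlap an
    earlier block by at least min_overlap bases.
    """
    # Normalise orientation so that start <= end, then sort by start.
    spans = sorted(
        (min(r_start, r_end), max(r_start, r_end))
        for r_start, r_end, _, _ in blocks
    )
    duplicated = []
    furthest_end = None  # right-most reference position covered so far
    for start, end in spans:
        if furthest_end is not None and start <= furthest_end:
            overlap = (start, min(end, furthest_end))
            if overlap[1] - overlap[0] + 1 >= min_overlap:
                duplicated.append(overlap)
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return duplicated


blocks = [
    (100, 200, 1000, 1100),
    (300, 500, 1150, 1350),
    (300, 500, 1400, 1600),  # same reference span, second location on the contig
    (600, 700, 1650, 1750),
]
print(duplicated_reference_spans(blocks))
```

Setting `min_overlap` to something like an exon length would filter out the small overlaps that often occur at the ends of adjacent alignment blocks.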

written 14 months ago by Buffo

Maybe I can try that too! My main problem is that the reference I would use, Drosophila melanogaster, has a smaller genome than D. suzukii...

I'll see where it goes! Thanks for the advice.

written 14 months ago by Roxane Boyer