Question: Is the amosvalidate pipeline suited for long-read assemblies from PacBio?
Asked 7 months ago by Roxane Boyer (France/Marseille/IBDM):

Hello everyone!

I'm looking for a way to assess the quality of my PacBio long-read assembly. I already tried QUAST, which gives all the classical metrics for genome assembly assessment. I also tried BUSCO 2, which gives a preview of the duplication level within a particular gene set (for example: arthropoda, insecta...), but I need something more specific.

In my long-read assembly, as the studied species is highly polymorphic, I have already spotted local duplications within particular genes. For example, a gene with the exon structure 1 2 3 4 in the closest species, D. melanogaster, appears duplicated in the assembly with a false exon structure: 1 3 4 2 3 4.

I've heard about the amosvalidate pipeline, which identifies "suspicious" regions. But I'm not sure that this pipeline is well suited to my data.

Do you have any idea what tool or method I could use to detect this particular kind of event?

Thanks for your advice!

Cheers,

Roxane

written 7 months ago by Roxane Boyer

I don't fully understand what you're looking for, but if you want to find polymorphic regions you can start by aligning your reads to the assembly (Bowtie2) and searching for low-coverage regions within the contigs (IGV). You can also predict genes on your assembly and load them in IGV (as a GTF file) for visual inspection, and/or count the reads mapped to each gene with HTSeq.
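The low-coverage search can also be scripted instead of done by eye. Below is a minimal sketch, assuming you already have a per-base depth table in the three-column layout (contig, 1-based position, depth) that `samtools depth` writes. The function name, window size and depth threshold are illustrative choices, not recommendations, and the contig end is approximated by the last position present in the table.

```python
from collections import defaultdict


def low_coverage_windows(depth_lines, window=1000, min_mean_depth=10.0):
    """Return (contig, start, end, mean_depth) for low-coverage windows.

    depth_lines: iterable of tab-separated "contig<TAB>position<TAB>depth"
    lines with 1-based positions. Positions missing from the input are
    counted as depth 0. The contig end is assumed to be the highest
    position seen for that contig (an assumption of this sketch).
    """
    totals = defaultdict(int)    # (contig, window index) -> summed depth
    last_pos = defaultdict(int)  # contig -> highest position seen

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        totals[(contig, (pos - 1) // window)] += depth
        last_pos[contig] = max(last_pos[contig], pos)

    flagged = []
    for contig, end in last_pos.items():
        n_windows = (end - 1) // window + 1
        for w in range(n_windows):
            start = w * window + 1
            stop = min((w + 1) * window, end)
            mean = totals.get((contig, w), 0) / (stop - start + 1)
            if mean < min_mean_depth:
                flagged.append((contig, start, stop, mean))
    return flagged
```

Each flagged window can then be opened in IGV for a closer look.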

modified 7 months ago • written 7 months ago by Buffo

Hi Calejas,

As I'm doing a long-read assembly from PacBio, there are no low-coverage regions (or at least fewer than with Illumina sequencing). My problem is that I want to detect, within my assembly, small-scale rearrangements that are chimeric. The specific example I used was verified: in the assembly, the exon structure of a gene was false, because duplicated regions had been inserted close to each other. It's this kind of error within my assembly that I want to detect.
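To make the kind of error concrete, here is a minimal sketch of the check I have in mind, using the exon example from my question (expected 1 2 3 4, observed 1 3 4 2 3 4). It assumes the reference exons have already been located on the assembly by some alignment step and that the gene lies on the forward strand of the contig; the function name `check_exon_order` and the returned fields are hypothetical.

```python
from collections import Counter


def check_exon_order(observed):
    """Compare exon numbers seen along a contig with the order 1, 2, 3, ...

    observed: exon numbers in the order they appear along the assembly,
    e.g. [1, 3, 4, 2, 3, 4]. How these hits are obtained (aligning the
    reference exons to the assembly) is outside this sketch.
    """
    counts = Counter(observed)
    # Exons found more than once within the same gene region
    duplicated = sorted(exon for exon, n in counts.items() if n > 1)
    # Indices where the exon number fails to increase (colinearity break)
    order_breaks = [
        i for i in range(1, len(observed)) if observed[i] <= observed[i - 1]
    ]
    return {
        "duplicated": duplicated,
        "order_breaks": order_breaks,
        "suspicious": bool(duplicated or order_breaks),
    }


report = check_exon_order([1, 3, 4, 2, 3, 4])
print(report)
```

A gene whose exons come out in the reference order, each exactly once, is not flagged; anything else would be a candidate for manual inspection.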

I thought that amosvalidate would detect this kind of issue, but I'm not sure it is appropriate for PacBio long reads, as it was designed for Illumina reads at first.

written 7 months ago by Roxane Boyer

Oh, I'm sorry, that's true. In my opinion (I'm not an expert) you have to determine the origin of the "complexity" that causes chimeric assemblies: GC enrichment? Overrepresented k-mers? Also, what did you sequence? A whole genome, a metagenome, or amplicons? If you sequenced amplicons, I think the analysis can be easier. Have you tried find_motif from Biopieces (on the assembly) or FastQC (on the reads; it includes some interesting graphs)?
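If you want a quick look at those two signals before reaching for a dedicated tool, something like the following could be a starting point. It is only a sketch: it assumes you have already extracted the suspicious region as a plain DNA string, and the function names and the defaults for k and n are arbitrary.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def top_kmers(seq, k=4, n=5):
    """Return the n most frequent k-mers, skipping k-mers with N or other codes."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    return counts.most_common(n)


region = "ACACACACGGCC"  # toy sequence, not real data
print(gc_fraction(region))
print(top_kmers(region, k=2, n=3))
```

Comparing these values between a suspicious region and a few regions that look fine may hint at whether composition plays a role.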

written 7 months ago by Buffo

Yeah, it's true that it would be useful to know why these regions in particular were duplicated. I've sequenced the whole genome; my project is a de novo assembly of a highly dimorphic Drosophila: D. suzukii.

Maybe I can indeed try to analyze with FastQC some particular pieces of my assembly that I think are weird; maybe we can extract information from them... I hope so!

written 7 months ago by Roxane Boyer

Oh! What about comparing your assembly to a very close reference genome? Use nucmer; I think it will be useful.

written 7 months ago by Buffo

Maybe I can try that too! My main problem is that the reference I would use, Drosophila melanogaster, has a smaller genome than D. suzukii...

I'll see where it goes! Thanks for the advice.

written 7 months ago by Roxane Boyer