Question: filter genome above N50 value
Asked 8 months ago by shubhra.bhattacharya:

Hi everyone, I want to filter my assembly so that it keeps only the contigs above the N50 value.

How can this be done?

--Thanks in advance


You realize that your new filtered assembly will have a new, higher N50 value and you are just chasing a moving target until you have one contig left?

Reply by WouterDeCoster, 8 months ago
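
To make the moving-target point concrete, here is a minimal Python sketch that repeatedly recomputes N50 and keeps only the contigs at least that long. The contig lengths are hypothetical toy values, not data from this thread, and `n50` / `iterate_n50_filter` are illustrative helper names.

```python
def n50(lengths):
    """N50: the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def iterate_n50_filter(lengths):
    """Repeatedly keep only contigs >= the current N50; return (contig count, N50) per round."""
    history = []
    current = list(lengths)
    while True:
        value = n50(current)
        history.append((len(current), value))
        kept = [c for c in current if c >= value]
        if len(kept) == len(current):  # nothing was removed, so the filter has converged
            return history
        current = kept


# Hypothetical toy contig lengths, for illustration only.
for count, value in iterate_n50_filter([100, 80, 60, 40, 30, 20, 10, 5]):
    print(f"{count} contigs, N50 = {value}")
```

With these toy lengths the N50 rises from 80 to 100 and only one contig is left after two rounds of filtering. The loop stops as soon as a round removes nothing, so an assembly with many contigs of identical length can stop earlier.
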
Answer by Michael Dondrup (Bergen, Norway), 8 months ago:

Calculate the N50 value, then extract all sequences with length >= N50 from your FASTA file. The real question is: why? N50 is not a magical threshold below which contigs are not real.

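A minimal sketch of those two steps in plain Python (no third-party libraries), assuming an ordinary FASTA file that fits in memory. It uses the usual N50 definition (contigs at least N50 long contain at least half of the assembled bases) and keeps contigs with length >= N50. The script name `filter_n50.py`, the file names, and the helper names are hypothetical.

```python
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """N50: the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def filter_by_n50(in_path, out_path, width=60):
    """Write contigs with length >= N50 to out_path; return (N50, kept, total)."""
    records = list(read_fasta(in_path))
    threshold = n50([len(seq) for _, seq in records])
    kept = 0
    with open(out_path, "w") as out:
        for header, seq in records:
            if len(seq) >= threshold:
                out.write(f">{header}\n")
                # Re-wrap the sequence at `width` characters per line.
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                kept += 1
    return threshold, kept, len(records)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_n50.py assembly.fasta filtered.fasta
    n50_value, n_kept, n_total = filter_by_n50(sys.argv[1], sys.argv[2])
    print(f"N50 = {n50_value}; kept {n_kept} of {n_total} contigs", file=sys.stderr)
```

The filtered file will itself have a higher N50 than the threshold used to produce it, as pointed out in the comment on the question.
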

Hi! I have two draft genomes of the same organism, and I need to decide which of them to use as the reference for my downstream analysis (transcriptome and SNP profiling, etc.). For this genome-to-genome comparison I have used SynMap, LAST and Mauve, but I still cannot reach a conclusion. That is why I want to filter both assemblies at N50 and see where it leads. Please suggest any other alternatives as well, if possible.

--Thanks in advance

Reply by shubhra.bhattacharya, 8 months ago

Have you checked whether the two assemblies can be reconciled into a better combined one?

Reply by genomax, 8 months ago

There is a "newer" pipeline for combining two assemblies, called NucMerge, that you might try (https://www.biorxiv.org/content/early/2018/11/30/483701), but I would first consider the suggestions in the comment "C: filter genome above N50 value" in this thread.

Reply by jean.elbers, 8 months ago

There are many other ways to compare the assemblies:

  • completeness, judged by remapping DNA-seq/RNA-seq reads
  • linkage map: number of linkage errors
  • BUSCO
  • estimate of contamination, e.g. by bacterial contigs
  • repeat rate, GC content, assembly size vs. expected values
  • ...
Reply by Michael Dondrup, 8 months ago
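
Some of the simpler checks in that list (assembly size, GC content, contiguity and, if the FASTA happens to be soft-masked, a rough repeat fraction) can be tabulated side by side for the two drafts. The sketch below is an illustration under those assumptions, with a hypothetical helper `assembly_stats`; it does not cover read remapping, linkage-map consistency, BUSCO or contamination screening, which need dedicated tools, and the expected genome size and GC content have to come from an independent estimate.

```python
def n50(lengths):
    """N50: the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def assembly_stats(sequences):
    """Basic summary statistics for an iterable of contig sequences (strings)."""
    lengths, gc, at, n, soft = [], 0, 0, 0, 0
    for seq in sequences:
        lengths.append(len(seq))
        up = seq.upper()
        gc += up.count("G") + up.count("C")
        at += up.count("A") + up.count("T")
        n += up.count("N")
        # Lowercase bases: a rough repeat proxy ONLY if the FASTA is soft-masked.
        soft += sum(seq.count(b) for b in "acgtn")
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly contains no bases")
    return {
        "contigs": len(lengths),
        "total_length": total,
        "largest": max(lengths),
        "n50": n50(lengths),
        # GC% over unambiguous A/C/G/T bases only (N and other IUPAC codes excluded).
        "gc_percent": 100.0 * gc / (gc + at) if gc + at else 0.0,
        "n_percent": 100.0 * n / total,
        "softmasked_percent": 100.0 * soft / total,
    }


# Toy usage with hypothetical sequences:
print(assembly_stats(["ACGTacgt", "GGCCNN"]))
```

With Biopython installed, real contigs could be supplied as `(str(rec.seq) for rec in SeqIO.parse("draft_A.fasta", "fasta"))` after `from Bio import SeqIO`, running it once per draft and comparing the two dictionaries.
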

Thanks for your suggestions!

Reply by shubhra.bhattacharya, 8 months ago