Question: filter genome above N50 value
shubhra.bhattacharya wrote (11 weeks ago):

Hi everyone, I want to filter my assembly so that it keeps only the contigs above the N50 value.

How can this be done?

--Thanks in advance


You realize that your new filtered assembly will have a new, higher N50 value, and that you are just chasing a moving target until you have one contig left?

(reply by WouterDeCoster, 11 weeks ago)
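To make the moving-target point concrete, here is a minimal Python sketch with made-up contig lengths (the numbers and the helper names `n50` and `filter_until_stable` are hypothetical illustrations, not taken from either assembly). It repeatedly drops contigs shorter than the current N50 and records the N50 after each round.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def filter_until_stable(lengths):
    """Repeatedly drop contigs shorter than the current N50, recording
    (number of contigs, N50) each round, until nothing more is removed."""
    history = []
    current = list(lengths)
    while True:
        value = n50(current)
        history.append((len(current), value))
        kept = [x for x in current if x >= value]
        if len(kept) == len(current):
            return current, history
        current = kept


contigs = [10, 8, 6, 4, 2]  # hypothetical contig lengths
final, history = filter_until_stable(contigs)
for n_contigs, value in history:
    print(f"{n_contigs} contig(s), N50 = {value}")
# prints:
# 5 contig(s), N50 = 8
# 2 contig(s), N50 = 10
# 1 contig(s), N50 = 10
```

With these lengths the N50 rises from 8 to 10 and a single contig survives. With tied lengths, e.g. `[10, 6, 6, 6]`, the loop stops after the first round with all four contigs kept. In every case the filtered set's N50 is never lower than before, because every surviving contig is at least as long as the old N50.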
Michael Dondrup (Bergen, Norway) wrote (11 weeks ago):

Calculate the N50 value, then extract all sequences with length >= N50 from your FASTA file. The real question is: why? N50 is not a magic threshold below which contigs are not real.

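A minimal Python sketch of that recipe (compute N50, then keep sequences with length >= N50), using only the standard library and assuming a plain, possibly multi-line FASTA file. The script name `filter_n50.py` and all helper functions are hypothetical names chosen for illustration.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Length L such that sequences >= L hold at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequences")


def filter_by_n50(records):
    """Return the N50 and the records whose length is >= N50."""
    records = list(records)
    cutoff = n50([len(seq) for _, seq in records])
    return cutoff, [(h, s) for h, s in records if len(s) >= cutoff]


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python filter_n50.py assembly.fasta > filtered.fasta
    with open(sys.argv[1]) as fh:
        cutoff, kept = filter_by_n50(read_fasta(fh))
    print(f"N50 = {cutoff}, kept {len(kept)} sequences", file=sys.stderr)
    write_fasta(kept, sys.stdout)
```

The N50 and the count of kept sequences go to stderr so that stdout contains only the filtered FASTA.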

Hi! I have two draft genome assemblies of the same organism, and I need to decide which one to use as the reference for my downstream analyses (transcriptome and SNP profiling, etc.). For this genome-to-genome comparison I have used SynMap, LAST and Mauve, but I still cannot reach a conclusion. That is why I want to filter both assemblies at their N50 and see where it goes. Please suggest any other alternatives as well, if possible.

--Thanks in advance

(reply by shubhra.bhattacharya, 11 weeks ago)

Have you checked whether you can reconcile the two assemblies into a better combined one?

(reply by genomax, 11 weeks ago)

There is a "newer" pipeline for combining two assemblies, NucMerge, that you might try (https://www.biorxiv.org/content/early/2018/11/30/483701), but I would first consider C: filter genome above N50 value

(reply by jean.elbers, 11 weeks ago)

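There are many alternatives:

  • completeness of remapping DNA-seq/RNA-seq reads to each assembly
  • consistency with a linkage map (linkage errors)
  • BUSCO (completeness of conserved single-copy orthologs)
  • estimated contamination, e.g. by bacterial contigs
  • repeat rate, GC content and assembly size vs. expected values
  • ...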
(reply by Michael Dondrup, 11 weeks ago)
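Some of the simpler checks in that list (assembly size vs. an expected value, GC content, number of contigs) can be computed directly from each assembly's sequences. Below is a minimal Python sketch, assuming the contig sequences are already loaded as strings; `assembly_stats` and `expected_size` are hypothetical names, the toy sequences are made up, and the expected genome size has to come from independent knowledge of the organism. BUSCO, read remapping, linkage maps and contamination screening need their own tools and are not covered here.

```python
def assembly_stats(sequences, expected_size=None):
    """Compute simple sanity metrics for one assembly.

    sequences: iterable of contig sequences as strings.
    expected_size: optional expected genome size in bp (hypothetical
    input, e.g. from an independent genome-size estimate).
    """
    # Upper-case so soft-masked (lowercase) bases are counted too.
    seqs = [s.upper() for s in sequences]
    total = sum(len(s) for s in seqs)
    # GC fraction is computed over unambiguous bases only (A, C, G, T),
    # so runs of N (scaffold gaps) do not dilute it.
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = gc + sum(s.count("A") + s.count("T") for s in seqs)
    stats = {
        "contigs": len(seqs),
        "size_bp": total,
        "gc_fraction": gc / acgt if acgt else 0.0,
    }
    if expected_size:
        stats["size_vs_expected"] = total / expected_size
    return stats


# Toy sequences standing in for two draft assemblies (made-up data).
assemblies = {
    "draft_A": ["ACGTNNGGCC", "ATATAT"],
    "draft_B": ["GGGCCCATATGC"],
}
for name, contigs in assemblies.items():
    print(name, assembly_stats(contigs, expected_size=32))
```

Run on both real assemblies, a large deviation from the expected size or GC content is a hint to look more closely (e.g. at duplicated haplotypes or contamination), not a verdict on its own.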

Thanks for your suggestions!

(reply by shubhra.bhattacharya, 11 weeks ago)
Powered by Biostar version 2.3.0