I want to filter my assembly to get contigs above the N50 value only.
How can this be done?
--Thanks in advance
You realize that your new filtered assembly will have a new, higher N50 value and you are just chasing a moving target until you have one contig left?
Calculate the N50 value, and extract all sequences with length >= N50 from your fasta file. The question is just, why? N50 is not a magical threshold below which contigs are not real.
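A minimal sketch of those two steps in plain Python, in case it helps. It assumes an ordinary (possibly multi-line) FASTA file and the usual definition of N50: the contig length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function names (`read_fasta`, `n50`, `filter_by_n50`) are made up for illustration and do not come from any particular tool.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Return the N50 of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which we cover at least half of the assembly.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def filter_by_n50(
    records: Iterable[Tuple[str, str]]
) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (N50, records whose sequence length is >= N50)."""
    records = list(records)
    threshold = n50([len(seq) for _, seq in records])
    kept = [(h, s) for h, s in records if len(s) >= threshold]
    return threshold, kept
```

To use it on a file, something like `threshold, kept = filter_by_n50(read_fasta(open("assembly.fasta")))` should work, where `assembly.fasta` is a placeholder name; each kept record can then be written back out as `>header` followed by the sequence. Note that recomputing `n50` on the kept contigs will give a value at least as large as the original one, which is the moving-target problem mentioned above.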
I have two genomes (both draft) of an organism.
I have to find out which of these two genomes should be used as the reference for my downstream analysis (transcriptome and SNP profile study, etc.).
For this genome-to-genome comparison I have used approaches such as SynMap, LAST and Mauve, but I still cannot reach a conclusion.
So that is why I want to filter them at N50 and see where it goes.
Kindly suggest any other alternatives as well if possible.
Have you tried to check if you can reconcile the two assemblies to see if you can make a better combined one?
There is a "newer" pipeline called NucMerge for combining two assemblies that you might try (https://www.biorxiv.org/content/early/2018/11/30/483701), but I would first consider C: filter genome above N50 value
There are many:
Thanks for your suggestions!