Is there a standard file format for anchored contigs?
2.8 years ago
brogroh

Hi,

Given a genome with many unordered contigs, and some external information that can be used to anchor them to chromosomes/linkage groups, is there a standard file format for specifying the linkage relationships between contigs? Downstream analyses, such as window-based calculations of popgen summary statistics, will rely on this order. For example, I can map the set of linkage markers to the reference with a short-read aligner and determine that a certain set of contigs belongs to linkage group X and lies in a particular order. Should this simply be represented in a FASTA file, with the linkage information encoded in the header?

Thanks!
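For reference, the marker-based anchoring described above could look roughly like the following minimal sketch. It assumes a hypothetical input of one `(contig, linkage_group, cM)` tuple per uniquely mapped marker; each contig is assigned to the linkage group most of its markers agree on and ordered by the median map position of those markers. It does not infer contig orientation, which needs at least two informative markers per contig.

```python
from collections import Counter, defaultdict
from statistics import median


def order_contigs(hits):
    """Assign contigs to linkage groups and order them within each group.

    hits: iterable of (contig, linkage_group, cM) tuples, one per marker
    that aligned uniquely to a contig (hypothetical input format).
    Returns {linkage_group: [contig, ...]} in map order.
    """
    by_contig = defaultdict(list)
    for contig, lg, cm in hits:
        by_contig[contig].append((lg, cm))

    groups = defaultdict(list)
    for contig, placements in by_contig.items():
        # Majority vote picks the linkage group (ties: first one seen).
        lg, _ = Counter(g for g, _ in placements).most_common(1)[0]
        # Median position of the agreeing markers is the sort key.
        pos = median(cm for g, cm in placements if g == lg)
        groups[lg].append((pos, contig))

    return {lg: [c for _, c in sorted(items)] for lg, items in groups.items()}
```

A contig whose markers disagree (like a chimeric contig) is silently forced into one group here; in practice such conflicts are worth flagging rather than resolving by vote.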

One commonly used method is to join contigs into scaffolds with a run of an arbitrary number of Ns between them, for example 10 Ns: ACGTNNNNNNNNNNACGT.
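A minimal sketch of that approach, assuming the contigs are already ordered and oriented and using a hypothetical `join_with_gaps` helper. It also returns each contig's 0-based start offset in the scaffold, since window-based statistics need a way to shift contig coordinates onto the scaffold and the N-joined FASTA alone does not record that.

```python
def join_with_gaps(contigs, gap=10):
    """Concatenate ordered contigs with a fixed run of Ns between them.

    contigs: list of (name, sequence) pairs, already in the desired order
    and orientation (an assumption of this sketch).
    Returns (scaffold_sequence, {contig_name: 0-based start offset}).
    """
    parts, offsets, pos = [], {}, 0
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            # Fixed-size spacer; the true gap length is unknown.
            parts.append("N" * gap)
            pos += gap
        offsets[name] = pos
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets
```

For example, `join_with_gaps([("a", "ACGT"), ("b", "ACGT")])` reproduces the `ACGTNNNNNNNNNNACGT` scaffold above, with contig `b` starting at offset 14.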

2.8 years ago
Malcolm.Cook

ALLMAPS: robust scaffold ordering based on multiple maps (GitHub: ALLMAPS). The project documentation includes the page "ALLMAPS: How to use different types of genomic maps", which does a good job of outlining the various formats found in the wild, and the project provides tools for inter-conversion and many other useful functions.

2.8 years ago
cmdcolin

Also see tools like agp2fasta, e.g. the thread "Tool To Create Scaffold Fasta File From An Agp And Contig Fasta File?"
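To illustrate what such a conversion does, here is a minimal sketch (not agp2fasta itself) that builds scaffold sequences from tab-separated AGP lines plus a dict of contig sequences. It assumes the AGP v2.0 column layout: gap rows (component type `N` or `U`) carry the gap length in column 6, and sequence rows carry component ID, 1-based inclusive begin/end, and orientation in columns 6 to 9. Rows are assumed to be sorted by part number within each object, as the specification requires.

```python
# Complement table for reverse-complementing "-" oriented components.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def agp_to_fasta(agp_lines, contigs):
    """Return {object_name: sequence} built from AGP rows and contig sequences."""
    scaffolds = {}
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.rstrip("\n").split("\t")
        obj, comp_type = f[0], f[4]
        if comp_type in ("N", "U"):
            # Gap row: column 6 is the gap length.
            piece = "N" * int(f[5])
        else:
            # Sequence row: component ID, begin, end, orientation.
            comp_id, beg, end, orient = f[5], int(f[6]), int(f[7]), f[8]
            piece = contigs[comp_id][beg - 1:end]  # 1-based inclusive -> slice
            if orient == "-":
                piece = revcomp(piece)
        scaffolds.setdefault(obj, []).append(piece)
    return {name: "".join(parts) for name, parts in scaffolds.items()}
```

Unlike the N-joined FASTA alone, the AGP file keeps the contig-to-chromosome coordinate mapping explicit, so the FASTA can always be regenerated from it.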