Question: How is the reference genome top level constructed?
11 months ago by
Germany, Mannheim, UMM
marongiu.luigi380 wrote:

Dear all,

I was wondering how the human reference genome top-level FASTA file is built. I thought it was a single FASTA file, but I realized it is actually a multi-FASTA. It contains not only the sequences of all the chromosomes (which are also available as individual single-sequence FASTA files), but also several patches and scaffolds. What is the function of these 'extra' files? Why are they not included directly in the chromosome files?

Thank you

Typically not all sequences can be assigned to a chromosome. These extra sequences are put into additional files. I guess these are the ones you're referring to here.

There are also alternate contigs, for which the chromosome location is known, but there is sufficient heterogeneity within the population at that location that alternate sequences were deemed necessary.
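To see which kinds of records a top-level multi-FASTA actually contains, you can tally the headers. The sketch below is only an illustration: it assumes the second whitespace-separated field of each header names the record type (something like "dna:chromosome" or "dna:scaffold"), which may not hold for every release or provider, and the miniature input is made up.

```python
import io
from collections import Counter


def tally_record_types(handle):
    """Count FASTA records by the second whitespace-separated header field."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the second field describes the record type,
            # e.g. "dna:chromosome" or "dna:scaffold". Headers without
            # such a field are counted as "unknown".
            counts[fields[1] if len(fields) > 1 else "unknown"] += 1
    return counts


# Made-up miniature multi-FASTA, for illustration only.
example = io.StringIO(
    ">1 dna:chromosome example_chromosome_record\nACGTACGT\n"
    ">2 dna:chromosome example_chromosome_record\nTTGACCAA\n"
    ">scaffold_A dna:scaffold example_unplaced_scaffold\nACGT\n"
)

for record_type, n in sorted(tally_record_types(example).items()):
    print(record_type, n)
```

For a real download you would pass an open file handle instead of the example string, for instance `gzip.open(path, "rt")` for a gzipped file, where `path` is wherever you saved it.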

11 months ago by
Emily_Ensembl17k wrote:

Dan will explain it for you.

