Question: How is the reference genome top level constructed?
marongiu.luigi (Germany, Mannheim, UMM) wrote 11 months ago:

Dear all,

I was wondering how the human reference genome top-level FASTA file is built. I thought it was a single-sequence FASTA file, but I realized it is actually a multi-FASTA. It contains not only the sequences of all the chromosomes (which are also available as individual single-FASTA files) but also several patches and scaffolds. What is the function of these 'extra' sequences? Why are they not included directly in the chromosome files?

Thank you

assembly genome • 360 views

Typically not all sequences can be assigned to a chromosome. These extra sequences are put into additional files. I guess these are the ones you're referring to here.

Comment written 11 months ago by Jean-Karim Heriche

There are also alternate contigs for which the chromosome location is known, but there is enough heterogeneity within the population at that location that alternate sequences were deemed necessary.

Comment written 11 months ago by d-cameron
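
One way to see which kinds of sequences a top-level file holds is to tally its record headers. The following is a minimal Python sketch, not an official tool. It assumes Ensembl-style headers in which the second whitespace-separated field names the sequence type (for example `dna:chromosome` or `dna:scaffold`). The function name `tally_records` and the scaffold name in the sample data are hypothetical and used only for illustration.

```python
from collections import Counter


def tally_records(lines):
    """Count FASTA records by the type token found in each header line.

    Assumption: headers look like
    '>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF',
    so the second field describes the sequence type.
    Headers without a second field are counted as 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            kind = fields[1] if len(fields) > 1 else "unknown"
            counts[kind] += 1
    return counts


# Illustrative input only: the scaffold record below is made up.
example = [
    ">1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
    "ACGTACGT",
    ">EXAMPLE_SCAFFOLD.1 dna:scaffold scaffold:GRCh38:EXAMPLE_SCAFFOLD.1:1:8:1 REF",
    "ACGTACGT",
]

print(tally_records(example))
```

For a real file, the same function can be given an open file handle, for instance one returned by `gzip.open(path, "rt")` for a compressed download, since it only needs an iterable of text lines.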
Emily_Ensembl (EMBL-EBI) wrote 11 months ago:

Dan will explain it for you.
