How is the reference genome top level constructed?
6.0 years ago

Dear all,

I was wondering how the human reference genome top-level FASTA file is built. I thought it held a single sequence, but I realized it is actually a multi-FASTA. It contains not only the sequences of all the chromosomes (which are also available as individual single-FASTA files), but also several patches and scaffolds. What is the function of these 'extra' sequences? Why are they not included directly in the chromosome files?

Thank you

Assembly genome • 1.2k views

Typically not all sequences can be assigned to a chromosome. These extra sequences are put into additional files. I guess these are the ones you're referring to here.


There are also alternate contigs for which the chromosome location is known, but there is sufficient heterogeneity within the population at that location that alternate sequences were deemed necessary.
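
To see which kinds of records a top-level file actually contains, one option is to tally the FASTA headers by type. The following is only a minimal sketch: it assumes Ensembl-style headers in which the second field looks like `dna:chromosome` or `dna:scaffold`, and the function name `count_record_types` and the tiny sample records are made up for illustration. Other providers use different header layouts, so check your own file first.

```python
import io
from collections import Counter


def count_record_types(handle):
    """Count FASTA records by the type named in a 'dna:<type>' header field.

    Assumes Ensembl-style headers such as
    '>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF'.
    Records without a 'dna:' field are counted as 'unknown'.
    """
    counts = Counter()
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        rec_type = "unknown"
        for field in fields[1:]:
            if field.startswith("dna:"):
                rec_type = field[len("dna:"):]
                break
        counts[rec_type] += 1
    return counts


# Made-up miniature sample; the sequences are placeholders, not real data.
sample = io.StringIO(
    ">1 dna:chromosome chromosome:GRCh38:1:1:8:1 REF\n"
    "ACGTACGT\n"
    ">KI270728.1 dna:scaffold scaffold:GRCh38:KI270728.1:1:8:1 REF\n"
    "ACGTACGT\n"
    ">GL000008.2 dna:scaffold scaffold:GRCh38:GL000008.2:1:4:1 REF\n"
    "ACGT\n"
)
counts = count_record_types(sample)
print(dict(counts))
```

For a real file, pass an open file handle instead of the in-memory sample (use `gzip.open(path, "rt")` if the file is compressed).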
