Determining which chromosome is the sex chromosome from a FASTA file
2.0 years ago
Ivan ▴ 60

I need to determine the chromosome type (sex, autosome, or mitochondrial) for my workflow. I usually determine it from the assembly report. For example, the mouse assembly report contains the following table:

1   assembled-molecule  1   Chromosome  CM000994.3  =   NC_000067.7 C57BL/6J    195154279   chr1
2   assembled-molecule  2   Chromosome  CM000995.3  =   NC_000068.8 C57BL/6J    181755017   chr2
3   assembled-molecule  3   Chromosome  CM000996.3  =   NC_000069.7 C57BL/6J    159745316   chr3
4   assembled-molecule  4   Chromosome  CM000997.3  =   NC_000070.7 C57BL/6J    156860686   chr4
5   assembled-molecule  5   Chromosome  CM000998.3  =   NC_000071.7 C57BL/6J    151758149   chr5
6   assembled-molecule  6   Chromosome  CM000999.3  =   NC_000072.7 C57BL/6J    149588044   chr6
7   assembled-molecule  7   Chromosome  CM001000.3  =   NC_000073.7 C57BL/6J    144995196   chr7
8   assembled-molecule  8   Chromosome  CM001001.3  =   NC_000074.7 C57BL/6J    130127694   chr8
9   assembled-molecule  9   Chromosome  CM001002.3  =   NC_000075.7 C57BL/6J    124359700   chr9
10  assembled-molecule  10  Chromosome  CM001003.3  =   NC_000076.7 C57BL/6J    130530862   chr10
11  assembled-molecule  11  Chromosome  CM001004.3  =   NC_000077.7 C57BL/6J    121973369   chr11
12  assembled-molecule  12  Chromosome  CM001005.3  =   NC_000078.7 C57BL/6J    120092757   chr12
13  assembled-molecule  13  Chromosome  CM001006.3  =   NC_000079.7 C57BL/6J    120883175   chr13
14  assembled-molecule  14  Chromosome  CM001007.3  =   NC_000080.7 C57BL/6J    125139656   chr14
15  assembled-molecule  15  Chromosome  CM001008.3  =   NC_000081.7 C57BL/6J    104073951   chr15
16  assembled-molecule  16  Chromosome  CM001009.3  =   NC_000082.7 C57BL/6J    98008968    chr16
17  assembled-molecule  17  Chromosome  CM001010.3  =   NC_000083.7 C57BL/6J    95294699    chr17
18  assembled-molecule  18  Chromosome  CM001011.3  =   NC_000084.7 C57BL/6J    90720763    chr18
19  assembled-molecule  19  Chromosome  CM001012.3  =   NC_000085.7 C57BL/6J    61420004    chr19
X   assembled-molecule  X   Chromosome  CM001013.3  =   NC_000086.8 C57BL/6J    169476592   chrX
Y   assembled-molecule  Y   Chromosome  CM001014.3  =   NC_000087.8 C57BL/6J    91455967    chrY
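The lookup described for the mouse report could be sketched roughly as follows. This is only an illustration, assuming a tab-separated NCBI-style assembly report with the column order shown in the header further down (Sequence-Name, Sequence-Role, Assigned-Molecule, Assigned-Molecule-Location/Type, ...). The function name `classify_molecules` and the name sets are hypothetical, and treating everything that is not X/Y/Z/W or mitochondrial as an autosome is an assumption that only holds when the report uses those conventional names.

```python
import csv

# Assumed naming conventions; adjust for the organism at hand.
SEX_NAMES = {"X", "Y", "Z", "W"}
MITO_NAMES = {"MT", "M"}


def classify_molecules(report_text):
    """Map each assembled molecule to 'sex', 'mitochondrial', or 'autosome'."""
    kinds = {}
    # Drop blank lines and '#' comment lines (the column header is one of them).
    rows = [
        line
        for line in report_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    for fields in csv.reader(rows, delimiter="\t"):
        if len(fields) < 4:
            continue  # not a full report row
        role, molecule, location = fields[1], fields[2], fields[3]
        if role != "assembled-molecule":
            continue  # skip unplaced and unlocalized scaffolds
        if location == "Mitochondrion" or molecule.upper() in MITO_NAMES:
            kinds[molecule] = "mitochondrial"
        elif molecule.upper() in SEX_NAMES:
            kinds[molecule] = "sex"
        else:
            # Assumption: anything not named like a sex chromosome is an autosome.
            kinds[molecule] = "autosome"
    return kinds
```

A report whose chromosomes are only numbered would come back as all autosomes from this sketch, so the result is only as informative as the names in the report.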

From the mouse table, it is easy to infer which chromosomes are the sex chromosomes (X and Y). But for a different organism, the assembly report contains the following table:

# Sequence-Name Sequence-Role   Assigned-Molecule   Assigned-Molecule-Location/Type GenBank-Accn    Relationship    RefSeq-Accn Assembly-Unit   Sequence-Length UCSC-style-name
CHR1_Scaffold   assembled-molecule  1   Chromosome  CM008300.1  =   NC_035897.1 Primary Assembly    26953843    na
CHR2_Scaffold   assembled-molecule  2   Chromosome  CM008301.1  =   NC_035898.1 Primary Assembly    34292151    na
CHR3_Scaffold   assembled-molecule  3   Chromosome  CM008302.1  =   NC_035899.1 Primary Assembly    74127438    na
CHR4_Scaffold   assembled-molecule  4   Chromosome  CM008303.1  =   NC_035900.1 Primary Assembly    25652880    na
CHR5_Scaffold   assembled-molecule  5   Chromosome  CM008304.1  =   NC_035901.1 Primary Assembly    40584741    na
CHR6_Scaffold   assembled-molecule  6   Chromosome  CM008305.1  =   NC_035902.1 Primary Assembly    52532229    na
CHR7_Scaffold   assembled-molecule  7   Chromosome  CM008306.1  =   NC_035903.1 Primary Assembly    40813162    na
CHR8_Scaffold   assembled-molecule  8   Chromosome  CM008307.1  =   NC_035904.1 Primary Assembly    27092187    na
CHR9_Scaffold   assembled-molecule  9   Chromosome  CM008308.1  =   NC_035905.1 Primary Assembly    42966623    na
CHR10_Scaffold  assembled-molecule  10  Chromosome  CM008309.1  =   NC_035906.1 Primary Assembly    35377769    na
CHR11_Scaffold  assembled-molecule  11  Chromosome  CM008310.1  =   NC_035907.1 Primary Assembly    11650534    na
CHR12_Scaffold  assembled-molecule  12  Chromosome  CM008311.1  =   NC_035908.1 Primary Assembly    31403898    na
CHR13_Scaffold  assembled-molecule  13  Chromosome  CM008312.1  =   NC_035909.1 Primary Assembly    44161018    na
CHR14_Scaffold  assembled-molecule  14  Chromosome  CM008313.1  =   NC_035910.1 Primary Assembly    38842236    na
CHR15_Scaffold  assembled-molecule  15  Chromosome  CM008314.1  =   NC_035911.1 Primary Assembly    35718434    na
CHR16_Scaffold  assembled-molecule  16  Chromosome  CM008315.1  =   NC_035912.1 Primary Assembly    41032606    na
CHR17_Scaffold  assembled-molecule  17  Chromosome  CM008316.1  =   NC_035913.1 Primary Assembly    28317440    na
CHR18_Scaffold  assembled-molecule  18  Chromosome  CM008317.1  =   NC_035914.1 Primary Assembly    41377133    na
CHR19_Scaffold  assembled-molecule  19  Chromosome  CM008318.1  =   NC_035915.1 Primary Assembly    35429340    na
CHR20_Scaffold  assembled-molecule  20  Chromosome  CM008319.1  =   NC_035916.1 Primary Assembly    36262418    na
CHR21_Scaffold  assembled-molecule  21  Chromosome  CM008320.1  =   NC_035917.1 Primary Assembly    21661149    na
CHR22_Scaffold  assembled-molecule  22  Chromosome  CM008321.1  =   NC_035918.1 Primary Assembly    32506130    na
CHR23_Scaffold  assembled-molecule  23  Chromosome  CM008322.1  =   NC_035919.1 Primary Assembly    38085671    na
CHR24_Scaffold  assembled-molecule  24  Chromosome  CM008323.1  =   NC_035920.1 Primary Assembly    47266144    na
CHR25_Scaffold  assembled-molecule  25  Chromosome  CM008324.1  =   NC_035921.1 Primary Assembly    46505145    na

Here I can't determine it from the assembly report, because the chromosomes are only numbered. What I do have is the FASTA file of the reference genome. Can I use the FASTA file to determine which chromosomes are sex chromosomes?


Assuming that sex chromosome sequences are conserved among closely related species, whichever contig aligns best to the sex chromosome of a closely related species is likely the sex chromosome. Look for synteny with the genomes of closely related species.
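One way to read "maps better" in practice is to sum the matching bases per pair of sequences and keep the best target for each contig. The sketch below assumes the whole-genome alignments are available as PAF lines (the tab-separated format written by aligners such as minimap2); the comment above does not name a tool or format, so that choice, the function name `best_target_per_query`, and the `min_mapq` cutoff are all assumptions for illustration.

```python
from collections import defaultdict


def best_target_per_query(paf_lines, min_mapq=20):
    """Return {query: (target, matched_bases)} for the target with most matches.

    PAF columns used (1-based): 1 query name, 6 target name,
    10 number of matching bases, 12 mapping quality.
    """
    matched = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query, target = fields[0], fields[5]
        n_match, mapq = int(fields[9]), int(fields[11])
        if mapq < min_mapq:
            continue  # ignore ambiguous placements
        matched[query][target] += n_match
    # For every query, keep the target that collected the most matching bases.
    return {
        query: max(targets.items(), key=lambda kv: kv[1])
        for query, targets in matched.items()
    }
```

A contig whose best target is the relative's X, Y, Z, or W would then be a candidate sex chromosome. This is a hint rather than proof, since fusions and turnover of sex chromosomes between species can break the assumption of conservation.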

2.0 years ago

I guess that highly depends on the quality of the assembly and on prior knowledge about the organism.

By prior knowledge, I mean that your organism may have a different number of sex chromosomes to begin with: the marbled crayfish, for example, has three, the platypus ten. Since linear assemblies of sex chromosomes are notoriously hard, I would also suspect that several of the unplaced scaffolds originate from the sex chromosomes.
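To get a feel for how much sequence sits outside the assembled chromosomes, the assembly report can be tallied by sequence role. This is a minimal sketch, assuming a tab-separated NCBI-style report with Sequence-Role in column 2 and Sequence-Length in column 9, as in the header shown in the question; the function name `bases_by_role` is hypothetical.

```python
from collections import Counter


def bases_by_role(report_text):
    """Sum Sequence-Length per Sequence-Role in an assembly report."""
    totals = Counter()
    for line in report_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a full report row
        role, length = fields[1], fields[8]
        if length.isdigit():
            totals[role] += int(length)
    return totals
```

A large total for unplaced or unlocalized scaffolds would be consistent with parts of the sex chromosomes being missing from the chromosome-level sequences, though it cannot show where those scaffolds belong.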

If available, I would try to find a well-studied related organism for which a curated and annotated reference genome exists and use a tool such as Satsuma2 to align your assembly against that reference. That should give you at least an indication of how the puzzle fits together.
