Question: Can I use female methylation data annotated with hg38?
2.1 years ago
I downloaded TCGA breast cancer methylation data from 91 female individuals, but I found something interesting: the female data annotated with hg38 contain Y-chromosome gene symbols.

So I searched for how to handle this and found a suggestion to use the 'hg38 canonical female' reference. The difference between hg38 and hg38 canonical female is as follows:

(1) hg38 contains all chromosomes as well as all unplaced contigs.

(2) hg38 canonical female contains everything in the canonical set except chromosome Y.

Then, is that the same as removing chrY from the data annotated with hg38?
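To make the two options concrete, here is a minimal sketch that filters a probe annotation in both ways. It assumes the "canonical" set means chr1-chr22, chrX, chrY and chrM (check the actual contig list in your reference's FASTA index), and the probe IDs and the unplaced contig name are hypothetical. Note that definition (1) includes unplaced contigs while (2) does not, so dropping only chrY from hg38 data still differs from the canonical-female set. Also, filtering after annotation is not the same as annotating against a reference without chrY in the first place: sequences that were assigned to chrY may map somewhere else (e.g. chrX) when chrY is absent.

```python
# Assumed canonical contigs (UCSC-style names); verify against your reference's .fai.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}
# "Canonical female" as described above: the canonical set minus chrY.
CANONICAL_FEMALE = CANONICAL - {"chrY"}


def drop_chr_y_only(probes):
    """Remove chrY probes but keep everything else, including unplaced contigs."""
    return [p for p in probes if p["chrom"] != "chrY"]


def keep_canonical_female(probes):
    """Keep only probes whose contig is in the canonical-female set."""
    return [p for p in probes if p["chrom"] in CANONICAL_FEMALE]


# Hypothetical annotation rows; the unplaced contig name is illustrative only.
probes = [
    {"id": "probe_A", "chrom": "chr1"},
    {"id": "probe_B", "chrom": "chrX"},
    {"id": "probe_C", "chrom": "chrY"},
    {"id": "probe_D", "chrom": "chrUn_example"},
]

print([p["id"] for p in drop_chr_y_only(probes)])        # ['probe_A', 'probe_B', 'probe_D']
print([p["id"] for p in keep_canonical_female(probes)])  # ['probe_A', 'probe_B']
```

The only difference in the output is the unplaced-contig probe, which is exactly the gap between "hg38 minus chrY" and "canonical female".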

An idea: check the genomic locations of these probes. They may lie in the pseudo-autosomal regions (PARs), where chrX is homologous to chrY. Unless you are specifically interested in the sex chromosome probes, you could just remove these from your analysis from the start, stating this in your methods, of course.
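A quick way to run that check is to compare each chrY probe position with the PAR intervals. The sketch below uses the GRCh38 PAR1/PAR2 coordinates as published by the Genome Reference Consortium (1-based, inclusive); please double-check them against your annotation source. Probe IDs and positions are hypothetical, and the annotation is assumed to be available as (id, chrom, pos) tuples.

```python
# GRCh38 pseudo-autosomal regions, 1-based inclusive (verify against your annotation).
PARS_HG38 = {
    "chrX": [(10_001, 2_781_479), (155_701_383, 156_030_895)],
    "chrY": [(10_001, 2_781_479), (56_887_903, 57_217_415)],
}


def in_par(chrom: str, pos: int) -> bool:
    """Return True if a 1-based position lies in PAR1 or PAR2 of hg38."""
    return any(start <= pos <= end for start, end in PARS_HG38.get(chrom, []))


def split_chr_y_probes(probes):
    """Split chrY probes, given as (id, chrom, pos) tuples, into PAR and non-PAR IDs."""
    par, non_par = [], []
    for probe_id, chrom, pos in probes:
        if chrom != "chrY":
            continue  # only chrY probes are of interest here
        (par if in_par(chrom, pos) else non_par).append(probe_id)
    return par, non_par


# Hypothetical probes for illustration.
probes = [
    ("probe_1", "chrY", 1_500_000),   # inside PAR1
    ("probe_2", "chrY", 15_000_000),  # male-specific region, outside the PARs
    ("probe_3", "chrX", 1_500_000),   # not on chrY, ignored
]
print(split_chr_y_probes(probes))  # (['probe_1'], ['probe_2'])
```

Probes that land in the non-PAR list are the ones that cannot be explained by X/Y homology in the PARs.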

Thank you for your comment. I checked whether the chrY positions fall within the PARs; unfortunately, they do not.

2.1 years ago
There is no easy answer. In general it is fine to just remove the reads mapped to chrY, but there are some considerations that depend on the aligner used:

  • Check how many reads are aligned to chrY; if there are only a few (<1%?), it is fine to remove them.
  • If a read is mapped to chrY as its primary hit, check whether the same read is also reported with a secondary alignment (many aligners report only the primary alignment and omit secondary ones unless specific parameters are set).
  • If you have only the primary hit, you can try aligning the read to the genome again to check where it comes from.
  • Otherwise, you may want to adjust the alignment flags, changing the secondary hit to primary.
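The first, second and last checks can be sketched on SAM text using only the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). This is a minimal sketch assuming single-end reads in an uncompressed SAM stream; for BAM files a library such as pysam would be the usual route. The read names and records below are made up.

```python
from collections import defaultdict

# SAM FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def parse_sam(lines):
    """Yield (qname, flag, rname) from SAM text lines, skipping @ header lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[2]


def chr_y_report(lines):
    """Return (fraction of primary alignments on chrY,
    {read: [other contigs where it also has secondary hits]})."""
    records = list(parse_sam(lines))
    # Primary alignments: mapped, not secondary, not supplementary.
    primary = [r for r in records
               if not r[1] & (UNMAPPED | SECONDARY | SUPPLEMENTARY)]
    y_reads = {q for q, _, rname in primary if rname == "chrY"}
    y_count = sum(1 for _, _, rname in primary if rname == "chrY")
    fraction = y_count / len(primary) if primary else 0.0

    # For reads whose primary hit is chrY, collect secondary hits elsewhere.
    elsewhere = defaultdict(list)
    for q, flag, rname in records:
        if q in y_reads and flag & SECONDARY and not flag & UNMAPPED and rname != "chrY":
            elsewhere[q].append(rname)
    return fraction, dict(elsewhere)


def swap_primary(flag_chr_y: int, flag_other: int):
    """Demote the chrY primary to secondary and promote the other hit.
    Only the FLAG changes here; secondary records often store SEQ/QUAL as '*',
    so a real fix would also need to move those fields (and check MAPQ/tags)."""
    return flag_chr_y | SECONDARY, flag_other & ~SECONDARY


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrY\t5000\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tchrX\t9000\t0\t4M\t*\t0\t0\t*\t*",
    "r3\t0\tchr2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(chr_y_report(sam))    # (0.3333333333333333, {'r2': ['chrX']})
print(swap_primary(0, 256))  # (256, 0)
```

Here one of three primary alignments falls on chrY, and that read also has a secondary hit on chrX, which is the situation where adjusting the flags (last bullet) might be worth considering.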

Thanks for your help. I will check their read counts, then!

