Only females but still few reads map to Y chromosome
3.1 years ago
serpalma.v ▴ 70

Hello!

I have been given a WGS data set consisting only of female mice. After finishing the alignments with bwa, I ran Picard's BamIndexStats.

Surprisingly, the Y chromosome had reads mapped to it Fig1, but they were very few compared to the rest of the reference genome Fig2.

I would kindly request your opinion and feedback in order to know if this is an issue I should fix, and if so, possible approaches.

Thanks!

bwa WGS alignment • 1.7k views

Hello,

Please have a look at "How to add images to a Biostars post" to include your pictures correctly.

Could it be that the reads mapped to chromosome Y fall in the PAR (pseudoautosomal region), because this region is not masked in the reference genome you used? (A: Which human reference genome should I use?)

fin swimmer


The well-known PAR sequences. finswimmer, you forgot to share the wiki about it ;)

Create a sub-BAM of the reads mapped to chr Y, then try to visualize it in IGV, for example. See whether the reads pile up in a few distinct areas. Then check the literature to see whether these areas are duplicated somewhere else in the mouse genome.
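
A minimal sketch of the sub-BAM step, assuming samtools is installed and the BAM is coordinate-sorted. The function name and the file names are placeholders, and the Y contig may be called `Y` rather than `chrY` in your reference:

```shell
# Extract the reads aligned to one contig into their own indexed BAM.
# Assumptions: samtools is on PATH and the input BAM is coordinate-sorted.
extract_contig() {
    bam=$1      # input BAM, e.g. aln.bam (placeholder name)
    contig=$2   # contig name exactly as in the BAM header, e.g. chrY or Y
    out=$3      # output BAM, e.g. aln.chrY.bam (placeholder name)
    samtools index "$bam" &&                        # region queries need an index
    samtools view -b -o "$out" "$bam" "$contig" &&  # keep only reads on that contig
    samtools index "$out"                           # index so IGV can load it
}

# Usage (not run here): extract_contig aln.bam chrY aln.chrY.bam
```

The resulting BAM and its index can then be loaded in IGV against the same reference.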


Could you please elaborate on what you mean with "distinct areas"?


Typically, the PAR regions on chr Y. Another example is the AMELX gene on chr X: there is a copy of the gene on chr Y, AMELY (https://ghr.nlm.nih.gov/gene/AMELX).

If you are sure you have only female mice, you can delete chr Y from the reference.
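
One way to drop the Y record from a FASTA reference is a small awk filter. This is only a sketch: the function name is made up, the toy FASTA is invented for illustration, and the header name (`chrY` here) has to match your reference:

```shell
# Print a FASTA file without one named sequence.
# Assumption: the sequence to drop is named exactly "$2" in its header
# (first word after ">"), e.g. chrY or Y depending on the reference.
drop_contig() {
    awk -v drop="$2" '
        /^>/ { keep = (substr($1, 2) != drop) }  # decide once per header line
        keep                                     # print lines of kept records
    ' "$1"
}

# Toy FASTA, made up purely to show the behaviour.
toy=$(mktemp)
printf '>chrX toy\nACGT\n>chrY toy\nTTTT\nGGGG\n>chrM toy\nCCCC\n' > "$toy"
drop_contig "$toy" chrY
```

The filtered reference would then need to be indexed again with `bwa index` and the reads realigned against it.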

3.1 years ago

Not a surprise: as Bastien Hervé and finswimmer said, there are some common regions between X and Y.

If you run samtools idxstats on the 1000 Genomes BAMs for NA12878 (female), you'll see that far more reads map to X, but a few map to Y:

$ find 1000G -name "*.bam" | while read F; do echo $F && samtools idxstats $F | grep -E '^(X|Y)' ; done

1000G/ftp/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam
X   155270560   7340285 0
Y   59373566    27261   0
1000G/ftp/phase1/technical/other_exome_alignments/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.BWA.CEU.exome.20110521.bam
X   155270560   4944705 0
Y   59373566    70046   0
1000G/ftp/phase3/data/NA12878/alignment/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
X   155270560   8783715 53818
Y   59373566    45922   5602
1000G/ftp/phase3/data/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam
X   155270560   10010067    126237
Y   59373566    50642   6249
1000G/ftp/phase3/data/NA12878/high_coverage_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.high_coverage_pcr_free.20130906.bam
X   155270560   37264083    217222
Y   59373566    222821  1529
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.bam
X   155270560   10013253    78707
Y   59373566    27583   2190
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.bam
X   155270560   136857405   1852183
Y   59373566    703280  104728
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X   155270560   10013253    78707
Y   59373566    27583   2190
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X   155270560   136857405   1852184
Y   59373566    703280  104728
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.HiSeq2000.20121016.bam
X   155270560   39360346    689280
Y   59373566    93256   7575
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.MiSeq.20121016.bam
X   155270560   2209546 35513
Y   59373566    4800    369
1000G/ftp/technical/working/20121023_sga_dindel_evidence_bams/NA12878.chr20.ILLUMINA.sga_dindel_subset.CEU.evidence.20111114.bam
X   155270560   2241    0
Y   59373566    29  0
1000G/ftp/technical/working/20121126_NA12878_bam_downSampledTo5x/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.downsampledTo5x.bam
X   155270560   8823168 104511
Y   59373566    46372   6866
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-mem.20131224.bam
X   155270560   1487159 0
Y   59373566    46425   0
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-sw.20140202.bam
X   155270560   1338206 0
Y   59373566    40394   0
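
To put a number on "a few", the idxstats rows (name, length, mapped, unmapped) can be summarised with awk. A rough sketch, where the function name is made up and the two input rows are copied from the high-coverage PCR-free NA12878 BAM in the listing:

```shell
# Summarise samtools idxstats rows (name, length, mapped, unmapped) as the
# share of X+Y mapped reads that landed on Y.
y_share() {
    awk '$1 == "X" { x = $3 }
         $1 == "Y" { y = $3 }
         END { if (x + y > 0) printf "X=%d Y=%d Y_share=%.4f\n", x, y, y / (x + y) }'
}

# Rows copied from the high-coverage PCR-free NA12878 BAM in the listing.
printf 'X\t155270560\t37264083\t217222\nY\t59373566\t222821\t1529\n' | y_share
```

On a real file this would be `samtools idxstats aln.bam | y_share`, with the contig names in the awk script adjusted if the reference uses `chrX`/`chrY`.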


Ok this is a relief... From what I gather so far, this is an issue that requires further examination rather than elimination. Would you agree?


Yes, check that the reads have indeed aligned to the PAR
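
A quick way to check this is to count how many chr Y reads start inside the PAR interval. The sketch below is an assumption-laden illustration: the function name is made up, the input records are toy data, and the coordinates are placeholders, since the real PAR boundaries depend on the assembly and have to be looked up:

```shell
# Count reads whose leftmost aligned position falls inside vs. outside a region.
# Reads SAM records on stdin (column 3 = contig, column 4 = 1-based position).
count_in_region() {
    awk -v c="$1" -v s="$2" -v e="$3" '
        /^@/ { next }                     # skip SAM header lines
        $3 == c { if ($4 >= s && $4 <= e) inside++; else outside++ }
        END { printf "inside=%d outside=%d\n", inside, outside }
    '
}

# Toy SAM-like records with placeholder coordinates, for illustration only.
printf 'r1\t0\tchrY\t150\t60\nr2\t0\tchrY\t900\t0\nr3\t0\tchrX\t120\t60\nr4\t16\tchrY\t199\t60\n' |
    count_in_region chrY 100 200
```

With real data the input would come from `samtools view aln.bam chrY`, and the start and end would be the PAR coordinates for your assembly. If most reads fall outside the interval, the other homologous regions mentioned in the thread are worth a look.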