Question: Only females but still few reads map to Y chromosome
5 months ago by serpalma.v (Germany)
serpalma.v wrote:

Hello!

I have been given a WGS data set consisting only of female mice. After finishing the alignments with bwa, I ran Picard's BamIndexStats.

Surprisingly, the Y chromosome had reads mapped to it (Fig. 1), but very few compared to the rest of the reference genome (Fig. 2).

I would appreciate your opinion on whether this is an issue I should fix and, if so, on possible approaches.

Thanks!

bwa alignment wgs • 325 views
modified 5 months ago by Pierre Lindenbaum • written 5 months ago by serpalma.v

Hello,

Please have a look at "How to add images to a Biostars post" to include your pictures correctly.

Could it be that the reads mapped to chromosome Y fall in the pseudoautosomal region (PAR), because this region is not masked in the reference genome you used? (See "A: Which human reference genome should I use?")

fin swimmer

written 5 months ago by finswimmer

I am using Mus_musculus.GRCm38.dna.primary_assembly from Ensembl: ftp://ftp.ensembl.org/pub/release-93/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

modified 5 months ago • written 5 months ago by serpalma.v

The well-known PAR sequences. finswimmer, you forgot to share the wiki about it ;)

Create a sub-BAM of the reads mapped to chr Y, then try to visualize it, for example in IGV. Check whether the reads pile up in a few distinct areas. Then look in the literature to see whether these areas are duplicated somewhere else in the mouse genome.
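
A minimal sketch of the sub-BAM step, assuming samtools is installed, the BAM is coordinate-sorted, and the Y contig is named `Y` as in the Ensembl GRCm38 assembly. The helper name `extract_contig` and the file names are hypothetical.

```shell
# Hypothetical helper: write the reads mapped to one contig into a new,
# indexed BAM that can be loaded in IGV.
# Usage: extract_contig <input.bam> <contig> <output.bam>
extract_contig() {
    in_bam=$1
    contig=$2
    out_bam=$3
    # Region queries need an index; create one if none is found next to the BAM.
    [ -e "${in_bam}.bai" ] || samtools index "$in_bam" || return 1
    # -b writes BAM output; the trailing argument restricts output to the contig.
    samtools view -b -o "$out_bam" "$in_bam" "$contig" || return 1
    # Index the sub-BAM so a genome browser can open it.
    samtools index "$out_bam"
}

# Example call (file names are placeholders):
# extract_contig sample.bam Y sample.chrY.bam
```

The resulting file is small, so it is quick to browse and to check where the reads cluster.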

modified 5 months ago • written 5 months ago by Bastien Hervé

Could you please elaborate on what you mean by "distinct areas"?

written 5 months ago by serpalma.v

Typically, the PAR regions on chr Y. Another example is the AMELX gene on chr X, which has a copy on chr Y (AMELY) (https://ghr.nlm.nih.gov/gene/AMELX).

If you are sure you have only female mice, you can delete chr Y from the reference.
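
One way to drop chr Y from the reference, sketched under the assumption that the first word of the FASTA header is exactly `>Y` (as in the Ensembl primary assembly). The helper name `drop_contig` and the output file name are hypothetical; the filtered reference would need a fresh `bwa index` and the reads would have to be realigned.

```shell
# Hypothetical helper: copy a FASTA stream to stdout without one contig.
# Usage: drop_contig <contig> < in.fa > out.fa
drop_contig() {
    awk -v drop="$1" '
        # Header line: compare its first word (minus the leading >) to the contig.
        /^>/ { keep = (substr($1, 2) != drop) }
        # Print header and sequence lines of kept contigs only.
        keep
    '
}

# Example call (output file name is a placeholder):
# gunzip -c Mus_musculus.GRCm38.dna.primary_assembly.fa.gz | drop_contig Y > GRCm38.noY.fa
```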

written 5 months ago by Bastien Hervé
5 months ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087)
Pierre Lindenbaum wrote:

Not a surprise: as Bastien Hervé and finswimmer said, there are some regions common to X and Y.

If you run samtools idxstats on 1000genomes/NA12878 (female), you'll see that most reads map to X but a few map to Y. Each output line gives the reference name, its length, the number of mapped reads, and the number of unmapped reads.

$ find 1000G -name "*.bam" | while read -r F; do echo "$F" && samtools idxstats "$F" | grep -E '^(X|Y)'; done

1000G/ftp/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam
X   155270560   7340285 0
Y   59373566    27261   0
1000G/ftp/phase1/technical/other_exome_alignments/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.BWA.CEU.exome.20110521.bam
X   155270560   4944705 0
Y   59373566    70046   0
1000G/ftp/phase3/data/NA12878/alignment/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
X   155270560   8783715 53818
Y   59373566    45922   5602
1000G/ftp/phase3/data/NA12878/exome_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.exome.20121211.bam
X   155270560   10010067    126237
Y   59373566    50642   6249
1000G/ftp/phase3/data/NA12878/high_coverage_alignment/NA12878.mapped.ILLUMINA.bwa.CEU.high_coverage_pcr_free.20130906.bam
X   155270560   37264083    217222
Y   59373566    222821  1529
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.bam
X   155270560   10013253    78707
Y   59373566    27583   2190
1000G/ftp/technical/working/20110915_CEUtrio_b37_decoy_alignment/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.bam
X   155270560   136857405   1852183
Y   59373566    703280  104728
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WEx.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X   155270560   10013253    78707
Y   59373566    27583   2190
1000G/ftp/technical/working/20120117_ceu_trio_b37_decoy/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam
X   155270560   136857405   1852184
Y   59373566    703280  104728
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.HiSeq2000.20121016.bam
X   155270560   39360346    689280
Y   59373566    93256   7575
1000G/ftp/technical/working/20121016_exome_indel_seq_validation/NA12878.exome_indel_validation.MiSeq.20121016.bam
X   155270560   2209546 35513
Y   59373566    4800    369
1000G/ftp/technical/working/20121023_sga_dindel_evidence_bams/NA12878.chr20.ILLUMINA.sga_dindel_subset.CEU.evidence.20111114.bam
X   155270560   2241    0
Y   59373566    29  0
1000G/ftp/technical/working/20121126_NA12878_bam_downSampledTo5x/CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.downsampledTo5x.bam
X   155270560   8823168 104511
Y   59373566    46372   6866
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-mem.20131224.bam
X   155270560   1487159 0
Y   59373566    46425   0
1000G/ftp/technical/working/20131209_na12878_pacbio/si/NA12878.pacbio.bwa-sw.20140202.bam
X   155270560   1338206 0
Y   59373566    40394   0
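
To put such counts in perspective, the number of reads mapped to Y can be expressed as a fraction of those mapped to X. A minimal sketch, assuming the contigs are named `X` and `Y` and the input is `samtools idxstats` output; `yx_ratio` is a hypothetical helper name.

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin and print
# (reads mapped to Y) / (reads mapped to X).
yx_ratio() {
    awk '
        $1 == "X" { x = $3 }   # third column holds the mapped-read count
        $1 == "Y" { y = $3 }
        END {
            if (x > 0) printf "%.6f\n", y / x
            else exit 1        # no X line or no reads on X: signal failure
        }
    '
}

# Example call (file name is a placeholder):
# samtools idxstats sample.bam | yx_ratio
```

For the first exome BAM in the listing, 27261 / 7340285 comes to roughly 0.4%.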
modified 5 months ago • written 5 months ago by Pierre Lindenbaum

Ok this is a relief... From what I gather so far, this is an issue that requires further examination rather than elimination. Would you agree?

written 5 months ago by serpalma.v