Question: Is It Necessary To Remove Chr M And Y From The Reference File And BAM File For Downstream Work?
6.9 years ago
I am using GATK for base recalibration with three files: an indexed reference file, a BAM file (after alignment), and a SNP variant file in VCF format.

I found that chr M and chr Y information is stored in the reference file and the BAM file but not in the VCF file. Is it necessary to remove chr M and chr Y from the reference file and BAM file so that they match the VCF file? I have read some threads on this topic suggesting that this step is not necessary, but I am not sure whether that is right.

Thank you!

6.9 years ago
Chromosomes M and Y are in your BAM file because the reference file you used for alignment contains them. Your reference FASTA file should also contain the chr_Un and chr_random contigs, and these should be present in your BAM file as well. Many people leave them out for alignment, but I would advise including them: reads that originate from those contigs otherwise get forced onto other chromosomes, which produces false-positive alignments and variants.

Most people remove chr M and the chr_Un and chr_random contigs after alignment. The fate of chr Y depends on whether you are interested in it. Most labs sequence female mice to get good coverage of the X chromosome, in which case no Y chromosome is present, so chr Y is discarded.
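
As a rough illustration of that kind of post-alignment filtering, here is a minimal Python sketch that reads the lines of a FASTA index (.fai) and writes BED lines for the contigs you want to keep. It assumes UCSC-style names ("chrM", a "chrUn" prefix, a "_random" suffix); the function names and the toy contig lengths are made up for the example, so adjust the rules to whatever your reference actually uses.

```python
def is_primary_contig(name, drop_y=False):
    """Return True for contigs to keep (assumes UCSC-style contig names)."""
    if name == "chrM":
        return False
    if name.startswith("chrUn") or name.endswith("_random"):
        return False
    if drop_y and name == "chrY":
        return False
    return True


def fai_to_bed(fai_lines, drop_y=False):
    """Turn .fai index lines into BED lines spanning each kept contig."""
    bed = []
    for line in fai_lines:
        if not line.strip():
            continue
        # The first two tab-separated .fai columns are contig name and length.
        name, length = line.rstrip("\n").split("\t")[:2]
        if is_primary_contig(name, drop_y=drop_y):
            bed.append(f"{name}\t0\t{int(length)}")
    return bed


# Toy index lines; the lengths are placeholders, not real chromosome sizes.
example_fai = [
    "chr1\t1000\t6\t50\t51\n",
    "chrX\t800\t1100\t50\t51\n",
    "chrY\t500\t2000\t50\t51\n",
    "chrM\t100\t2600\t50\t51\n",
    "chrUn_random\t300\t2800\t50\t51\n",
    "chr1_random\t200\t3200\t50\t51\n",
]

for bed_line in fai_to_bed(example_fai, drop_y=True):
    print(bed_line)
```

The resulting BED lines can be handed to any tool that accepts a region file to restrict processing to those contigs. Set drop_y=False if you care about chr Y.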

It seems that the study that generated the VCF file you are using sequenced only female strains and was not interested in chr M.

Depending on what you sequenced in your lab and what you are interested in, you should keep or discard chr M and chr Y.
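
If you want to see exactly which contigs differ before deciding, something like the following Python sketch may help. It assumes you have the BAM header as text (for example the output of samtools view -H) and the VCF as plain, uncompressed text; the helper names and the toy input lines are only for illustration.

```python
import re


def contigs_from_sam_header(lines):
    """Collect contig names from the SN: field of @SQ header lines."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_from_vcf(lines):
    """Collect contig names from ##contig header lines and from records."""
    names = set()
    for line in lines:
        if line.startswith("##contig="):
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                names.add(match.group(1))
        elif line.strip() and not line.startswith("#"):
            # The first column of a VCF record is CHROM.
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_vcf(sam_header_lines, vcf_lines):
    """List BAM contigs, in header order, that never appear in the VCF."""
    vcf_contigs = contigs_from_vcf(vcf_lines)
    return [n for n in contigs_from_sam_header(sam_header_lines)
            if n not in vcf_contigs]


# Toy inputs; lengths and positions are placeholders.
sam_header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chrX\tLN:800\n",
    "@SQ\tSN:chrY\tLN:500\n",
    "@SQ\tSN:chrM\tLN:100\n",
]
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10\t.\tA\tG\t.\t.\t.\n",
    "chrX\t20\t.\tC\tT\t.\t.\t.\n",
]

print(missing_from_vcf(sam_header, vcf))
```

With these toy inputs the contigs reported as missing from the VCF are chrY and chrM, which is the situation described in the question.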

