Hello,
I have not worked with human reads before, so my questions are about the reference genome. I am comparing two references:
-- GENCODE: https://www.gencodegenes.org/human/
   Genome sequence, primary assembly (GRCh38), chromosomes and scaffolds: 194 contigs
-- Heng Li's recommendation: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
   ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
   195 contigs; the extra one is a decoy contig with the circular EBV (Epstein-Barr virus) genome.
- When should I use the reference with 639 contigs (the one that includes patches and haplotypes)?
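The contig counts quoted above can be checked directly from the FASTA files. This is a minimal sketch, assuming each file is a plain or gzip-compressed FASTA in which every sequence starts with a `>` header line; `count_contigs` is a hypothetical helper name, not part of any tool mentioned here.

```python
import gzip
import sys


def count_contigs(path):
    """Count FASTA records by counting header lines that start with '>'."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Usage: python count_contigs.py reference1.fa.gz reference2.fna.gz
    for fasta_path in sys.argv[1:]:
        print(fasta_path, count_contigs(fasta_path))
```

If the numbers quoted above are right, this should report 194 for the GENCODE primary assembly and 195 for the no-alt analysis set.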
I mapped the same set of single-end reads to the GENCODE reference (194 contigs) and to the reference recommended by Heng Li (195 contigs), using bwa with default settings in both cases. I get slightly different coverage for some chromosomes.
Number of mapped reads:

chr   Gencode reference   Heng Li reference
1     4770                4771
13    2183                2189
X     1651                1684
Is this normal, or should I look for a mistake? I assumed that the sequences of these two assemblies were identical apart from the decoy contig. No reads mapped to the decoy contig. The total number of mapped reads differs by 40 (59647 vs. 59687).
Many thanks, Valery
These differences look pretty minor and should not affect your end results. What are you ultimately trying to do?
Thank you for the reply! This is a 'trial' experiment with a small number of reads, but I expect the final step in the 'real' experiment to be SNP calling.
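One way to see which contigs account for the 40-read difference is to compare the per-contig mapped counts of the two alignments. The sketch below is only an illustration. It assumes both BAM files are sorted and indexed and that the output of `samtools idxstats` (tab-separated: contig name, sequence length, mapped reads, unmapped reads) has been saved as text. The names `parse_idxstats` and `compare_counts` are hypothetical, the `chr`-prefixed contig names are assumed, and the sample input contains only the three chromosomes from the table in the question, with the length column filled in but not used.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` text into a {contig: mapped_reads} dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts


def compare_counts(counts_a, counts_b):
    """Return (contig, count_a, count_b, count_b - count_a) for contigs that differ."""
    rows = []
    for name in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(name, 0)  # a contig missing from one reference counts as 0
        b = counts_b.get(name, 0)
        if a != b:
            rows.append((name, a, b, b - a))
    return rows


# Illustrative input: mapped counts are the three rows from the table in the question.
GENCODE_IDXSTATS = (
    "chr1\t248956422\t4770\t0\n"
    "chr13\t114364328\t2183\t0\n"
    "chrX\t156040895\t1651\t0\n"
)
HENG_LI_IDXSTATS = (
    "chr1\t248956422\t4771\t0\n"
    "chr13\t114364328\t2189\t0\n"
    "chrX\t156040895\t1684\t0\n"
)

differences = compare_counts(
    parse_idxstats(GENCODE_IDXSTATS), parse_idxstats(HENG_LI_IDXSTATS)
)
for contig, a, b, delta in differences:
    print(f"{contig}\t{a}\t{b}\t{delta:+d}")
```

Running the same comparison on the full idxstats output of both BAM files would list every contig, including the scaffolds, where the counts disagree.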