Question: Additional Data in the Human Genome (hg18/hg19) Assembly?
Asked 7.1 years ago by Khader Shameer (Manhattan, NY):

While indexing hg18 and hg19 (UCSC), I noticed several additional chromosome headers besides the default ones (chr1-22, chrM, chrX, chrY). What are they? Do I need to keep or remove them when aligning my whole-exome reads? What is your opinion on including or excluding them at the alignment step?

hg18:

chr1_random chr2_random chr3_random chr4_random chr5_random chr6_random chr7_random chr8_random chr9_random chr10_random chr11_random chr13_random chr15_random chr16_random chr17_random chr18_random chr19_random chr21_random chr22_random chrX_random

hg19:

chr6_ssto_hap7 chr6_mcf_hap5 chr6_cox_hap2 chr6_mann_hap4 chr6_apd_hap1 chr6_qbl_hap6 chr6_dbb_hap3 chr17_ctg5_hap1 chr4_ctg9_hap1 chr1_gl000192_random chrUn_gl000225 chr4_gl000194_random chr4_gl000193_random chr9_gl000200_random chrUn_gl000222 chrUn_gl000212 chr7_gl000195_random chrUn_gl000223 chrUn_gl000224 chrUn_gl000219 chr17_gl000205_random chrUn_gl000215 chrUn_gl000216 chrUn_gl000217 chr9_gl000199_random chrUn_gl000211 chrUn_gl000213 chrUn_gl000220 chrUn_gl000218 chr19_gl000209_random chrUn_gl000221 chrUn_gl000214 chrUn_gl000228 chrUn_gl000227 chr1_gl000191_random chr19_gl000208_random chr9_gl000198_random chr17_gl000204_random chrUn_gl000233 chrUn_gl000237 chrUn_gl000230 chrUn_gl000242 chrUn_gl000243 chrUn_gl000241 chrUn_gl000236 chrUn_gl000240 chr17_gl000206_random chrUn_gl000232 chrUn_gl000234 chr11_gl000202_random chrUn_gl000238 chrUn_gl000244 chrUn_gl000248 chr8_gl000196_random chrUn_gl000249 chrUn_gl000246 chr17_gl000203_random chr8_gl000197_random chrUn_gl000245 chrUn_gl000247 chr9_gl000201_random chrUn_gl000235 chrUn_gl000239 chr21_gl000210_random chrUn_gl000231 chrUn_gl000229 chrUn_gl000226 chr18_gl000207_random
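A quick way to see which extra sequences a reference FASTA carries is to pull the names from the `>` header lines and compare them with the default set. The sketch below assumes UCSC-style names (chr1-chr22, chrX, chrY, chrM); `fasta_headers` and `split_primary` are hypothetical helpers, not part of any library.

```python
import re

# Assumption: UCSC-style primary names chr1..chr22, chrX, chrY, chrM
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")

def fasta_headers(lines):
    """Yield sequence names from FASTA '>' lines (name = first whitespace-delimited token)."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]

def split_primary(names):
    """Split names into (primary, additional) lists, preserving input order."""
    primary, additional = [], []
    for name in names:
        (primary if PRIMARY.fullmatch(name) else additional).append(name)
    return primary, additional

example = [">chr1", "ACGT", ">chr1_random", "NNNN", ">chrM", "ACGT", ">chrUn_gl000225", "ACGT"]
primary, additional = split_primary(fasta_headers(example))
print(primary)     # ['chr1', 'chrM']
print(additional)  # ['chr1_random', 'chrUn_gl000225']
```

To run it on a real assembly, pass an open file handle instead of the list, e.g. `open("hg19.fa")` (file name assumed). Only the first whitespace-delimited token of each header is treated as the name, which is an assumption about how the headers are written.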

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 7.1 years ago:

From the UCSC FAQ entry on chrN_random tables (http://genome.ucsc.edu/FAQ/FAQdownloads#download10):

Question:

"What are the chrN_random_[table] files in the human assembly? Why are they called random? Is there something biologically random about the sequence in these tables or are they just not placed within their given chromosomes?"

Response:

In the past, these tables contained data related to sequence that is known to be in a particular chromosome, but could not be reliably ordered within the current sequence.

Starting with the April 2003 human assembly, these tables also include data for sequence that is not in a finished state, but whose location in the chromosome is known, in addition to the unordered sequence. Because this sequence is not quite finished, it could not be included in the main "finished" ordered and oriented section of the chromosome.

Also, in a very few cases in the April 2003 assembly, the random files contain data related to sequence for alternative haplotypes. This is present primarily in chr6, where we have included two alternative versions of the MHC region in chr6_random. There are a few clones in other chromosomes that also correspond to a different haplotype. Because the primary reference sequence can only display a single haplotype, these alternatives were included in random files. In subsequent assemblies, these regions have been moved into separate files (e.g. chr6_hla_hap1).
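The FAQ's distinctions (unordered or unfinished sequence in `_random` files, alternative haplotypes in separate files) can be mirrored by a rough name-based classifier over the hg18/hg19 names listed in the question. The category labels and rules here are my own, inferred from the naming patterns rather than taken from a UCSC specification; `classify_contig` is a hypothetical helper.

```python
import re
from collections import Counter

def classify_contig(name):
    """Rough category for a UCSC hg18/hg19 sequence name (rules inferred from names, not an official spec)."""
    if re.fullmatch(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)", name):
        return "primary"
    if name.startswith("chrUn_"):
        return "unplaced"        # chromosome of origin unknown
    if name.endswith("_random"):
        return "random"          # chromosome known, order/position not reliable
    if re.search(r"_hap\d+$", name):
        return "alt_haplotype"   # e.g. alternative MHC haplotypes on chr6
    return "other"

names = ["chr6_ssto_hap7", "chr1_gl000192_random", "chrUn_gl000225", "chrX"]
print(Counter(classify_contig(n) for n in names))
```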

I would argue that one should include the chr*_random sequences in the alignment reference, as they help prevent misalignment caused by paralogy. This affects both exome capture and WGA.

(Reply by Aaronquinlan, 7.1 years ago)

No, because my exome capture wasn't designed for those random chromosomes. But maybe I should have considered including them: http://biostar.stackexchange.com/questions/7572

(Reply by Pierre Lindenbaum, 7.1 years ago)

Just to expand on what Aaron said (for others who stumble across this thread): if a read comes from one of these extra contigs but you don't include that contig in your alignment reference, the read may end up mis-mapped to its next-best match, which is often a similar sequence elsewhere in the genome. This is generally a bad thing.

(Reply by Chris Miller, 7.1 years ago)
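To make Chris's point concrete, here is a toy sketch with made-up sequences: an exhaustive, ungapped best-match search (not how BWA or Bowtie actually work) over a reference in which chr1 holds a close paralog of a segment on chr1_random. The helper names and sequences are hypothetical.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read, reference):
    """Exhaustive ungapped search; returns (contig, position, mismatches) of the first best match."""
    best = None
    k = len(read)
    for contig, seq in reference.items():
        for pos in range(len(seq) - k + 1):
            mm = mismatches(read, seq[pos:pos + k])
            if best is None or mm < best[2]:
                best = (contig, pos, mm)
    return best

# Toy sequences: chr1 carries a one-base-different paralog of a chr1_random segment
reference = {
    "chr1":        "TTTT" + "ACGTACGAACGT" + "TTTT",
    "chr1_random": "GGGG" + "ACGTACGTACGT" + "GGGG",
}
read = "ACGTACGTACGT"  # truly derived from chr1_random

print(best_hit(read, reference))                    # ('chr1_random', 4, 0)
print(best_hit(read, {"chr1": reference["chr1"]}))  # ('chr1', 4, 1)
```

With chr1_random present, the read lands on its true source with no mismatches; once that contig is dropped, the same read is placed on the chr1 paralog with one mismatch, which downstream could look like a variant.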

Thanks, Pierre. Have you considered them during indexing or when aligning your exome reads?

(Reply by Khader Shameer, 7.1 years ago)

Thanks, Pierre and Aaron!

(Reply by Khader Shameer, 7.1 years ago)

Chris: that's a neat summary!

(Reply by Khader Shameer, 7.1 years ago)
Answer by Suganthi, 7.1 years ago:

For hg19, the chr6 sequences that are not labeled as random (chr6_*_hap*) are alternative haplotypes of the MHC region, and I believe a similar situation exists for chr17 (chr17_ctg5_hap1), though I am not sure what the alternate loci are. Please take a look at:

http://vega.sanger.ac.uk/info/data/MHC_Homo_sapiens.html

http://genomeref.blogspot.com/

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml

As to whether these regions should be included in the alignment: perhaps yes, but it is bound to be complicated, because the alternative haplotypes are highly similar to the corresponding regions of the primary chromosomes.
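The complication Suganthi mentions can be sketched with a toy example: if a read comes from a stretch that is identical on the primary chr6 and on an alternative haplotype contig, it has two equally good placements. The sketch below uses made-up sequences attached to real contig names and a hypothetical helper `all_best_hits` that simply reports every tied best hit; real aligners resolve such ties in their own ways, often by assigning a low mapping quality.

```python
def all_best_hits(read, reference):
    """Exhaustive ungapped search; returns ([(contig, pos), ...] tied for fewest mismatches, that count)."""
    hits, best = [], None
    k = len(read)
    for contig, seq in reference.items():
        for pos in range(len(seq) - k + 1):
            mm = sum(x != y for x, y in zip(read, seq[pos:pos + k]))
            if best is None or mm < best:
                hits, best = [(contig, pos)], mm
            elif mm == best:
                hits.append((contig, pos))
    return hits, best

# Toy sequences (not the real ones): the middle segment is shared by both contigs
reference = {
    "chr6":          "AAAA" + "CCGGTTAACCGG" + "AAAA",
    "chr6_cox_hap2": "TTTT" + "CCGGTTAACCGG" + "TTTT",
}
read = "CCGGTTAACCGG"

print(all_best_hits(read, reference))                   # ([('chr6', 4), ('chr6_cox_hap2', 4)], 0)
print(all_best_hits(read, {"chr6": reference["chr6"]})) # ([('chr6', 4)], 0)
```

Dropping the haplotype contig makes the placement unique again, which is the trade-off: including alternative haplotypes avoids mis-mapping for reads that differ from the primary sequence, but splits reads from shared stretches across two equally good locations.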


Thanks a lot, Suganthi!

(Reply by Khader Shameer, 7.1 years ago)