Defining Your Reference Genome From UCSC for Human NGS Studies
12.9 years ago
Travis ★ 2.8k

Hi,

When creating a reference genome for human NGS studies, do people generally use only the major chromosomal contigs (chr1-22, chrX, chrY, chrM), or do they also include the unplaced (chrUn) and random (chr*_random) contigs?

I had initially assumed I should just go with the main contigs but now have begun to question my original reasoning.

next-gen sequencing genome reference • 5.3k views

Is there a particular reason you're questioning your original reasoning? Usually, I use chromosomal contigs for alignment, but now your question is leaving me wondering if I'm missing something...


I question my original decision because it was based on a whim and I noted that there are known polymorphisms associated with the unplaced and random contigs. Since these are sequences that do not map to any of the reference chromosomes, I believe it is probably best to include them in a reference genome whilst excluding the haplotype files.

12.9 years ago

The "random" contigs contain DNA that we know is in the genome, but that we're having trouble accurately placing into context. For alignment, at least, it's important to use these contigs. Here's why:

If you have reads that originated from a 'random' contig, but the 'random' contigs aren't in your reference sequence, it's quite likely that those reads will be mapped elsewhere in the genome, albeit with a lower mapping quality. Some of these misplaced reads are going to pass your quality filters anyway, and if enough of them do, it can affect your SNP calling, copy-number assessment, etc.

So yeah, alignment should pretty much always be done against all the sequences.

12.9 years ago
Pablo ★ 1.9k

I think this might be helpful (the reference genome from the 1000 Genomes Project):

http://www.1000genomes.org/announcements/release-1000-genomes-main-project-reference-genome-2009-10-12

They say "Create a reference with chrs1-22, X, Y, NC_012920 MT, and include the non-chromosomal supercontigs."

But remember that if you are using BWA for mapping reads, your reference cannot be longer than 4Gb (otherwise BWA will silently fail).
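One way to check this before indexing is to sum the sequence lengths in the FASTA file. Here is a minimal sketch in Python. It assumes the limit is 4 x 10^9 bases, which is a guess, since the post only says "4Gb". The helper names `fasta_total_length` and `fits_assumed_limit` are made up for illustration and are not part of BWA.

```python
import io

# Assumed cutoff: the post only says "4Gb"; the exact value is not stated there.
ASSUMED_LIMIT_BP = 4_000_000_000


def fasta_total_length(handle):
    """Return the total number of sequence characters in a FASTA stream."""
    total = 0
    for line in handle:
        line = line.strip()
        # Skip header lines and blank lines; count everything else as sequence.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


def fits_assumed_limit(handle, limit=ASSUMED_LIMIT_BP):
    """Return True if the reference is shorter than the assumed limit."""
    return fasta_total_length(handle) < limit


if __name__ == "__main__":
    # Tiny in-memory FASTA standing in for a real reference file.
    demo = io.StringIO(">chr1\nACGTACGT\nNNNN\n>chrUn_demo\nACGT\n")
    print(fasta_total_length(demo))
```

For a real reference you would pass an open file handle instead of the in-memory demo. N bases count toward the total here, since they take up space in the reference like any other base.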

12.9 years ago
lh3 33k

Here is a longer explanation I have just written. In short, include _random but exclude _alt.
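The rule of thumb above can be sketched as a name filter over a UCSC-style FASTA. This assumes alternate contigs have names ending in `_alt`, as stated above. It also assumes that, in older builds such as hg19, the haplotype contigs mentioned earlier in the thread have names ending in `_hap` plus a number. The function names `keep_contig` and `filter_fasta` are hypothetical, so treat this as an illustration and not as a definitive recipe.

```python
import io
import re

# Assumed naming: alternate loci end in "_alt"; hg19-style haplotypes end in "_hap<N>".
EXCLUDE_PATTERN = re.compile(r"(_alt|_hap\d+)$")


def keep_contig(name):
    """Keep primary, _random and chrUn contigs; drop _alt and _hap contigs."""
    return EXCLUDE_PATTERN.search(name) is None


def filter_fasta(src, dst):
    """Copy FASTA records from src to dst, skipping excluded contigs."""
    writing = False
    for line in src:
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            writing = keep_contig(name)
        if writing:
            dst.write(line)


if __name__ == "__main__":
    # Example headers in UCSC style; the sequences are placeholders.
    src = io.StringIO(
        ">chr1\nACGT\n>chr1_gl000191_random\nGGCC\n"
        ">chr6_apd_hap1\nTTTT\n>chr1_KI270762v1_alt\nAAAA\n"
    )
    dst = io.StringIO()
    filter_fasta(src, dst)
    print(dst.getvalue(), end="")
```

Filtering on the name suffix keeps the chrUn and `_random` contigs without needing a hard-coded list of contig names, but it only works if the FASTA headers follow the UCSC naming scheme.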
