How To Distinguish Genuine Duplication From Sequencing-Annotation Artifact?
10.3 years ago
troy7011 ▴ 10

I am working on the genomics of a model organism that has multiple copies of many important genes. For example, there are multiple copies of a BMP ligand annotated as the same gene. How can I determine whether the copies of a gene are genuine duplications or artifacts of sequencing and annotation, for example because allelic variants were not collapsed during sequence assembly? Any help will be greatly appreciated.

annotation • 3.1k views
10.3 years ago

To check whether the copies are just artifacts of sequence assembly, you can do a Southern blot. To see whether they're actually transcribed, try qPCR/RNAseq/Northern blot (assuming there's some sequence divergence between the copies). If there's no sequence difference, then it's a lot more work.

10.3 years ago

In all cases, dpryan79 is right to emphasize validation: this is always necessary and important.

From a bioinformatics standpoint, you can tell your aligner not to allow ambiguous alignments. Reads that fall in regions identical across multiple genes will then be ignored and counted as unaligned. Truth be told, I don't typically worry too much about this, especially with 100+ bp reads.
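If your aligner has already reported everything, a similar effect can be approximated after the fact by filtering the SAM output. The sketch below is only an illustration: the function names and the `min_mapq=20` cutoff are my own assumptions, and what a given MAPQ value means differs between aligners, so check your aligner's documentation before picking a threshold.

```python
def is_unique_alignment(sam_line, min_mapq=20):
    """Return True if a SAM record looks like a confident primary alignment.

    min_mapq is an assumed cutoff; MAPQ scales are aligner-specific.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    unmapped = bool(flag & 0x4)
    secondary = bool(flag & 0x100)
    supplementary = bool(flag & 0x800)
    return not (unmapped or secondary or supplementary) and mapq >= min_mapq


def filter_unique(sam_lines, min_mapq=20):
    """Yield header lines unchanged and only confidently placed records."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header
        elif is_unique_alignment(line, min_mapq):
            yield line
```

Reads dropped here are the ones that could have come from any of several identical copies, so they say nothing about which copy is real.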

For variant calling, you'll want to remove true (PCR) duplicates from the alignment so that sequencing errors amplified by PCR are not over-weighted. However, I don't remove duplicates from RNA-Seq data because 1) highly expressed genes will have genuine duplicate reads due to very high coverage (say, >100x for 100 bp reads) and 2) I typically only care about expression levels and possibly splicing events anyway (not variant calling).
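To make the duplicate-removal idea concrete, here is a deliberately simplified sketch. It treats reads as duplicates when they share chromosome, start position and strand, and keeps the first one seen. The `drop_duplicates` function and the tuple layout are hypothetical. Dedicated tools such as `samtools markdup` or Picard MarkDuplicates also account for soft clipping and mate positions, so use those on real data.

```python
def drop_duplicates(records):
    """Keep the first read seen at each (chrom, pos, strand) position.

    records: iterable of (read_name, chrom, pos, is_reverse) tuples.
    This is a toy model of duplicate removal, not a replacement for
    a dedicated duplicate-marking tool.
    """
    seen = set()
    kept = []
    for name, chrom, pos, is_reverse in records:
        key = (chrom, pos, is_reverse)
        if key in seen:
            continue  # same start and strand as an earlier read: likely duplicate
        seen.add(key)
        kept.append((name, chrom, pos, is_reverse))
    return kept
```

It also shows why this is risky for RNA-Seq: in a highly expressed gene, many independent fragments legitimately start at the same position and would be thrown away.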

10.3 years ago
5heikki 11k

genome-walking PCR
