Novel Sequence Insertions 1000 Genomes Project
10.8 years ago
Ahdf-Lell-Kocks ★ 1.6k

I am looking for novel sequence insertions identified in the 1000 genomes project, and I found 3 files in this directory:

ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_novel_sequence_insertion_pilot2.gvf
ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_mobile_element_insertion_pilot1.gvf
ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd59_Durbin_et_al_2010/gvf/estd59_Durbin_2010_highquality_mobile_element_insertion_pilot2.gvf


It seems that for non-mobile element insertions, there are only about 400 novel sequence insertions. Is there any other place where I can find more?
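A count like the one above can be checked by tallying records per feature type. The following is a minimal sketch, assuming the GVF files follow the usual GFF3-style layout (nine tab-separated columns, feature type in column 3, `#`-prefixed pragma lines). The function name `count_feature_types` and the sample records are made up for illustration and are not taken from the DGVa files.

```python
from collections import Counter

def count_feature_types(lines):
    """Count GVF records by the value in column 3 (the feature type)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#'-prefixed pragma/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column record
        counts[fields[2]] += 1
    return counts

# Made-up records for illustration only; not real 1000 Genomes calls.
sample = [
    "##gvf-version 1.06",
    "1\tDGVa\tnovel_sequence_insertion\t100\t100\t.\t.\t.\tID=a",
    "1\tDGVa\tnovel_sequence_insertion\t250\t250\t.\t.\t.\tID=b",
    "2\tDGVa\tmobile_element_insertion\t900\t900\t.\t.\t.\tID=c",
]
print(count_feature_types(sample))
```

For a downloaded file, the same function should work on an open file handle, e.g. `with open(path) as fh: count_feature_types(fh)`.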

EDIT: for mobile elements, Casey Bergman's answer seems to be the best source available. Still, of the 7830 entries in the table, only 3089 have a sequence given for the prediction; the rest are blank.
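The filled-versus-blank tally can be reproduced with a few lines of Python. This is only a sketch: it assumes the table has been exported as tab-delimited text with a header row, and the column names `id` and `sequence`, the helper `count_filled`, and the demo rows are all hypothetical, since the actual layout of the table is not described here.

```python
import csv
import io

def count_filled(handle, column, delimiter="\t"):
    """Return (total_rows, rows_with_nonblank_value) for one column."""
    total = filled = 0
    for row in csv.DictReader(handle, delimiter=delimiter):
        total += 1
        # Treat missing, empty, and whitespace-only cells as blank.
        if (row.get(column) or "").strip():
            filled += 1
    return total, filled

# Made-up rows; the column names "id" and "sequence" are hypothetical.
demo = io.StringIO("id\tsequence\nmei1\tACGT\nmei2\t\nmei3\tGGCA\n")
total, filled = count_filled(demo, "sequence")
print(total, filled)
```

The column name would need to be changed to match whatever header the exported table actually uses.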

I believe the SV community has a consensus on how to define "novel". Every paper I have read on "novel" sequences/insertions defines "novel" in essentially the same way.

Mobile element insertions are not novel.

Neither are segmental duplications or CNVs, for that matter. Virtually all new sequence comes from pre-existing sequences in the genome. I think "novel" here is shorthand for "not in the reference genome".

10.8 years ago

Look in Table S1 of Stewart et al (2011) A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans: http://www.plosgenetics.org/article/info%3Adoi/10.1371/journal.pgen.1002236

If so, then I would amend your question to clearly state that you are looking for non-mobile element sequences.

@Casey Bergman: as far as I can see, Table S1 in the PLoS Genetics paper (http://www.plosgenetics.org/article/info%3Adoi/10.1371/journal.pgen.1002236) only contains Alu, L1 and SVA elements. I was hoping to find a place that lists the sequences of non-mobile element insertions.

8.4 years ago

In the 1000G pilot paper (A map of human genome variation from population-scale sequencing. The 1000 Genomes Consortium. Nature 467, 1061-73 (2010)), we assembled 164 human genomes with Cortex and assembled novel sequence. The file is here:

ftp.1000genomes.ebi.ac.uk:/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/low_coverage/sv/low_coverage.2010_10.novel_sequence


and the method is explained in the paper's Supplementary Information.