Question: How effective is the 1000 Genomes integrated SV map as a filter for common SVs?
4.5 years ago by
michealsmith wrote:

I'm trying to use the 1000 Genomes Project integrated SV map to filter out common SVs, so that I can look for rare or novel SVs in disease samples.


  1. I guess the integrated map is NOT quite integrated (that is, not comprehensive), correct? The list contains only high-confidence, validated SVs, and ambiguous ones were excluded. But SV calls from real-world data usually contain many such "ambiguous" calls (lots of false positives caused by misalignment in repetitive sequence, or by systematic errors or bias in the caller itself). So if we use this "integrated" map as the "gold standard" for filtering, we'll end up retaining many "ambiguous" false positives.

    Those analyzing tumor samples naturally have matched controls. But I'm working on a complex disease, so one solution I can think of is to run many CONTROL samples (for example, CEU controls from 1000 Genomes) alongside my samples and remove whatever is seen in the CONTROLs, which hopefully removes many of the "ambiguous" calls.

    What other solutions could I try?

  2. I randomly picked several SV calls that show up as common deletions in my CONTROLs but, interestingly, are absent from the integrated SV map. To my surprise, they are all non-conserved LINEs, as in the picture below:

[Image not available]

The deletion that is absent from the 1000 Genomes Project integrated map is the gap in the middle. Why is it missing from the map?


modified 22 months ago by LGMgeo • written 4.5 years ago by michealsmith

Just a comment on "run[ning] many CONTROL samples": my company works on cancer but we lack normal tissue. Therefore, we use exactly your approach by removing variants which frequently occur in genome resequencing projects.
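To make the control-based filter concrete, here is a minimal sketch in plain Python. It is not the pipeline described above or any published tool: the `SV` record, the 50% reciprocal-overlap threshold, and the `max_control_fraction` cutoff are all assumptions chosen for illustration, and real call sets would normally be read from VCF or BED files.

```python
from collections import namedtuple

# Hypothetical minimal SV record: half-open interval [start, end) plus SV type.
SV = namedtuple("SV", "chrom start end svtype")


def reciprocal_overlap(a, b):
    """Return the shared length divided by the longer interval (0.0 if none).

    A value >= f means the overlap covers at least f of BOTH intervals.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def filter_against_controls(case_calls, control_calls_by_sample,
                            min_ro=0.5, max_control_fraction=0.01):
    """Keep case calls seen in at most max_control_fraction of controls."""
    n_controls = len(control_calls_by_sample)
    kept = []
    for call in case_calls:
        # Count control samples carrying a matching call of the same type.
        carriers = sum(
            any(reciprocal_overlap(call, c) >= min_ro for c in calls)
            for calls in control_calls_by_sample.values()
        )
        if n_controls == 0 or carriers / n_controls <= max_control_fraction:
            kept.append(call)
    return kept


# Toy data (made up): one deletion shared with two of three controls, one private.
case = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 5000, 9000, "DEL")]
controls = {
    "ctrl1": [SV("chr1", 1100, 2050, "DEL")],
    "ctrl2": [SV("chr1", 950, 1900, "DEL")],
    "ctrl3": [],
}
rare = filter_against_controls(case, controls, max_control_fraction=0.5)
```

Because the controls go through the same caller, this also removes recurrent caller artifacts, which a curated map cannot do. The quadratic loop is fine for a sketch; for whole-genome call sets an interval index or a tool such as bedtools would be the more practical choice.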

written 4.5 years ago by Manuel Landesfeind

The 1000 Genomes Project (1kg) provides some of the calls that didn't make the final cut in its working directories. I'd recommend downloading those.

written 4.4 years ago by Zev.Kronenberg
4.4 years ago by
QVINTVS_FABIVS_MAXIMVS wrote:

The 1000 Genomes phase 3 integrated SV call set was generated from 2,504 low-coverage (7-9X) samples. Deletions have much higher confidence than duplications (the smallest DUP is 3 kb).

If you have high-coverage data, you will have greater sensitivity for SV detection.

In addition, just because an SV does not overlap with 1kg does not mean it is a very rare nonpathogenic variant.

Regarding "The deletion absent from 1000Genome Project integrated map is the gap in the middle, I'm wondering why?":

If the variant is in the reference genome browser track, then it's essentially fixed in the population. 1000 Genomes reports common variants that may or may not be represented in reference builds (one of the goals of the project is to generate better reference builds that incorporate genetic diversity).

If you want to prioritize SVs I suggest using ANNOVAR for annotation.

If you want to apply a more systematic approach to prioritization, SVtyper is a good program.

Alternatively, you can try my CNV script, gtCNV, which annotates variants that overlap 1000 Genomes calls, LINEs, STRs, MEIs, genes, and segmental duplications (low copy repeats).
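As a rough illustration of what overlap-based annotation means (this is not gtCNV's actual code or interface), the sketch below tags a variant with every track it touches. The track names, the tuple layout, and the half-open coordinates are assumptions for the example.

```python
def annotate_overlaps(sv, tracks):
    """Return the sorted names of tracks with a feature overlapping the SV.

    sv      -- (chrom, start, end), half-open coordinates (assumed convention)
    tracks  -- dict mapping a track name to a list of (chrom, start, end)
    """
    chrom, start, end = sv
    hits = []
    for name, intervals in tracks.items():
        # Two half-open intervals overlap when each starts before the other ends.
        if any(c == chrom and s < end and start < e for c, s, e in intervals):
            hits.append(name)
    return sorted(hits)


# Toy annotation tracks (made-up coordinates).
tracks = {
    "LINE": [("chr1", 1500, 1800)],
    "genes": [("chr1", 5000, 6000)],
    "segdup": [("chr2", 1000, 3000)],
}
labels = annotate_overlaps(("chr1", 1000, 2000), tracks)
```

A call that lands only in repeat tracks such as LINEs or segmental duplications is a reasonable candidate for extra scrutiny, since those are the regions where misalignment artifacts are most likely.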

written 4.4 years ago by QVINTVS_FABIVS_MAXIMVS
2.2 years ago by
European Union
LGMgeo wrote:

I suggest using AnnotSV for annotation (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information)

You can look at this post describing the AnnotSV tool: Annotation for SV and CNV

written 2.2 years ago by LGMgeo

The link to AnnotSV seems to be broken. If you are an author on the paper, do you mind fixing the link please?

written 22 months ago by QVINTVS_FABIVS_MAXIMVS

For 2 days, we experienced some technical problems with our network, so the AnnotSV website could not be accessed.

I apologize for any inconvenience that may be caused by this temporary interruption.

written 22 months ago by LGMgeo