Question: How to obtain sequences of long mobile element insertions from the 1000 Genomes Project VCF
ks0 wrote (6 months ago):

I would like to obtain the sequences of long mobile element insertions in the 1000 Genomes Project. In the VCF file, long mobile element insertions are described in the following format, and no explicit inserted sequence is given:

1 170626220 ALU_umary_ALU_584 T <INS:ME:ALU> . . TSD=TAAATTTCAGTTTT;SVTYPE=ALU;MEINFO=AluYa5,1,281,-;SVLEN=280;CS=ALU_umary;AC=12;AF=0.00239617;NS=2504;AN=5008;EAS_AF=0;EUR_AF=0;AFR_AF=0.0091;AMR_AF=0;SAS_AF=0;SITEPOST=1 GT 0|0

In the example above, ALU_umary_ALU_584 looks as if it refers to some database ID, but I could not find any useful information about it on the web. I would be very grateful if someone could tell me how to obtain the sequences of such long mobile element insertions.
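To see concretely why the record carries no sequence, here is a minimal Python sketch (not part of the original post) that splits the record into its standard VCF columns and checks whether ALT is a symbolic allele. The helper names `split_record` and `has_symbolic_alt` are hypothetical. Real VCF files are tab-delimited; `split()` with no argument also copes with the space-separated copy pasted above.

```python
# Minimal sketch: split one VCF data line into its standard columns and
# check whether ALT is a symbolic allele (e.g. <INS:ME:ALU>) rather than
# an explicit base sequence. Helper names are hypothetical.

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

def split_record(line: str) -> dict:
    # split() with no argument handles tabs (real VCF) and plain spaces.
    fields = line.split()
    record = dict(zip(VCF_COLUMNS, fields))
    record["SAMPLES"] = fields[len(VCF_COLUMNS):]  # per-sample genotype columns
    return record

def has_symbolic_alt(record: dict) -> bool:
    # Symbolic alleles are written in angle brackets and store no bases.
    return any(a.startswith("<") and a.endswith(">") for a in record["ALT"].split(","))

line = ("1 170626220 ALU_umary_ALU_584 T <INS:ME:ALU> . . "
        "TSD=TAAATTTCAGTTTT;SVTYPE=ALU;MEINFO=AluYa5,1,281,-;SVLEN=280;"
        "CS=ALU_umary;AC=12;AF=0.00239617;NS=2504;AN=5008;EAS_AF=0;EUR_AF=0;"
        "AFR_AF=0.0091;AMR_AF=0;SAS_AF=0;SITEPOST=1 GT 0|0")
rec = split_record(line)
print(rec["ALT"], has_symbolic_alt(rec))  # <INS:ME:ALU> True
```

The REF column holds only the single anchor base (T); the inserted element itself is represented by the symbolic allele `<INS:ME:ALU>`.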

Ben_Ensembl (EMBL-EBI) wrote (6 months ago):

Hello,

I have answered your question via the IGSR Helpdesk, so please respond there if you have any further questions. I have added this response to help others looking at this question.

The IDs refer to the different structural variant classes and source call sets, which can also be identified using the SVTYPE and CS tags in the INFO column.
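As an illustration (not from the helpdesk answer), the sketch below parses the INFO column of the record in the question into a dictionary and reads the SVTYPE and CS tags. It also splits the MEINFO tag, which this answer does not discuss; its interpretation here assumes the NAME,START,END,POLARITY form described for MEINFO in the VCF specification. The helpers `parse_info` and `parse_meinfo` are hypothetical, and the ID check only reflects the pattern seen in this single record.

```python
# Sketch: turn a VCF INFO string into a dict, then read SVTYPE and CS.
# Keys without "=" (flag-type INFO fields) are stored as True.

def parse_info(info: str) -> dict:
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def parse_meinfo(meinfo: str) -> dict:
    # Assumption: MEINFO follows the VCF-spec form NAME,START,END,POLARITY.
    name, start, end, polarity = meinfo.split(",")
    return {"name": name, "start": int(start), "end": int(end), "polarity": polarity}

info = ("TSD=TAAATTTCAGTTTT;SVTYPE=ALU;MEINFO=AluYa5,1,281,-;SVLEN=280;"
        "CS=ALU_umary;AC=12;AF=0.00239617;NS=2504;AN=5008;EAS_AF=0;EUR_AF=0;"
        "AFR_AF=0.0091;AMR_AF=0;SAS_AF=0;SITEPOST=1")
tags = parse_info(info)
print(tags["SVTYPE"], tags["CS"])  # ALU ALU_umary
print(parse_meinfo(tags["MEINFO"]))
# {'name': 'AluYa5', 'start': 1, 'end': 281, 'polarity': '-'}

record_id = "ALU_umary_ALU_584"
# In this one example the ID starts with the CS value followed by the SVTYPE;
# treat that as an observed pattern, not a documented rule.
print(record_id.startswith(tags["CS"] + "_" + tags["SVTYPE"]))  # True
```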

Below is a list of possible SVTYPEs:

- ALU: Alu element insertion
- LINE1: Line1 transposable element insertion
- SVA: SVA element insertion. SVA stands for SINE-VNTR-Alu; it is a composite retrotransposon insertion
- INS: Nuclear mitochondrial insertion
- DEL: Bi-allelic deletion
- DUP: Bi-allelic duplication
- INV: Bi-allelic inversion
- CNV: Multi-allelic copy-number variant

The DEL class has been further re-classified into DEL_ALU, DEL_LINE1 and DEL_SVA if the identified deletion appeared to correspond to a reference mobile element insertion.
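The SVTYPE list and the DEL re-classification can be summarised as a small lookup. This is a sketch only: the SVTYPE strings come from the list above, while the category labels and the helper name `classify_svtype` are our own. Note that INS here means a nuclear mitochondrial insertion in this call set, not a generic insertion.

```python
# Sketch: group the SVTYPE values listed above into broad categories.
# SVTYPE strings are from the list above; category labels are hypothetical.

MEI_TYPES = {"ALU", "LINE1", "SVA"}                    # mobile element insertions
REF_ME_DELETIONS = {"DEL_ALU", "DEL_LINE1", "DEL_SVA"}  # deletions of reference MEs
OTHER_SV_TYPES = {"DEL", "DUP", "INV", "CNV"}

def classify_svtype(svtype: str) -> str:
    if svtype in MEI_TYPES:
        return "mobile element insertion"
    if svtype in REF_ME_DELETIONS:
        return "deletion of a reference mobile element"
    if svtype == "INS":
        # In this call set INS denotes a nuclear mitochondrial insertion.
        return "nuclear mitochondrial insertion"
    if svtype in OTHER_SV_TYPES:
        return "other structural variant"
    return "unknown"

for t in ["ALU", "DEL_SVA", "INS", "CNV"]:
    print(t, "->", classify_svtype(t))
```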

The source call set can be identified using the CS tag in the INFO column. Below is a list of possible CSs:

- ALU_umary: Alu element insertion call set from the University of Maryland (MELT algorithm)
- L1_umary: Line1 transposable element insertion call set from the University of Maryland (MELT algorithm)
- SVA_umary: SVA element insertion call set from the University of Maryland (MELT algorithm)
- NUMT_umich: Nuclear mitochondrial insertions from the University of Michigan (NumtS algorithm)
- DEL_union: Union deletions, genotyped by GenomeSTRiP, with variant sites identified by GenomeSTRiP, Breakdancer, CNVnator, Delly and Variation Hunter
- DEL_pindel: Small deletions (<1 kbp) from Washington University (Pindel algorithm)
- INV_delly: Bi-allelic simple inversions from EMBL (Delly algorithm)
- CINV_delly: Bi-allelic complex inversions from EMBL (Delly algorithm)
- DUP_gs: Bi-allelic duplications and copy-number variants from the Broad Institute (GenomeSTRiP algorithm)
- DUP_delly: Bi-allelic tandem duplications from EMBL (Delly algorithm)
- DUP_uwash: Bi-allelic deletions, duplications and copy-number variants from the University of Washington (SSF algorithm)
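If you want to pull out only the records from one source call set, for example ALU_umary, a sketch like the following filters tab-delimited VCF lines on the CS tag. The function name `records_from_call_set` is hypothetical, and the toy input is illustrative: the first data line is the record from the question (INFO shortened), the second is a made-up record.

```python
# Sketch: pull out records produced by one source call set (CS tag) from
# tab-delimited VCF text. Header and blank lines are skipped.
from typing import Iterable, Iterator, Tuple

def records_from_call_set(lines: Iterable[str], cs: str) -> Iterator[Tuple[str, int, str]]:
    wanted = "CS=" + cs
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # meta-information/header line or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, info = fields[0], int(fields[1]), fields[2], fields[7]
        # Compare whole INFO items so the CS value must match exactly.
        if wanted in info.split(";"):
            yield chrom, pos, vid

example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t170626220\tALU_umary_ALU_584\tT\t<INS:ME:ALU>\t.\t.\tSVTYPE=ALU;CS=ALU_umary",
    # Made-up record for illustration only.
    "1\t1000\tDEL_pindel_1\tA\t<DEL>\t.\t.\tSVTYPE=DEL;CS=DEL_pindel",
]
print(list(records_from_call_set(example, "ALU_umary")))
# [('1', 170626220, 'ALU_umary_ALU_584')]
```

With a real compressed VCF you could pass an opened text stream instead, e.g. `gzip.open("sv_calls.vcf.gz", "rt")`, where the file name is a placeholder.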

Further information can be found in the README and in the supplementary materials of the phase 3 publication:
[1] http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/README_phase3_sv_callset_20150224
[2] https://static-content.springer.com/esm/art%3A10.1038%2Fnature15394/MediaObjects/41586_2015_BFnature15394_MOESM91_ESM.pdf

I hope this helps but please do get back in touch if you have any further questions.

Best wishes

Ben
IGSR Helpdesk
