How to obtain sequences of long insertions in the 1000 Genomes Project VCF
4.4 years ago
ks • 0

I would like to obtain the sequences of long mobile element insertions in the 1000 Genomes Project. In the VCF file, long mobile element insertions are described in the following format, and no exact sequence is given.

1 170626220 ALU_umary_ALU_584 T <INS:ME:ALU> . . TSD=TAAATTTCAGTTTT;SVTYPE=ALU;MEINFO=AluYa5,1,281,-;SVLEN=280;CS=ALU_umary;AC=12;AF=0.00239617;NS=2504;AN=5008;EAS_AF=0;EUR_AF=0;AFR_AF=0.0091;AMR_AF=0;SAS_AF=0;SITEPOST=1 GT 0|0

In the example above, ALU_umary_ALU seems to relate to some database ID, but I could not find useful information on the web. I would be very grateful if someone could let me know how to obtain the sequences of such long mobile element insertions.

genome • 1.3k views

Hello, I tried to read all the documentation, but I am still not able to extract the ALU sequence. I only found the coordinates, and I cannot work out what the sequence of the ALU is.

4.3 years ago
Ben_Ensembl ★ 2.4k

Hello,

I have answered your question via the IGSR Helpdesk, so please respond there if you have any further questions. I have added this response to help others looking at this question.

The IDs refer to the different structural variant classes and source call sets, which can also be identified using the SVTYPE and CS tags in the INFO column.

Below is a list of possible SVTYPEs:

ALU: Alu element insertion
LINE1: Line1 transposable element insertion
SVA: SVA element insertion (SVA stands for SINE-VNTR-Alu; it is a composite retrotransposon insertion)
INS: Nuclear mitochondrial insertion
DEL: bi-allelic deletion
DUP: bi-allelic duplication
INV: bi-allelic inversion
CNV: multi-allelic copy-number variant

The DEL class has been further re-classified into DEL_ALU, DEL_LINE1 and DEL_SVA if the identified deletion appeared to correspond to a reference mobile element insertion.

The source call set can be identified using the CS tag in the INFO column. Below is a list of possible CSs:

ALU_umary: Alu element insertion call set from the University of Maryland (MELT algorithm)
L1_umary: Line1 transposable element insertion from the University of Maryland (MELT algorithm)
SVA_umary: SVA element insertion from the University of Maryland (MELT algorithm)
NUMT_umich: Nuclear mitochondrial insertion from the University of Michigan (NumtS algorithm)
DEL_union: Union deletions genotyped by GenomeSTRiP, with variant sites identified by GenomeSTRiP, Breakdancer, CNVnator, Delly and Variation Hunter
DEL_pindel: Small deletions (<1kbp) from Washington University (Pindel algorithm)
INV_delly: Bi-allelic simple inversions from EMBL (Delly algorithm)
CINV_delly: Bi-allelic complex inversions from EMBL (Delly algorithm)
DUP_gs: Bi-allelic duplications and copy-number variants from Broad Institute (GenomeSTRiP algorithm)
DUP_delly: Bi-allelic tandem duplications from EMBL (Delly algorithm)
DUP_uwash: Bi-allelic deletions, duplications and copy-number variants from University of Washington (SSF algorithm)
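The tags described above can be read from a record with a few lines of Python. This is only a minimal sketch: the record is the example line from the question, the function names `parse_info` and `describe_insertion` are hypothetical, and the MEINFO layout (NAME,START,END,POLARITY) is assumed from the VCF specification's structural variant fields. It reports which mobile element subfamily and call set a site belongs to; it does not recover the exact inserted sequence, which the VCF does not contain.

```python
# Example record from the question, written as tab-separated VCF columns.
RECORD = "\t".join([
    "1", "170626220", "ALU_umary_ALU_584", "T", "<INS:ME:ALU>", ".", ".",
    "TSD=TAAATTTCAGTTTT;SVTYPE=ALU;MEINFO=AluYa5,1,281,-;SVLEN=280;"
    "CS=ALU_umary;AC=12;AF=0.00239617;NS=2504;AN=5008;EAS_AF=0;EUR_AF=0;"
    "AFR_AF=0.0091;AMR_AF=0;SAS_AF=0;SITEPOST=1",
    "GT", "0|0",
])


def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def describe_insertion(line):
    """Summarise the structural variant tags of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # INFO is the 8th VCF column
    summary = {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "id": cols[2],
        "svtype": info.get("SVTYPE"),   # structural variant class
        "call_set": info.get("CS"),     # source call set
        "tsd": info.get("TSD"),
    }
    if "SVLEN" in info:
        summary["svlen"] = int(info["SVLEN"])
    if "MEINFO" in info:
        # Assumed layout: NAME,START,END,POLARITY
        name, start, end, polarity = info["MEINFO"].split(",")
        summary["meinfo"] = {
            "name": name,
            "start": int(start),
            "end": int(end),
            "polarity": polarity,
        }
    return summary


if __name__ == "__main__":
    for key, value in describe_insertion(RECORD).items():
        print(key, value, sep="\t")
```

For the example record, `svtype` is ALU and `call_set` is ALU_umary, matching the two lists above, and MEINFO names the AluYa5 subfamily.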

Further information can be found in the README and the supplementary materials from the phase 3 publication:
[1] http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/README_phase3_sv_callset_20150224
[2] https://static-content.springer.com/esm/art%3A10.1038%2Fnature15394/MediaObjects/41586_2015_BFnature15394_MOESM91_ESM.pdf

I hope this helps but please do get back in touch if you have any further questions.

Best wishes

Ben IGSR Helpdesk
