Curious Very Long SNP in 1KG Data
10.5 years ago

From chromosome 20 of the Finnish samples in the 1000 Genomes data: what is this monster string supposed to be? There are several others like it, so it is unlikely to be a mistake.

20    348416    esv2677012    CCTAAGCCCTCCCCACAGCTACCACCCTATTTTTTCTCCCCTTTGCAGAAAAGGGCTTTGAGAAAATTGTCTATCCTCGCTGTTTTTAATTAGTCTTCTCTCTCTCTCTCCCTCTGAGACAGGATCTGCTCTCTCACCCAAGCTGGAGTGCAGTGGCGTGATCATGGCTCACTGCAGCCTCAACCTCCTGGGCTCAAACGATCTTCCCACCTCAGCCTCCTGAGTAGCTGGGACTACAGGTGTGCACTACCATGCCTGGCTAATTTTTGTATTTTTTGTAGAGACTGGGTTTTGCCATGTTGCCCAGGCTGGTTTTGAACTCCCAGGCTCAAGTGATCCATCCACCTCAGCCTCCCAAAGTGCTGGGACTGCAGGTGTGAGCCACCACACCTGGCCCTCTTGTCTCTTAAGTCCATTTAATCATGCTTCTACCTGTCACTTCCCTAGTTGAAACTGCTCTTGTCAATTTCAACACATTGCTAAATCCAATGTGTTCAGTTCTCATTCTTCATCTTTTTTTTTTTTTTTTTTTGAGACAGAGTCTTGCTCTGTCACCCAGGCTGGAATACAGTGGCACGATCTTGGCCCACTGCAACCTCTGCCTCCTGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGTTGGGACTACAGGCACAAGCCACCAAACCCAGCTAATTTTTGTATTTTTAGTTGAGACGGCATTTCACCATGTTGGCCAGGATGGTCTCAATCTCTTGACCTCGTGATCCGCCCACCTTGGCCTCCAAAAGTGCTGGGATTACAGGTGTGAGCCACCGCACCCGGCCCATTCTTCATCTTCTTAACTGATCAACAGTTTGACACAGCTGACCACTCCCTGCTCTTTGATGTACTTCTTTTCACTTGGTGGCCAGGCCTCCACTCTCTGCTGGTTTTCCTCCTTCTCAGGCTCCCTGCTTCTCCCATTCCTGTTGGAGCAGTGAGGACTTGGTCCCTGGAGCTCTCATCCAGTCTCACGTCTATGACTCCCAACACTGTATCCTCAGCCCAGACCTCTCCCCTGAACTCCAGCCCATACATTCAAATACCTACCTGATGTCTCTTTGAGGATGTCAAAAGACATGACAGACTCCACAGAACCAAAGCTGAACCTGGGCTTCCCCCAAACACCTCGCTCCATGTCATTTGATGGCAGTTCCATACCTGTCACCGTTCAGGCCAAGAAACCTTGGAAGCACCTTGACACCTCCTTTTCCCTCAAACTCCACATCTAGACCATCAGCAATCCTGTTGGCTCCACCTTTAAAATATACCCAGAATCCAGTCACAGCTCACCTCTAGCATGGCCACTGCCCTGCTCTGAGCCACTGGAGTTTAAGAGAATTATTGCAACACCTGCTCCCTTGTCTTCCTGTCCTTGCCTCATTCAGTCTATTCCAAGTACAGATCCCTAAATGATTTTATTTTAAAAGTAAGTCAAGGCTGGGCATGGTAGCTCATGCCTGTAATCCTAGCGCTTGAGGAGGCCGAGGAAGGAGGATCACTTGGGTGTAGGAGTTTGAGACCCACCTGGGCAATGTGGCAAAACCCTGTCTGTACTTAAAAAAAAGAAAAAAAATGGCTGGGCATGGTGGCTCACCCTGTAATCTTAGCACTTTGGGAGGCTGAGGCGGGTGAATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGATGAAACCCCATCTCTACTAAAAATACAAAAATTAGCCGGGCAAGGTGATGCACGCCTGTAGTCCCAGCTACTCAGGAGGCTAAGGAAGAAGAATCACTGGAACCCAGGAGGTGGAGGTTGCAGTGAGCCAAGATCGCGCCACTGCACTCCAGCCTGCATGACAGGAGCGAGACTCCATCTCAAAAAAAAAAAAAAAAAAAAAAAGGTAAGTGAGATCACTTCCCTCCTCTCCTTAAACCCTCCCCTGCCTCCCCATGACTCCTCA
GCGTCCTTTCAAAGGCCTCCAAAGCTCCAGATTATCTGAACCCCCTTTACCTCTCTGACCTCATCTCCCACCGCCTCCCTGTCACTGGCTGCACTCCAGCCACATTGACCTTCTCCGATGGCACACCAGTCAGCTAGTCAGCTTCCTTTTGGAGCTTTTGCATGAGCTGTTCCTCTTCCTGAAGAATTTGCCCTTCGGATAGTCTCAGGGCATCCACTGAACACTCCACTCAATACAGCCACTGCCTGCCCACCCAACACTCCTCATCTCTGTACTTACTCTTTTTTTCCCTTGCATTCGTCACCCCCTAACATGTGCTACAATGTACTTATTATGGTAATTATTTCTTGCATGTTTCTTTCTTTTTTTTTTTGAGACAGGGTCTCACTCTGTTGCCCAGTCTGGAGTGCAGCAGCATGATCTCAGCTCACTGAAATCTTGGCCTACCTGGCTCAGGCCATCCTCCCTCCTCTGCCTCCTGAGTAGCTGGGACTACAGGCACTCACCACCATGCCTGGCTAGTTGTTGTACTTTTTTGTAGAGATGAGGTTTCACCATGTTGCCTAAGCTAGTCTAAAACTCCTAGGCTCAAGTGATCCTCCCGCCTCAGCCTCCCGAAGTACTGGGATTGCGGGTGTGAGCCGCTGTGCCTGGCTGCACTTTTCCTTCTAATGGAATGTAAGCGCCACTTTTGTCTGTTATTTTCA    C    .    PASS    AC=0;AF=0.0032;AFR_AF=0.01;AN=186;AVGPOST=0.9985;CIEND=-67,86;CIPOS=-69,88;END=351092;ERATE=0.0004;EUR_AF=0.0013;HOMLEN=2;HOMSEQ=CT;LDAF=0.0032;RSQ=0.8232;SVLEN=-2676;SVTYPE=DEL;THETA=0.0246;VT=SV    GT:DS:GL    0|0:0.000:-0.00,-3.63,-71.20    0|0:0.000:-0.01,-1.70,-50.53 ....

It does say SVTYPE=DEL, so...
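SVTYPE (and VT) sit in the INFO column of the record. A minimal sketch in plain Python, assuming only the usual ';'-separated KEY=VALUE layout of INFO (no VCF library; parse_info is a hypothetical helper name), that pulls those tags out of the record in the question:

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict of KEY -> VALUE strings.

    Flag entries without '=' are stored as True; an empty or '.' INFO gives {}.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column copied from the record in the question
info = ("AC=0;AF=0.0032;AFR_AF=0.01;AN=186;AVGPOST=0.9985;CIEND=-67,86;"
        "CIPOS=-69,88;END=351092;ERATE=0.0004;EUR_AF=0.0013;HOMLEN=2;"
        "HOMSEQ=CT;LDAF=0.0032;RSQ=0.8232;SVLEN=-2676;SVTYPE=DEL;"
        "THETA=0.0246;VT=SV")

fields = parse_info(info)
print(fields["VT"], fields["SVTYPE"], fields["SVLEN"])  # SV DEL -2676
```

VT=SV together with SVTYPE=DEL is what marks this record as a structural variant (a deletion) rather than a SNP.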

But other SNPs look like: 20 3484167 rs143524500 A C ... Why such a long string?

Why not?

Because a SNP is one nucleotide (at least in my world)

no :-)

~$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz" | gunzip -c | cut -f 1-5 | grep -v ',' | awk '(length($5)>100)'
1    2017221    rs70940712    C    CATGAGGTACCGTGCTGGAGGGGCTGAGGACGTCGGGGGGCCCTGTTCTCAGAGCCCTTGAGGCACCGTGCTGGAGGGGCTGAGGACGTCGGGGGGCCCTGTTCTCAGAGCCC
1    3611140    rs74221234    G    GCCCTGCAGCCTCCGCCCCTCCTCCCGCAATCCCAGCCCTGCAGCCTCCGCCCCTCCTCCCGCAATCCCAGCCCTGCAGCCTCAGCCCCTCCTCCTGCAATCCCAC
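A rough Python equivalent of that pipeline, as a sketch that assumes a locally downloaded copy of the gzipped VCF (long_alt_records and print_long_alts are hypothetical helper names). Like cut/grep/awk, it keeps the first five columns, drops records with a comma in them (multi-allelic ALTs) and keeps ALT alleles longer than 100 bases; header lines are skipped explicitly:

```python
import gzip


def long_alt_records(lines, min_len=100):
    """Yield the first five columns of records whose ALT is longer than min_len.

    Mirrors: cut -f 1-5 | grep -v ',' | awk '(length($5)>100)'
    """
    for line in lines:
        if line.startswith("#"):  # skip ## meta lines and the #CHROM header
            continue
        cols = line.rstrip("\n").split("\t")[:5]
        # drop short/malformed lines and anything with a comma (multi-allelic ALT)
        if len(cols) < 5 or any("," in col for col in cols):
            continue
        if len(cols[4]) > min_len:
            yield cols


def print_long_alts(path, min_len=100):
    # path is assumed to be a local (b)gzipped VCF, e.g. a downloaded 00-All.vcf.gz
    with gzip.open(path, "rt") as vcf:
        for cols in long_alt_records(vcf, min_len):
            print("\t".join(cols))


# Toy records with made-up IDs, just to show the filter
toy = [
    "##fileformat=VCFv4.1\n",
    "1\t100\trs_a\tA\tC\n",
    "1\t200\trs_b\tC\tC" + "ATG" * 40 + "\n",
    "1\t300\trs_c\tG\tA," + "G" * 120 + "\n",
]
print([cols[2] for cols in long_alt_records(toy)])  # ['rs_b']
```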

And the example in your original post is considered a SNP in your world? I think not.

Edit: In other words, reread and then understand my first comment and you'll know why this is the way it is :)

Okay, so these files contain not just SNPs but also structural variants. Write that comment up as a (short) answer and I'll accept and upvote it. P.S. I still don't understand why a deletion is described with such a long string.

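Done. The deletion is written out as such a long string so it's clear exactly which bases are deleted: REF holds the reference sequence of the whole region (starting with the C that is kept), and ALT holds only that retained C. If the line were just 20 348416 esv2677012 CA C or something like that, only the A would be deleted. Here a much larger region is deleted (2,676 bp according to SVLEN=-2676, ending at END=351092).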
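The arithmetic behind that, as a small Python sketch. It assumes the VCF 4.1/4.2 conventions these files appear to follow: ALT is the shared leading (padding) base, END = POS + len(REF) - 1, and SVLEN = len(ALT) - len(REF), negative for deletions. describe_deletion is a hypothetical helper, not part of any VCF tool:

```python
def describe_deletion(pos: int, ref: str, alt: str) -> dict:
    """Describe a simple left-anchored deletion written out base by base."""
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("not a simple left-anchored deletion")
    return {
        "deleted_bases": ref[len(alt):],   # everything after the padding base
        "end": pos + len(ref) - 1,         # last reference base covered by REF
        "svlen": len(alt) - len(ref),      # negative for a deletion
    }


# The hypothetical one-base deletion from the comment: only the A is removed.
print(describe_deletion(348416, "CA", "C"))
# {'deleted_bases': 'A', 'end': 348417, 'svlen': -1}

# Working backwards from the real record: POS=348416, END=351092.
pos, end = 348416, 351092
ref_len = end - pos + 1      # REF length implied by END
print(ref_len, 1 - ref_len)  # 2677 -2676: padding C plus 2676 deleted bases
```

So END and SVLEN in the INFO column agree with a REF of 2,677 bases: the retained C followed by the 2,676 deleted ones.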

Is that labeled as a structural variant?

10.5 years ago

As mentioned in the comments, that file contains several different types of variation. Some are simple SNPs, which have much shorter lines. Others are larger structural variants, such as the one in your example, which happens to be a deletion.
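A rough way to tell these record types apart, as a Python sketch. It is a heuristic for illustration only (classify is a hypothetical name), assumes a single ALT allele per record, and treats anything tagged VT=SV or carrying SVTYPE in INFO as a structural variant; otherwise the REF/ALT lengths decide:

```python
def classify(ref: str, alt: str, info: str = "") -> str:
    """Heuristic variant-type label from REF, a single ALT, and optional INFO."""
    tags = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        tags[key] = value
    # Structural variants are flagged explicitly in INFO in these files
    if tags.get("VT") == "SV" or "SVTYPE" in tags:
        return "SV:" + tags.get("SVTYPE", "?")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNP"  # same length, more than one base changed


print(classify("A", "C"))                                        # SNP
print(classify("C", "CATGAGGTACC"))                              # insertion
# REF abbreviated to its first bases; the INFO tags come from the question
print(classify("CCTAAGCCCT", "C", "SVLEN=-2676;SVTYPE=DEL;VT=SV"))  # SV:DEL
```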
