User: severalorks

Reputation: 110
Status: Trusted
Location:
Last seen: 4 years, 4 months ago
Joined: 4 years, 5 months ago
Email: m********@gmail.com

Posts by severalorks

29 results • page 1 of 3
0 votes • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... In this thread: https://www.biostars.org/p/121643/ You linked to a program called vcf2fasta.cpp. Would that do what I need? EDIT: I wrote a simple Python script that gets phased FASTA from VCF for 1 individual, and it takes 5 min to run, so if I run it in parallel for all individuals I should get ...
written 4.3 years ago by severalorks • 110
0 votes • 0 answers • 982 views
Finding diploid neanderthal alignment data, looking for ambiguity codes
... I'm using Neanderthal alignment data from here: http://www.eva.mpg.de/neandertal/draft-neandertal-genome/data.html Specifically, the .bam files. I would like to find alignments of Neanderthal sequences to human genomes that report ambiguity codes (such as M or R). However, the data from that site does ...
genome alignment ambiguity-codes neanderthal
written 4.3 years ago by severalorks • 110
0 votes • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... The phased VCF file gives the information for each diploid chromosome of each individual. For example, column HG00096 has 0|0, or 0|1, etc., where 1 indicates the chromosome has the alternative allele, while 0 means it has the reference allele. Does the pyfaidx FastaVariant object only create i ...
written 4.3 years ago by severalorks • 110
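
The phased genotype encoding described in the comment above (0|1 meaning the first chromosome copy carries the reference allele and the second carries the alternative) can be sketched in plain Python. This is only an illustration, not the script from the thread and not the pyfaidx API: the function name apply_phased_snps and the dictionary layout of each variant are hypothetical, positions are assumed to be 0-based, and indels are skipped, as the thread title asks.

```python
def apply_phased_snps(reference, variants, sample):
    """Return the two haplotype sequences for one sample.

    reference: reference sequence as a string
    variants:  list of dicts with a 0-based 'pos', 'ref', 'alt' and
               'gt' (sample name -> phased genotype such as '0|1')
               (hypothetical layout, chosen for this sketch)
    """
    haplotypes = [list(reference), list(reference)]
    for var in variants:
        # Skip indels: only single-base substitutions are applied.
        if len(var["ref"]) != 1 or len(var["alt"]) != 1:
            continue
        # '0|1' -> ['0', '1']: one allele code per chromosome copy.
        alleles = var["gt"][sample].split("|")
        for hap, allele in zip(haplotypes, alleles):
            if allele == "1":
                hap[var["pos"]] = var["alt"]
    return "".join(haplotypes[0]), "".join(haplotypes[1])


example_variants = [
    {"pos": 1, "ref": "C", "alt": "T", "gt": {"HG00096": "0|1"}},
    {"pos": 4, "ref": "A", "alt": "G", "gt": {"HG00096": "1|1"}},
    {"pos": 6, "ref": "G", "alt": "GA", "gt": {"HG00096": "1|0"}},  # indel, skipped
]
hap1, hap2 = apply_phased_snps("ACGTACGT", example_variants, "HG00096")
```

Here hap1 differs from the reference only at the homozygous site, while hap2 also carries the alternative allele at the 0|1 site.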
1 vote • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... Awesome!! I just ran it in parallel on the Sun Grid Engine from 0 to a million and it all finished in less than 2 minutes!! I ran the whole thing for 1 individual... finished in around 3-5 minutes. There were around 600 jobs. Next time I can use only 60 jobs and it'll probably finish in 30 min, an ...
written 4.3 years ago by severalorks • 110
1 vote • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... Maybe it would be faster if I were to take chunks of it, say str(consensus[0:10000]) + str(consensus[10000:20000]), and add them together. That way it sort of 'parallelizes' it. EDIT: len(str(consensus['20'][100000:110000])) takes around 12 seconds len(str(consensus['20'][200000:210000])) takes aro ...
written 4.3 years ago by severalorks • 110
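
The chunking idea in the comment above, fetching fixed-size slices and joining them in order, might look like the following minimal sketch. The names chunk_ranges and assemble_in_chunks are hypothetical, and fetch is a stand-in for whatever returns one slice as a string (in the thread that would be something like str(consensus['20'][start:end])). A thread pool is used only to keep the example short; for CPU-bound work in Python, separate processes or grid jobs are more likely to give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(length, size):
    """Split [0, length) into consecutive half-open (start, end) ranges."""
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def assemble_in_chunks(fetch, length, size, workers=4):
    """Fetch every chunk with fetch(start, end) and join the results in order.

    fetch is a hypothetical callable returning the sequence slice as a string.
    """
    ranges = chunk_ranges(length, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of `ranges`, so the
        # pieces are concatenated in the right sequence.
        parts = pool.map(lambda r: fetch(*r), ranges)
        return "".join(parts)


# Usage with an in-memory string standing in for the consensus sequence.
sequence = "ACGT" * 25
rebuilt = assemble_in_chunks(lambda s, e: sequence[s:e], len(sequence), 7)
```

Because the ranges are half-open and consecutive, no base is duplicated or dropped at chunk boundaries.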
0 votes • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... Actually 100000 takes 30 seconds, while 500000 takes 7 minutes, so 6 hrs is far off. I've tracked how much the time increases for each increment i of 10000 for consensus[0:i]. It might take a day to run; I will see. ...
written 4.3 years ago by severalorks • 110
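
The two timings quoted in the comment above (30 seconds for 100000 bases, 7 minutes for 500000) grow faster than linearly, which is why a linear estimate falls short. A rough sketch of that reasoning, assuming the run time follows a power law t = c * n^k, is given below. The function names are hypothetical, the chromosome length is an assumption (chromosome 20 in GRCh37; the thread does not say which reference build was used), and a fit through only two points is fragile, so the extrapolation is indicative at best.

```python
import math


def power_law_exponent(n1, t1, n2, t2):
    """Exponent k of t = c * n**k passing through two (size, time) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n1, t1, exponent, n_target):
    """Predicted time at n_target, scaling from the point (n1, t1)."""
    return t1 * (n_target / n1) ** exponent


# Timings from the comment: 30 s for 100000 bases, 7 min for 500000 bases.
k = power_law_exponent(100_000, 30.0, 500_000, 7 * 60.0)

# Assumption: chromosome 20 length in GRCh37.
CHR20_LENGTH = 63_025_520

linear_hours = extrapolate(100_000, 30.0, 1.0, CHR20_LENGTH) / 3600
fitted_hours = extrapolate(100_000, 30.0, k, CHR20_LENGTH) / 3600

print(f"exponent k = {k:.2f}")
print(f"linear estimate: {linear_hours:.1f} h, power-law estimate: {fitted_hours:.1f} h")
```

An exponent well above 1 means each additional chunk costs more than the previous one, so slicing independent ranges is likely to scale better than one ever-growing slice.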
1 vote • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... I've tried the option for 1000genomes, along with many other options (at least 6, spending hours on each), but vcf-subset doesn't work for me: it gives me 'Broken VCF header- no columns names?', and Data Slicer crashes (it says the page is taking too long to respond) when I try to use it to get ...
written 4.3 years ago by severalorks • 110
1 vote • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... Thanks! It took 33 seconds for str(consensus['20'][0:100000]) to finish running, so I estimate it'll take 6 hours for str(consensus['20']) to finish running. ...
written 4.3 years ago by severalorks • 110
0 votes • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... Also, just to understand how the code works better, does this give the same DNA sequence for each individual as what you wrote above? It hasn't finished running yet so I don't know:
from pyfaidx import FastaVariant
import vcf
samples = vcf.Reader(open('calls.vcf.gz', 'r')).samples ...
written 4.3 years ago by severalorks • 110
0 votes • 1 answer • 2.2k views
Comment: Does vcftools or GATK have an option to skip indels when making alternate sequen
... What's the estimated run time for one individual? I've run it for around 30 min and it hasn't given output yet for the first individual, HG00096. It's created the file but it's empty so far. I can continue running it for several more hours to see what happens, though I was wondering if I might be do ...
written 4.3 years ago by severalorks • 110

Latest awards to severalorks

No awards yet. Soon to come :-)
