Moderator: Ketil

Reputation: 3,820
Status: Trusted
Location: Germany
Website: http://blog.malde.org/
Last seen: 6 months, 1 week ago
Joined: 6 years, 6 months ago
Email: k****@malde.org

Posts by Ketil

274 results • page 1 of 28
0 votes • 3 answers • 567 views
Answer: A: The number of cluster in Kmean clustering
... There's an interesting modification to k-means where, instead of setting the number of clusters explicitly, you minimize the expression $\sum_i \| x_i - \mu_i \|^2 + \sum_{i<j} \| \mu_i - \mu_j \|$ (IIRC). The $\mu$s represent cluster centroids, and the minimization forces them to be as few as possible, while ...
written 6 months ago by Ketil (3.8k)
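The expression quoted in the snippet above can be evaluated directly. The following is a minimal sketch, assuming one centroid per data point and summing the second term over unordered pairs; the `weight` parameter and the function name are illustrative assumptions that do not appear in the original answer, and this only evaluates the objective rather than minimizing it.

```python
import math

def fusion_objective(points, centroids, weight=1.0):
    """Evaluate sum_i ||x_i - mu_i||^2 + weight * sum_{i<j} ||mu_i - mu_j||.

    points and centroids are equal-length sequences of coordinate tuples;
    centroids[i] is the centroid assigned to points[i].
    `weight` is a hypothetical trade-off factor (1.0 matches the quoted formula).
    """
    # Fit term: squared distance from each point to its own centroid.
    fit = sum(math.dist(x, mu) ** 2 for x, mu in zip(points, centroids))
    # Fusion term: pairwise centroid distances, zero for merged centroids.
    fusion = sum(
        math.dist(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    )
    return fit + weight * fusion
```

With two points two units apart, keeping the centroids on the points costs only the fusion term, while merging both centroids at the midpoint costs only the fit term, so a larger `weight` favors fewer distinct centroids.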
2 votes • 4 answers • 386 views
Answer: A: How to print the lines exclusively from unknown columns with missing(undef) valu
... > I wanna print just the columns in which IDs present missing ("null" or "undefined") values, i.e. they're blank. If this is actually what you want, you could identify columns with blanks, something like: `for i in {1..10}; do cut -f"$i" < file | grep -q '^$' && echo "$i"; done` and then to pr ...
written 6 months ago by Ketil (3.8k)
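The idea of identifying columns that contain blank fields can also be sketched in Python. This is a minimal illustration, assuming tab-separated input and treating only the empty string as "blank"; the function name and the `sep` parameter are hypothetical and not part of the original answer.

```python
def columns_with_blanks(lines, sep="\t"):
    """Return sorted 1-based indices of columns with at least one empty field."""
    blank = set()
    for line in lines:
        # Strip only the newline so that a trailing empty field is preserved.
        fields = line.rstrip("\n").split(sep)
        for idx, field in enumerate(fields, start=1):
            if field == "":
                blank.add(idx)
    return sorted(blank)
```

It can be applied to an open file handle, for example `columns_with_blanks(open("file"))`, where `file` is a placeholder name.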
0 votes • 4 answers • 7.9k views
Comment: C: Is There A Fast Hashing Function For Nucleotide K-Mers (Q-Grams)?
... Still here? :-) Yes, I maintain the current hash for both the forward strand and its reverse complement: shift left/right and chop off the end, add the next base, and store the numerically smaller of the two hashes. Code at https://github.com/ketil-malde/kmx ...
written 6 months ago by Ketil (3.8k)
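The rolling scheme described in the comment above can be sketched as follows. This is an illustration only, assuming a 2-bit encoding (A=0, C=1, G=2, T=3) and a window restart on any other character; it is not taken from the kmx repository, whose actual encoding and handling of ambiguous bases may differ.

```python
# Assumed 2-bit encoding; the complement of a code c is 3 - c.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def canonical_kmer_hashes(seq, k):
    """Yield min(forward hash, reverse-complement hash) for each k-mer in seq."""
    mask = (1 << (2 * k)) - 1
    fwd = rev = 0
    filled = 0  # number of valid bases currently in the window
    for base in seq.upper():
        code = CODE.get(base)
        if code is None:
            # Ambiguous base (e.g. N): restart the window.
            fwd = rev = filled = 0
            continue
        # Forward strand: shift left, append the new base, chop off the oldest.
        fwd = ((fwd << 2) | code) & mask
        # Reverse complement: shift right, prepend the complemented base.
        rev = (rev >> 2) | ((3 - code) << (2 * (k - 1)))
        filled += 1
        if filled >= k:
            yield min(fwd, rev)
```

Because each k-mer and its reverse complement map to the same value, a sequence and its reverse complement produce the same hashes in reverse order.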
0 votes • 2 answers • 388 views
Comment: C: "No space left on device" after "Finished constructing BWT"
... I wouldn't trust them. bwa is perhaps more robust than your typical bioinformatics program, but chances are that one or more of the files are incomplete and that further processing will give you incomplete or wrong results. If you are lucky, you will get an error. ...
written 6 months ago by Ketil (3.8k)
0 votes • 1 answer • 957 views
Comment: C: sam errors generated from bwa mem
... Seems like a bug in bwa? I've had a ton of trouble after having an error in the reference file (two contigs were concatenated). This wasn't properly picked up by any of the tools I used, and it produced corrupt/incorrect output. I can only suggest that you double-check all input files, and if nothing ...
written 3.2 years ago by Ketil (3.8k)
0 votes • 4 answers • 7.9k views
Comment: C: Is There A Fast Hashing Function For Nucleotide K-Mers (Q-Grams)?
... I implemented this in Haskell. Normally, I'd expect a high-level language to be less efficient for this, but it turns out to be fast enough (meaning that I haven't found an associative data structure that isn't dramatically slower than the hashing). I can hash about 40MB/s on my la ...
written 3.8 years ago by Ketil (3.8k)
0 votes • 5 answers • 6.9k views
Comment: C: A Question About Hybrid Assembly
... Euler is just an early de Bruijn graph assembler; in principle, it is the same as ALLPATHS, Velvet, and ABySS. ...
written 3.8 years ago by Ketil (3.8k)
3 votes • 4 answers • 40k views
Comment: C: What Does Samtools Flagstat Results Mean?
... I'm pretty sure 'total' is the total number of alignments (lines in the sam file), not total reads. ...
written 3.8 years ago by Ketil (3.8k)
1 vote • 1 answer • 2.2k views
Answer: A: K-Mer Counting And Constructing Bwt Index In String Graph Assembler (Sga)
... Although I'm not an expert on this, constructing the BWT is essentially the same as constructing a suffix array. Since there are linear time algorithms for this, I don't think repetitive regions will necessarily be slower. But it is possible that there are other algorithms that are used because th ...
written 3.8 years ago by Ketil (3.8k)
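The relationship between the BWT and the suffix array mentioned in the answer above can be illustrated with a short sketch. It assumes the text ends with a unique sentinel `$` that sorts before every other character, and it builds the suffix array by naive sorting, not by one of the linear-time algorithms the answer refers to; the function names are illustrative.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_suffix_array(text, sa):
    """The i-th BWT character is the one preceding the i-th smallest suffix."""
    # For the suffix starting at 0, text[-1] wraps around to the sentinel.
    return "".join(text[i - 1] for i in sa)
```

Once the suffix array is available, the BWT follows in a single linear pass, which is why the cost of building the BWT is dominated by suffix array construction.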
1 vote • 1 answer • 2.1k views
Comment: C: Sspace And -K Parameter
... I think this must be correct, and that the graph is pruned of less-than-k edges before the difference in weight between links (-a parameter) is considered. Generally, I think it is better to keep k low (just filter out noise from mismapped reads), and use -a for stringency. ...
written 4.6 years ago by Ketil (3.8k)

Latest awards to Ketil

Great Question 3.1 years ago, created a question with more than 5,000 views. For Selecting Random Pairs From Fastq?
Commentator 3.1 years ago, created a comment with at least 3 up-votes. For C: How To Convert (Aligned) Text File Into An Alignment File?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Oligo Design From Ests
Prophet 3.1 years ago, created a post with more than 20 followers. For Selecting Random Pairs From Fastq?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Lua For Bioinformatics?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For What Assembler To Use For Eukaryotes?
Commentator 3.1 years ago, created a comment with at least 3 up-votes. For C: Ngs - Huge (Fastq) File Parsing - Which Language For Good Efficiency ?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Adapter/Linker/Primer Sequence Database?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Selecting Random Pairs From Fastq?
Epic Question 3.1 years ago, created a question with more than 10,000 views. For Selecting Random Pairs From Fastq?
Appreciated 3.1 years ago, created a post with more than 5 votes. For A: What's The Best Generic Scripting Tool For Bioinformatics?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Alternative To "Samtools.Pl Pileup2Fq" For Consensus Generation?
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Microarrays And Gene Regulation
Popular Question 3.1 years ago, created a question with more than 1,000 views. For Estimating Probability Of Differing Allele Frequencies From Pooled Samples
Teacher 3.1 years ago, created an answer with at least 3 up-votes. For A: What Additional Computer Science Courses Should I Do As A Bioinformatician
Teacher 3.1 years ago, created an answer with at least 3 up-votes. For A: [Discussion] Parsing Fasta Without Bioperl
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: Run Hundreds Of Bwa Commands Without Waiting
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: What Are The Most Common Stupid Mistakes In Bioinformatics?
Student 3.5 years ago, asked a question with at least 3 up-votes. For Selecting Random Pairs From Fastq?
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: How To Best Deal With Adapter Contamination (Illumina)?
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: Scalemp Vsmp Or Physical Ram
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: Why Are Sam/Bam Files So Large?
Student 3.5 years ago, asked a question with at least 3 up-votes. For What Is A Good Web Front End For (Blast) Homology Search?
Student 3.5 years ago, asked a question with at least 3 up-votes. For What Assembler To Use For Eukaryotes?
Teacher 3.5 years ago, created an answer with at least 3 up-votes. For A: Evaluation Of High Throughput Sequencing Error Rates ?
