#### Posts by ethan.kaufman

Comment: C: What means Gaussian Kernel?
... As h.mon pointed out, this is the wrong forum for this question.  Try Cross Validated (stats.stackexchange.com) instead, or (assuming this is for a course assignment) ask your professor or others in your class. ...
written 4.1 years ago by ethan.kaufman 360
... Gaussian is just another name for the familiar normal probability distribution.  Any continuous probability distribution can be described by a density function (aka PDF), which assigns a probability density to each possible value.  In this context, the kernel refers to the part(s) of the PDF that depend on the variables in the ...
written 4.1 years ago by ethan.kaufman 360
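As a rough illustration of the point above, here is a minimal Python sketch that splits the normal PDF into a normalizing constant and a kernel. It assumes the standard convention that the kernel is the density with every factor that does not depend on `x` dropped. The function names `gaussian_kernel` and `gaussian_pdf` are made up for this example.

```python
import math


def gaussian_kernel(x, mu=0.0, sigma=1.0):
    """Part of the normal PDF that depends on x (normalizing constant dropped)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Full normal density: normalizing constant times the kernel."""
    norm_const = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return norm_const * gaussian_kernel(x, mu, sigma)


if __name__ == "__main__":
    # The ratio pdf / kernel is the same constant for every x.
    for x in (-1.0, 0.0, 2.5):
        print(x, gaussian_pdf(x) / gaussian_kernel(x))
```

The kernel alone fixes the shape of the curve; the constant only rescales it so that the density integrates to 1.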
... If you're into Python, PyVCF makes this sort of task quite easy. ...
written 4.1 years ago by ethan.kaufman 360
... Note that testing the null hypothesis of correlation = 0 is statistically equivalent to testing slope = 0 in the corresponding linear regression model.  Since any stats package will give you this P-value as part of its output for fitting a linear model, this might be an easier way to get what you're ...
written 4.2 years ago by ethan.kaufman 360
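The equivalence mentioned above can be checked numerically. The sketch below uses only the standard library and made-up data. It computes the t statistic for testing correlation = 0 and the t statistic for testing slope = 0 in a simple linear regression. Both are referred to a t distribution with n - 2 degrees of freedom, so equal statistics imply equal P-values. The helper names are hypothetical.

```python
import math


def _sums(x, y):
    """Return means and centered sums of squares / cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return n, mx, my, sxx, syy, sxy


def correlation_t(x, y):
    """t statistic for H0: Pearson correlation = 0."""
    n, _, _, sxx, syy, sxy = _sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def slope_t(x, y):
    """t statistic for H0: slope = 0 in the least-squares fit y = a + b*x."""
    n, mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope


if __name__ == "__main__":
    # Illustrative data only.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.0, 1.5, 3.7, 3.1, 5.9, 4.8, 7.2, 6.1]
    print(correlation_t(x, y), slope_t(x, y))
```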
... Looks like you got a C from mum and a T from dad, or vice versa. ...
written 4.2 years ago by ethan.kaufman 360
... Correlation is a pairwise measure.  You can calculate correlation between two samples (by considering each gene as an independent observation) but not between n samples.  To get a sense of the overall concordance of your dataset, I would calculate all pairwise correlations, which would generate a sy ...
written 4.2 years ago by ethan.kaufman 360
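One possible way to compute all pairwise correlations is sketched below in plain Python. It assumes each sample is a list of per-gene values in the same gene order. The sample names, the values, and the function names are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(samples):
    """Map every (sample_a, sample_b) pair to its correlation."""
    names = list(samples)
    corr = {(a, a): 1.0 for a in names}
    for a, b in combinations(names, 2):
        r = pearson(samples[a], samples[b])
        corr[(a, b)] = corr[(b, a)] = r  # correlation is symmetric
    return corr


if __name__ == "__main__":
    # Hypothetical expression values for four genes in three samples.
    samples = {
        "s1": [1.0, 2.0, 3.0, 4.0],
        "s2": [2.0, 4.1, 5.9, 8.2],
        "s3": [4.0, 1.0, 3.5, 2.0],
    }
    for pair, r in sorted(pairwise_correlations(samples).items()):
        print(pair, round(r, 3))
```

With n samples this gives n(n-1)/2 distinct off-diagonal values, which can then be summarized (for example by their mean or minimum) to describe overall concordance.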
... That should be a good enough approximation, yes, assuming all the mutations called are within the exome regions.  Minor point: the length of each region is END-START+1  ...
written 4.2 years ago by ethan.kaufman 360
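The region-length remark can be sketched as follows. The `END-START+1` formula applies when coordinates are 1-based and inclusive at both ends. For 0-based, half-open coordinates (the BED convention) the length is `END-START`. Which convention the original file used is not stated, so the flag below is an assumption, and the function name and example intervals are hypothetical.

```python
def total_region_length(regions, one_based_inclusive=True):
    """Sum the lengths of (start, end) intervals.

    one_based_inclusive=True  -> length = end - start + 1
    one_based_inclusive=False -> length = end - start (BED-style, half-open)
    """
    offset = 1 if one_based_inclusive else 0
    return sum(end - start + offset for start, end in regions)


if __name__ == "__main__":
    # Made-up, non-overlapping intervals.
    regions = [(100, 200), (300, 350)]
    print(total_region_length(regions))         # 1-based inclusive
    print(total_region_length(regions, False))  # BED-style
```

Overlapping intervals would be counted twice by this sum, so they should be merged first.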
... In your case you would use the exome length.  If you have a BED file for the captured regions, then this should be pretty easy.  If not, you can compute the genome coverage from the BAM file with bedtools and then add up the regions that have depth above a minimum threshold.   ...
written 4.2 years ago by ethan.kaufman 360
... Frequency is just a count.  It is usually normalized to some fixed unit of time or space to enable comparison with other counts.  Mutation frequency can conceivably refer to many things depending on the context:

- Number of mutations per sample/per Mb/per gene, etc.
- Number of samples in which a par ...

written 4.2 years ago by ethan.kaufman 360
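As one concrete reading of "mutations per Mb", the sketch below divides a mutation count by the number of bases considered, expressed in megabases. The function name and the numbers are illustrative assumptions, not values from the thread.

```python
def mutations_per_mb(n_mutations, n_bases):
    """Normalize a mutation count to mutations per megabase (1 Mb = 1e6 bases)."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_mutations / (n_bases / 1e6)


if __name__ == "__main__":
    # Hypothetical: 150 mutations called across 30 Mb of captured sequence.
    print(mutations_per_mb(150, 30_000_000))
```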
... Note that Cufflinks allows you to supply your own custom GTF when run in 'ref-only' or 'ref-guided' mode.   ...
written 4.3 years ago by ethan.kaufman 360

