2.0 years ago
abhisek061
I have a gene count series matrix. I used a standard deviation calculation to find which genes are most highly expressed, but I cannot extract only those genes, out of thousands of others, into another CSV file.
For reference, each gene has 7 samples. I want to extract all the highly expressed genes along with their expression values in the different samples.
The dataset looks like this:

    Geneid  s1   s2  s3  s4  Standard deviation
    TEA001  100  45  86  46  50
    TEA000  100  45  86  44  49
    TEA001  100  47  86  48  49.1

Please help, I'm a beginner.
Is the question technical (i.e., how would you go about extracting those genes) or biological (i.e., which genes are the highly expressed ones)?
For the technical part, have a look at the Linux utility awk (plenty of information is available online).
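As a minimal sketch of how awk sees a table: it splits each line into fields `$1`, `$2`, … (here on tabs, via `-F '\t'`), so printing just the gene ID and the standard deviation column takes one line. The file name `matrix.tsv` is hypothetical; its contents are the example table from the question, assumed to be tab-separated.

```shell
#!/usr/bin/env bash
# Hypothetical example file: the table from the question, tab-separated.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > matrix.tsv
printf 'TEA001\t100\t45\t86\t46\t50\n'   >> matrix.tsv
printf 'TEA000\t100\t45\t86\t44\t49\n'   >> matrix.tsv
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> matrix.tsv

# -F '\t' splits each line on tabs; $1 is the first field, $6 the sixth.
# Print the gene ID and the standard deviation for every line (header included).
awk -F '\t' '{ print $1, $6 }' matrix.tsv
```

Because the separator is a tab, the header "Standard deviation" stays a single field even though it contains a space.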
I have studied the awk command, but sorry, I can't do this with awk. Could you look at the standard deviation column? I want to filter the series matrix based on the values in this column. How is that possible?
Unless there's some complex computation involved, you most certainly can
Do you wish to get a subset of rows (based on a column) or a subset of columns (based on a row)?
Let's say you want all genes, with all their sample values, whose SD value is greater than 49 (assuming your file is tab-delimited).
$6 represents the 6th column.
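A minimal sketch of that filter, assuming the tab-delimited layout from the question with the standard deviation in column 6 and the example threshold of 49. `NR == 1` keeps the header line. Since the question asked for a CSV file, `-v OFS=','` together with `$1 = $1` makes awk rewrite each kept line with commas. The file names `matrix.tsv` and `high_sd_genes.csv` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical input: the table from the question, tab-separated.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > matrix.tsv
printf 'TEA001\t100\t45\t86\t46\t50\n'   >> matrix.tsv
printf 'TEA000\t100\t45\t86\t44\t49\n'   >> matrix.tsv
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> matrix.tsv

# Keep the header (NR == 1) plus every gene whose SD ($6) is greater than 49.
# Assigning $1 = $1 forces awk to rebuild the line with OFS, so tabs become commas.
awk -F '\t' -v OFS=',' 'NR == 1 || $6 > 49 { $1 = $1; print }' matrix.tsv > high_sd_genes.csv

cat high_sd_genes.csv
```

With the example data, TEA000 (SD exactly 49) is dropped because the comparison is strictly greater than; use `>=` to keep it.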
Thank you, amazing people, for helping me. My problem is now solved with LibreOffice Calc.
That's a bad idea. You should be using tools that let you replicate your analysis. Replicating work done in GUI tools is not easy or straightforward, and automating it is nearly impossible.
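In that spirit, here is a minimal sketch of a re-runnable version. The input file, output file and threshold are variables at the top, and the SD column is located by its header name instead of being hard-coded as `$6`, so the script keeps working if the matrix has more sample columns (the question mentions 7 samples, while the example shows 4). The variable names and the header text `Standard deviation` are assumptions taken from the example table.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings: change these instead of editing the awk program.
input="matrix.tsv"
output="high_sd_genes.csv"
threshold=49
sd_header="Standard deviation"

# Example input (tab-separated), standing in for the real series matrix.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > "$input"
printf 'TEA001\t100\t45\t86\t46\t50\n'   >> "$input"
printf 'TEA000\t100\t45\t86\t44\t49\n'   >> "$input"
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> "$input"

awk -F '\t' -v OFS=',' -v min="$threshold" -v name="$sd_header" '
    NR == 1 {
        # Find which column holds the SD by matching the header text.
        for (i = 1; i <= NF; i++) if ($i == name) col = i
        if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
        $1 = $1; print
        next
    }
    # Keep genes whose SD is strictly greater than the threshold (numeric comparison).
    $col + 0 > min + 0 { $1 = $1; print }
' "$input" > "$output"

echo "Wrote $(($(wc -l < "$output") - 1)) genes to $output"
```

Rerunning the analysis with a different cut-off then means changing one variable, and the script itself documents exactly what was done.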
Yes, it is working fine. I will follow your suggestion.