Question: count matrix of genes
4 months ago by
smrutimayipanda10 wrote:

I have a set of files, each containing logFC values, gene names, and other columns. I want to build a matrix of genes by matching the gene column in one file against the gene column in the other files, and printing each file's logFC values in a column named after that sample file, like this:

            Sample name 1    Sample name 2
Genename    logFC value      logFC value

How can I write this in bash using awk? Thanks in advance.

microarray • 170 views

You're describing a logFC matrix, not a count matrix. I think the easiest way would be Python or R (or another favorite language). Although awk is a valid programming language, it's usually used for short manipulations.

written 4 months ago by Asaf8.5k

Yes, but I am new to Python, so I am more comfortable in bash. Can you please give some tips for creating the matrix in bash?

written 4 months ago by smrutimayipanda10

There is no tip that will allow you to convert fold changes to counts.

written 4 months ago by swbarnes29.2k

There's a reason pandas was born. If it were intuitive to represent matrices in bash (or even plain Python), there would have been no need for it.

written 4 months ago by Asaf8.5k

I am not trying to convert fold changes to counts. I need to create a matrix with gene names laid out horizontally and sample names vertically, with each logFC value assigned to its gene name and sample name. So I need to write a script in bash.

written 4 months ago by smrutimayipanda10

Thank you, but I have already used these commands. They can't be used for many files, such as 25 or 30; something more efficient is needed. I have tried this approach, but it is not practical for multiple files.

written 4 months ago by smrutimayipanda10
4 months ago by
Shalu Jhanwar470 (Switzerland) wrote:

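You can generate a logFC matrix from different files using the "paste" and "cut" commands. This assumes every file lists the same genes in the same order, because "paste" joins lines by position, not by gene name. For example, if the files are in two-column, tab-delimited format like below: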

File1

g1 0.4
g2 0.6
g3 0.9

File2

g1 2.4
g2 3.0
g3 5.0

The following command pastes the two files side by side, drops the repeated gene column, and writes the result to File3:

paste File1 File2 | cut -f1,2,4 > File3

cat File3

g1 0.4 2.4
g2 0.6 3.0
g3 0.9 5.0

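After generating the file, you can insert a header row (sample names) using 'sed'. The one-line form of the 'i' command and '-i' without a backup suffix, as used here, assume GNU sed: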

sed -i 1i"geneName\tFile1\tFile2" File3
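For more than two files, the same paste/cut idea can be scripted so that the field list and the header are built from the file names. This is only a minimal sketch, assuming every file is two-column, tab-delimited, and lists the same genes in the same order; the temporary directory, the demo data in File3, and the output name logFC_matrix.tsv are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo data: three two-column, tab-delimited files
workdir=$(mktemp -d)
cd "$workdir"
printf 'g1\t0.4\ng2\t0.6\ng3\t0.9\n' > File1
printf 'g1\t2.4\ng2\t3.0\ng3\t5.0\n' > File2
printf 'g1\t1.1\ng2\t1.2\ng3\t1.3\n' > File3

# Files to combine, in the order their columns should appear
set -- File1 File2 File3

# After paste, column 1 is the gene name and every even column is a logFC value
fields=1
header=geneName
i=0
for f in "$@"; do
    i=$((i + 1))
    fields="$fields,$((2 * i))"
    header=$(printf '%s\t%s' "$header" "$f")
done

{
    # Header row: geneName followed by one column per file name
    printf '%s\n' "$header"
    # Body: paste joins lines by position, cut drops the repeated gene columns
    paste "$@" | cut -f "$fields"
} > logFC_matrix.tsv

cat logFC_matrix.tsv
```

Writing the header with printf before the body avoids editing the file in place afterwards, so the sketch does not depend on GNU-specific sed behaviour.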

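You can perform these operations on more than two files: "paste" accepts any number of input files, and the "cut" field list then becomes the gene column of the first file plus every even-numbered column (for three files, -f1,2,4,6).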
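When the files do not all contain the same genes in the same order, joining by position will misalign rows, and a merge keyed on the gene name is safer. The following awk sketch is one possible way to do that, not the method from the answer above. It assumes two-column, tab-delimited files with no header line; the sample file names, the NA placeholder for missing genes, and the output name are illustrative choices.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo data: files with different gene sets and orders
workdir=$(mktemp -d)
cd "$workdir"
printf 'g1\t0.4\ng2\t0.6\ng3\t0.9\n' > sampleA.txt
printf 'g3\t5.0\ng1\t2.4\n' > sampleB.txt
printf 'g2\t1.2\ng4\t-0.7\n' > sampleC.txt

awk -F'\t' -v OFS='\t' '
    # First line of each input file: register it as the next sample column
    FNR == 1 { nfiles++; name[nfiles] = FILENAME }
    {
        # Remember genes in order of first appearance across all files
        if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
        # Store the logFC value under (gene, file index)
        value[$1, nfiles] = $2
    }
    END {
        # Header row: geneName followed by one column per input file
        line = "geneName"
        for (j = 1; j <= nfiles; j++) line = line OFS name[j]
        print line
        # One row per gene, with NA where a file has no entry for it
        for (i = 1; i <= ngenes; i++) {
            g = order[i]
            line = g
            for (j = 1; j <= nfiles; j++)
                line = line OFS (((g, j) in value) ? value[g, j] : "NA")
            print line
        }
    }
' sampleA.txt sampleB.txt sampleC.txt > logFC_matrix.tsv

cat logFC_matrix.tsv
```

With 25 or 30 files, the explicit file list can be replaced by a shell glob such as *.txt, provided the glob matches only the input files. The whole matrix is held in memory until the END block, which is usually fine for gene-level tables of this size.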
