Question: RNA-seq Data Excel Formatting
gravatar for michaell
15 months ago by
michaell10 wrote:

I currently have 2 Excel files, one a list of 1000+ genes of interest, and one of analyzed RNA-seq data with read counts. I am trying to match the genes of interest to their corresponding read count; however, it is not feasible to individually copy and paste these values as the spreadsheet is over 25,000+ genes long.

Does anyone know of an effective method to isolate the expression data of my genes of interest off the large RNA-seq spreadsheet?

Thank you in advance.

sequencing rna-seq • 1.1k views
ADD COMMENTlink modified 15 months ago by Kevin Blighe56k • written 15 months ago by michaell10

two ways:

  1. Vlookup function in excel
  2. Load excel sheets in msaccess, write SQL
ADD REPLYlink written 15 months ago by cpad011212k

You can use any simple text editor to get your genes of interest in this format goi_1|goi_2|goi_3 etc. Next, try grep like this

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt

You can also save the output as

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt > output.txt
ADD REPLYlink modified 15 months ago • written 15 months ago by Satyajeet Khare1.5k

I would recommend moving to linux cdmline ;-)

save your excel lists as tab-delineated text files. Then do

grep -f <gene_list> <count_list> > new_output.txt

this will grep for each line from your <gene_list> file the corresponding line from your <count_list> and write it to a new output file.

How did you actually end up with these data files in excel? (I mean, you might already have the files in a text-based format? )

ADD REPLYlink modified 15 months ago • written 15 months ago by lieven.sterck7.2k
gravatar for Kevin Blighe
15 months ago by
Kevin Blighe56k
Kevin Blighe56k wrote:

michaell, if you are in any way interested in data analysis / bioinformatics, you should take the opportunity to learn some coding given the situation that you face.

  1. Export your Excel® sheets to CSV or TSV format. Be wary that gene names beginning with 'SEPT', 'DEC', etc will likely have been automatically converted to Excel® date format.
  2. Input the exported data to R Programming Language (available for Windows, Mac OS, and Linux)
  3. Perform the filter operations in R
  4. Export the filtered data back to TSV or CSV from R (optional)

In our 'profession', usage of Excel® for data analysis is not recommended. Excel® is a very powerful program for performing other tasks, but it's strong point is not in manipulating large datasets, nor for performing statistics.

If you have absolutely no experience with R, then take a look at my (and my former colleagues') own tutorial notes, which covers everything that you need to complete your task:

If you do this, then you can impress your friends and say that you did some coding.

Good luck,


ADD COMMENTlink modified 15 months ago • written 15 months ago by Kevin Blighe56k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 885 users visited in the last hour