Question: RNA-seq Data Excel Formatting
1
gravatar for michaell
6 months ago by
michaell10
michaell10 wrote:

I currently have 2 Excel files, one a list of 1000+ genes of interest, and one of analyzed RNA-seq data with read counts. I am trying to match the genes of interest to their corresponding read count; however, it is not feasible to individually copy and paste these values as the spreadsheet is over 25,000+ genes long.

Does anyone know of an effective method to isolate the expression data of my genes of interest off the large RNA-seq spreadsheet?

Thank you in advance.

sequencing rna-seq • 529 views
ADD COMMENTlink modified 6 months ago by Kevin Blighe45k • written 6 months ago by michaell10
2

two ways:

  1. Vlookup function in excel
  2. Load excel sheets in msaccess, write SQL
ADD REPLYlink written 6 months ago by cpad011211k

You can use any simple text editor to get your genes of interest in this format goi_1|goi_2|goi_3 etc. Next, try grep like this

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt

You can also save the output as

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt > output.txt
ADD REPLYlink modified 6 months ago • written 6 months ago by Satyajeet Khare1.4k

I would recommend moving to linux cdmline ;-)

save your excel lists as tab-delineated text files. Then do

grep -f <gene_list> <count_list> > new_output.txt

this will grep for each line from your <gene_list> file the corresponding line from your <count_list> and write it to a new output file.

How did you actually end up with these data files in excel? (I mean, you might already have the files in a text-based format? )

ADD REPLYlink modified 6 months ago • written 6 months ago by lieven.sterck5.5k
3
gravatar for Kevin Blighe
6 months ago by
Kevin Blighe45k
Kevin Blighe45k wrote:

michaell, if you are in any way interested in data analysis / bioinformatics, you should take the opportunity to learn some coding given the situation that you face.

  1. Export your Excel® sheets to CSV or TSV format. Be wary that gene names beginning with 'SEPT', 'DEC', etc will likely have been automatically converted to Excel® date format.
  2. Input the exported data to R Programming Language (available for Windows, Mac OS, and Linux)
  3. Perform the filter operations in R
  4. Export the filtered data back to TSV or CSV from R (optional)

In our 'profession', usage of Excel® for data analysis is not recommended. Excel® is a very powerful program for performing other tasks, but it's strong point is not in manipulating large datasets, nor for performing statistics.

If you have absolutely no experience with R, then take a look at my (and my former colleagues') own tutorial notes, which covers everything that you need to complete your task:

If you do this, then you can impress your friends and say that you did some coding.

Good luck,

Kevin

ADD COMMENTlink modified 6 months ago • written 6 months ago by Kevin Blighe45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 878 users visited in the last hour