I am using Galaxy for the NGS analysis. When I use tools like featureCounts, DESeq, etc., it gives output file in .tabular format. I need to see this data in a well-organized table in Microsoft Excel (xlsx) format. Could anyone help, please?
I am using Galaxy for the NGS analysis. When I use tools like featureCounts, DESeq, etc., it gives output file in .tabular format. I need to see this data in a well-organized table in Microsoft Excel (xlsx) format. Could anyone help, please?
Hello Glory Basumata ,
first of all: You don't realy want to do this. It is very likely, that at a later points you will convert it back to something, that a too, which you like to use for further analysis, needs and then the nightmare begins ...
If you don't take care of my warning than you simply can open those files in excel. It will ask you how to convert and you have to select that those file are tab delimited. In the next step you can choose what is the decimal seperator.
But you have to be aware that excel also might try to convert some values to something it believe it could be. So e.g. a gene called SEP09
can become 01.09.2009
.
So again: DON'T DO THIS!
fin swimmer
Hi finswimmer,
Thank you for the warning. I appreciate your concern. I understand that excel is not the right tool for NGS study. My main intention to open/visualize tabular file is to check the DESeq result and filter out the differentially expressed gene. I also want to filter out the top 10 or 20 or 50 upregulated and downregulated genes. Furthermore, I would like to do GO (Gene Ontology) enrichment using GOseq. Kindly suggest a tool to study such tabular files. Thank you.
You can use any tool that treats the file as a plain text file and the data in it as text. Excel does not do the latter implicitly. Read this paper to see a few examples of what can happen.
Hi,
here is an python script which takes as first argument TSV and second argument creates xlsx file.
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf8')
from xlsxwriter.workbook import Workbook
# Add some command-line logic to read the file names.
tsv_file = sys.argv[1]
xlsx_file = sys.argv[2]
# Create an XlsxWriter workbook object and add a worksheet.
workbook = Workbook(xlsx_file)
worksheet = workbook.add_worksheet()
# Create a TSV file reader.
tsv_reader = csv.reader(open(tsv_file, 'rb'), delimiter='\t')
# Read the row data from the TSV file and write it to the XLSX file.
for row, data in enumerate(tsv_reader):
worksheet.write_row(row, 0, data)
# Close the XLSX file.
workbook.close()
You can save this code to file as conversion.py and make executable:
chmod +x conversion.py
Then run it as:
./conversion.py input.tsv output.xlsx
Have you tried just opening the output in Excel? It's likely just tab delimited text, Excel shouldn't have an issue with that. Just pay attention to what Excel does to your gene names.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
try importing it just the way you import a tab separated file in excel and also post few lines from example file, from next time Glory Basumata
Hi all, Thanks for the instructions. I will try this.
@Pierre: I like the cartoon icon :) Will try opening the file in R instead.
Look at this post to see a live example of what Excel does to some gene names.
Hi All, Thank you for your responses to my question. I am sharing the DESeq2 result output file so that you could try opening it. I am trying to filter out top 10 to 50 genes which are either upregulated or downregulated for this file.
Link to file: https://drive.google.com/file/d/1ItlSqE8jH645z_REd2gc_oVJo6_CaHPf/view?usp=sharing
Hello Glory Basumata ,
you should better start a new thread for this as it doesn't fit to your original question. But before doing it, try to solve this by yourself as this is a trivial task with unix commands
sort
andhead
ortail
.fin swimmer
Hey can you please tell me how did you open the .tabular file in R or excel?
What have you tried? Did you check with a simple
read.table
?