How to convert between gene ontology file formats
8.8 years ago
dgimode ▴ 70

Hi,

I have annotated a de novo transcriptome and now want to do GO term enrichment. The list of GO terms I get has one line per transcript, with all of its GO terms on that line:

TR17386    GO:0005737    GO:0005634
TR27677    GO:0005737    GO:0005524    GO:0006457
TR24529    GO:0005737    GO:0004332    GO:0006096

But I want them in a list with just two columns, one GO term per line, like this:

TR17386    GO:0005737
TR17386    GO:0005634
TR27677    GO:0005737
TR27677    GO:0005524
TR27677    GO:0006457
TR27677    GO:0006950
TR24529    GO:0005737
TR24529    GO:0004332
TR24529    GO:0006096

Is there a script that can help me achieve this?

Thanks,
Davis

RNA-Seq • 2.7k views
8.8 years ago
Fidel ★ 2.0k

This should work:

perl -lane '$id = shift(@F); foreach $GO (@F) {print "$id\t$GO"} ' your_file.tab
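
For anyone who prefers a script to a one-liner, the same reshaping can be sketched in Python. This is only an illustrative equivalent, not part of the original answer: the function name `wide_to_long` and the in-memory example data are assumptions made for the sketch. Like `perl -a`, it splits each line on whitespace, treats the first field as the transcript ID, and emits one ID/GO pair per remaining field.

```python
import io


def wide_to_long(lines):
    """Yield (transcript_id, go_term) pairs from whitespace-separated lines.

    Hypothetical helper for illustration; mirrors the Perl one-liner's logic.
    """
    for line in lines:
        fields = line.split()  # split on any run of whitespace, like perl -a
        if not fields:
            continue  # skip blank lines
        transcript_id, go_terms = fields[0], fields[1:]
        for go_term in go_terms:
            yield transcript_id, go_term


# Example input taken from the question (first two transcripts only).
example = io.StringIO(
    "TR17386\tGO:0005737\tGO:0005634\n"
    "TR27677\tGO:0005737\tGO:0005524\tGO:0006457\n"
)

pairs = list(wide_to_long(example))
for transcript_id, go_term in pairs:
    print(f"{transcript_id}\t{go_term}")
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` example, e.g. `with open("your_file.tab") as handle: ...`.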

Excellent, thanks for the solutions. The Perl one-liner worked perfectly.

Cool. If the solution worked for you, then accept it ;)

8.8 years ago
PoGibas 5.1k
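# Solution using R

# Packages (install them first if needed)
library(data.table)
library(reshape2)

# Read in; fill=TRUE pads short rows with empty strings.
# Note: read.table infers the column count from the first five lines.
df <- read.table("original.txt", header=FALSE, fill=TRUE, stringsAsFactors=FALSE)

# Reshape to long format: one row per (ID, GO term) pair
res <- melt(df, id.vars="V1")

# Remove the rows that came from padding
setDT(res)
res <- res[value != "", list(V1, value)]

# Save as a tab-separated text file
write.table(res, "wanted.txt", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)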