2.1 years ago
Sacit
Hello,
I am trying to create a tx2gene table from the NCBI annotation file (GCF_016772045.1), using the following code:
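library(rtracklayer)      # provides import()
library(GenomicFeatures)  # provides makeTxDbFromGRanges(); keys() and select() come via AnnotationDbi
# Read the NCBI GTF and build a transcript database from it
ramb_gtf <- import("genomic.gtf")
txdb <- makeTxDbFromGRanges(ramb_gtf)
# List every transcript name for every gene ID
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, columns = "TXNAME", keytype = "GENEID")
# tximport expects transcript ID first, gene ID second
tx2gene <- df[, 2:1]
head(tx2gene)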
I got the following tx2gene header:
TXNAME GENEID
1 XM_027978951.2 A1BG
2 XM_004020007.5 A1CF
3 XM_004020008.5 A1CF
4 XM_004006891.4 A2ML1
My salmon quant files look like:
Name Length EffectiveLength TPM NumReads
NM_001009196.1 409 205.307 0.000000 0.000
NM_001009196.1 409 205.307 0.000000 0.000
NM_001009201.1 1110 905.814 0.000000 0.000
NM_001009202.1 1102 897.814 5428.512703 3.000
NM_001009204.1 201 60.517 0.000000 0.000
Am I using a different annotation file, or is something else different?
PS: I am a newbie.
You have the code and you have the salmon output. What is your problem?
The tx2gene table has XM_ (predicted) IDs, while the quant.sf files have NM_ (validated) IDs, so there is a mismatch. I want to know the reason.
Check the intersect between the transcript names. They should be (largely) identical.
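The check suggested above can be sketched in base R. This is a minimal illustration, not a definitive recipe: the two ID vectors are copied from the excerpts in the question, and the variable names (`tx2gene_ids`, `quant_ids`, `prefix`) are made up for the example. On real data you would, presumably, take `tx2gene$TXNAME` and the `Name` column of a `quant.sf` read with `read.delim()` instead. Note that both excerpts are only the first few rows of each table, so they do not by themselves show whether the full tables disagree.

```r
# Illustrative IDs copied from the question; replace with the full columns,
# e.g. tx2gene$TXNAME and read.delim("quant.sf")$Name (assumed file layout).
tx2gene_ids <- c("XM_027978951.2", "XM_004020007.5",
                 "XM_004020008.5", "XM_004006891.4")
quant_ids   <- c("NM_001009196.1", "NM_001009201.1",
                 "NM_001009202.1", "NM_001009204.1")

# Transcripts present in both tables, and quantified transcripts with no gene
shared  <- intersect(quant_ids, tx2gene_ids)
missing <- setdiff(quant_ids, tx2gene_ids)

# Fraction of quantified transcripts that the tx2gene table can map
frac_matched <- length(shared) / length(quant_ids)
cat("matched:", length(shared), "of", length(quant_ids), "\n")

# Count RefSeq accession prefixes (NM, XM, ...) in each table
prefix <- function(ids) sub("_.*$", "", ids)
print(table(prefix(tx2gene_ids)))
print(table(prefix(quant_ids)))
```

If the matched fraction on the full tables is close to 1, the two files most likely come from the same annotation and only the row order differs. If it is low, inspecting `head(missing)` should show which kind of ID is absent from the tx2gene table.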
Could you please see this comment? I am facing the same issue:
How do I match my transcript ID's from NCBI to the corresponding gene ID's to enable tximport into R?