Question

How to deal with many-to-one matching of IDs?

0

9.8 years ago

jack ▴ 980

Hi,

I have expression data with UCSC IDs, and I am converting the UCSC IDs to RefSeq IDs. The matching is many-to-one: several UCSC IDs map to the same RefSeq ID.

I need to work with RefSeq IDs. Should I sum the expression levels in the many-to-one cases?

ucsc           Refseq
uc002cie.2    NM_138418
uc002cic.1    NM_138418
uc002cid.1    NM_138418
uc002cif.1    NM_138418
uc002cig.1    NM_145294
uc002cih.1    NM_145294
uc002cik.1    NM_145294
uc002cim.1    NM_145294
uc010uul.1    NM_145294
uc002cii.1    NM_145294
uc002cij.1    NM_145294
uc002cil.1    NM_145294

genomics genome RNA-Seq • 1.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by jack ▴ 980

1

I see that you've tagged this RNAseq, but this typically only occurs with microarray data. Is this really RNAseq and, if so, why not just get expression data for the refseq features directly?

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

1

This is RNA-seq. I got it from TCGA, so it's not possible to get it for RefSeq features.

ADD REPLY • link 9.8 years ago by jack ▴ 980

Ram · Answer 1 · 2014-11-17

1

9.8 years ago

Devon Ryan 104k

Ah, TCGA data, that explains it :)

Assuming you're using the "Expected counts" from RSEM that TCGA provides, then just add them up.
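As a rough sketch of what "add them up" could look like in Python: the mapping below is a subset of the table in the question, while the count values and the helper name `sum_by_refseq` are made up for illustration and are not from TCGA or RSEM. It assumes the expected counts are already loaded into a dict keyed by UCSC ID.

```python
from collections import defaultdict

# Subset of the UCSC -> RefSeq mapping shown in the question.
ucsc_to_refseq = {
    "uc002cie.2": "NM_138418",
    "uc002cic.1": "NM_138418",
    "uc002cig.1": "NM_145294",
    "uc002cih.1": "NM_145294",
}

# Hypothetical expected counts for one sample (values invented for illustration).
expected_counts = {
    "uc002cie.2": 10.0,
    "uc002cic.1": 2.5,
    "uc002cig.1": 7.0,
    "uc002cih.1": 0.5,
}


def sum_by_refseq(counts, mapping):
    """Sum per-UCSC-ID counts into per-RefSeq-ID totals.

    UCSC IDs without a RefSeq match are skipped (an assumption; you may
    prefer to keep them under their original ID).
    """
    totals = defaultdict(float)
    for ucsc_id, count in counts.items():
        refseq_id = mapping.get(ucsc_id)
        if refseq_id is None:
            continue
        totals[refseq_id] += count
    return dict(totals)


refseq_counts = sum_by_refseq(expected_counts, ucsc_to_refseq)
print(refseq_counts)
```

Summing is reasonable here only because these are counts; other quantities in the same files (for example, values scaled by length) may not be additive in the same way.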

ADD COMMENT • link 9.8 years ago by Devon Ryan 104k

0

Why should I sum them? I'm a bit confused. If they are the same isoform, why do they have different UCSC IDs?

ADD REPLY • link 9.8 years ago by jack ▴ 980

1

They're not the same in the UCSC annotation (or Ensembl, if you were using that), just in RefSeq. In UCSC, Fam195A and C16orf14 are different, in RefSeq they're the same.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

I see. Do you know which annotation is more accurate?

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by jack ▴ 980

2

My personal order of preference would be:

1. Ensembl or Gencode
2. UCSC
3. RefSeq

If you need RefSeq for a downstream analysis that depends on it, then there's no way around it. As a general principle, try to stick with the original annotation system as much as you can. Converting between the various annotation systems always leads to a bit of increased noise and loss of data.
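To get a feel for how much is lost or merged in a conversion, something like the following Python sketch could be run before committing to it. The function name `mapping_report` and the ID `uc_hypothetical.1` are invented for illustration; the other IDs come from the table in the question.

```python
from collections import Counter


def mapping_report(ucsc_ids, mapping):
    """Return (unmapped UCSC IDs, RefSeq IDs hit by more than one UCSC ID)."""
    unmapped = [u for u in ucsc_ids if u not in mapping]
    per_refseq = Counter(mapping[u] for u in ucsc_ids if u in mapping)
    collapsed = {refseq: n for refseq, n in per_refseq.items() if n > 1}
    return unmapped, collapsed


# Small mapping taken from the question's table.
mapping = {
    "uc002cie.2": "NM_138418",
    "uc002cic.1": "NM_138418",
    "uc002cig.1": "NM_145294",
}

# IDs present in the expression data; the last one is a made-up ID with no match.
ids = ["uc002cie.2", "uc002cic.1", "uc002cig.1", "uc_hypothetical.1"]

unmapped, collapsed = mapping_report(ids, mapping)
print("unmapped:", unmapped)
print("many-to-one:", collapsed)
```

Unmapped IDs are data you lose outright, and the many-to-one entries are where distinct features get merged.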

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Devon Ryan 104k