Question

Functional analysis of my Gene regulatory network

6.4 years ago

The Last Word ▴ 230

Hi,

I know that there have been multiple suggestions for getting gene IDs. I used a method of my own, and this post is to check whether I am doing it right. I have the oligonucleotide probe IDs for a set of transcription factors and genes that are part of the Apis mellifera gene regulatory network. What I did was:

1) I retrieved the sequences for the probes.

2) I used BLAST on NCBI to match these probe sequences against the honey bee gene dataset.

3) I extracted the Entrez IDs of all the hits into an Excel sheet.

4) I converted the Entrez IDs of all the hits into UniProt IDs on the UniProt website.

5) I used these UniProt IDs to run a functional annotation analysis in DAVID.

Please advise whether the methodology I followed was correct and, if not, suggest any changes I should make to it.

Edit: I understand that the NCBI IDs are also present in the file, so I could make do without the sequences. However, when I converted these Entrez IDs to UniProt IDs, only about 126 of a list of 10,000 were converted, which is far too few for my use; that is why I am forced to try this method. I tried converting the IDs in DAVID, but it does not recognize them. This is the sequence of the probes.

GRN functional annotation BLAST • 1.8k views

ADD COMMENT • link updated 6.4 years ago by Kevin Blighe 87k • written 6.4 years ago by The Last Word ▴ 230

I have reservations about your steps 2 and 3. When running BLAST, how did you eventually choose the Entrez ID that matched each sequence? In many cases there must have been multiple matches, so how did you choose the correct one? You should also avoid using Excel as an intermediary step as far as you can. Excel assumes that the user is not an expert and makes many assumptions about what it should do with your data; for example, it can silently convert some gene symbols into dates. Unless you know Excel inside out, try your best to avoid it.
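One hedged way to make the choice of hit explicit and reproducible, rather than picking by eye, is to keep the highest bit-score hit per probe. This is only a minimal shell sketch: it assumes the BLAST results were saved in tabular format (-outfmt 6), where column 1 is the query ID and column 12 the bit score. The function name best_hit_per_probe and the sample rows are made up for illustration.

```shell
# Hypothetical helper: keep one row per probe (query ID), the one with the
# highest bit score. Assumes BLAST tabular output (-outfmt 6):
# column 1 = query ID, column 2 = subject ID, column 12 = bit score.
best_hit_per_probe() {
  tab="$(printf '\t')"
  # Sort by query ID, then by bit score (highest first);
  # awk then keeps only the first row seen for each query ID.
  sort -t "$tab" -k1,1 -k12,12nr "$1" | awk -F '\t' '!seen[$1]++'
}

# Made-up example rows, not real BLAST results.
hits="$(mktemp)"
{
  printf 'AM00351\tXM_001123211.1\t95.7\t70\t3\t0\t1\t70\t1\t70\t2e-25\t95\n'
  printf 'AM00351\tNM_001011642.1\t100\t70\t0\t0\t1\t70\t1\t70\t1e-30\t130\n'
  printf 'AM00645\tXM_391944.3\t98.6\t70\t1\t0\t1\t70\t1\t70\t4e-28\t120\n'
} > "$hits"

best_hit_per_probe "$hits" | cut -f1,2
# AM00351	NM_001011642.1
# AM00645	XM_391944.3
```

Ties on bit score are broken only by sort order here, so probes with several equally good hits would still need a manual look.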

If you have probe IDs, then I assume that these IDs relate to an Affymetrix, Illumina, or Agilent microarray? There must, therefore, be an annotation file (possibly called a 'CDF') which will provide mappings for probes to gene names.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Hi Kevin,

Thanks for the reply. I ran a MegaBLAST and, because the search was restricted to a single species (Apis mellifera), most probes have just one hit. Where there is more than one hit, they are all variants of the same gene. This is the list of probe IDs. As you can see, the nucleotide IDs are provided as well. However, when I converted the nucleotide IDs into UniProt IDs for DAVID, only 126 out of a total of 10,000 genes were converted, which is too few for my purpose. I will be wary of using Excel and will probably use a text file to store my IDs. Thank you for that suggestion. Is there a way around the first problem?

ADD REPLY • link 6.4 years ago by The Last Word ▴ 230

In the file to which you linked ( this ), there is already a fairly good mapping for probe ID to RefSeq ID (the NM and XM IDs). I would just use those for DAVID.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

I tried that, but DAVID gives the error "You are either not sure which identifier type your list contains, or less than 80% of your list has mapped to your chosen identifier type." I set the identifier type to Refseq_mRNA and also tried ENTREZ_ID.
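A quick check that may help explain this kind of error is to count how many accessions in the list look like RefSeq mRNA IDs (NM_ or XM_ prefix) versus anything else. This is a minimal sketch, assuming a tab-separated file with the probe ID in column 1 and the accession in column 2. The function name count_id_types and the sample rows are illustrative, and a matching prefix does not guarantee that DAVID recognizes the accession.

```shell
# Hypothetical helper: tally accessions by whether they carry a RefSeq mRNA
# prefix (NM_ or XM_). Assumes tab-separated input, accession in column 2.
count_id_types() {
  cut -f2 "$1" | awk '
    /^[NX]M_/ { mrna++; next }   # RefSeq mRNA (curated NM_ or predicted XM_)
    { other++ }                  # everything else (other accession types)
    END { printf "refseq_mrna=%d other=%d total=%d\n", mrna, other, NR }'
}

# A few sample rows for illustration only.
ids="$(mktemp)"
{
  printf 'AM00351\tNM_001011642.1\n'
  printf 'AM00352\tNM_001011613.1\n'
  printf 'AM00353\tXM_001123211.1\n'
  printf 'AM00354\tNM_001011642.1\n'
  printf 'AM00437\tDB731866\n'
  printf 'AM01507\tNW_001253016.1\n'
} > "$ids"

count_id_types "$ids"
# refseq_mrna=4 other=2 total=6
```

If the mRNA fraction is well below 80%, the mixed identifier types alone could plausibly trigger the message, though that is an assumption about how DAVID applies its threshold.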

ADD REPLY • link 6.4 years ago by The Last Word ▴ 230

You may have to remove the version suffix from each accession (the part after the dot), that is:

NM_12345.1 should be NM_12345

If you save the listing as a text file on Linux or macOS, you can remove this easily with sed:

cat temp
AM00351 NM_001011642.1
AM00352 NM_001011613.1
AM00353 XM_001123211.1
AM00354 NM_001011642.1
AM00355 NM_001011642.1
AM00356 XM_001123211.1
AM00357 NM_001011613.1
AM00437 DB731866
AM00645 XM_391944.3
AM00843 DB774374
AM00966 DB779620
AM01009 DB754287
AM01057 BI502939
AM01058 BI502941
AM01172 DB762253
AM01340 BI505673
AM01355 DB779829
AM01365 BI505951
AM01507 NW_001253016.1

cut -f2 temp | sed 's/\.[0-9]*$//g'
NM_001011642
NM_001011613
XM_001123211
NM_001011642
NM_001011642
XM_001123211
NM_001011613
DB731866
XM_391944
DB774374
DB779620
DB754287
BI502939
BI502941
DB762253
BI505673
DB779829
BI505951
NW_001253016
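Note that the list above mixes identifier types: NM_ and XM_ are RefSeq mRNA accessions, NW_ is a RefSeq genomic accession, and the DB/BI entries appear to be GenBank accessions of another kind. As a hedged extension of the same pipeline, the sketch below keeps only the RefSeq mRNA accessions and removes duplicates before submission. It assumes the same tab-separated layout; the function name refseq_mrna_ids and the temporary file are hypothetical.

```shell
# Hypothetical helper: strip the version suffix, keep only RefSeq mRNA
# accessions (NM_ / XM_), and drop duplicates.
refseq_mrna_ids() {
  cut -f2 "$1" | sed 's/\.[0-9]*$//' | grep -E '^[NX]M_' | sort -u
}

# Sample rows in the same layout as the listing (tab-separated).
mapping="$(mktemp)"
{
  printf 'AM00351\tNM_001011642.1\n'
  printf 'AM00352\tNM_001011613.1\n'
  printf 'AM00353\tXM_001123211.1\n'
  printf 'AM00354\tNM_001011642.1\n'
  printf 'AM00437\tDB731866\n'
  printf 'AM01507\tNW_001253016.1\n'
} > "$mapping"

refseq_mrna_ids "$mapping"
# NM_001011613
# NM_001011642
# XM_001123211
```

The excluded accessions are not lost; they would simply need a different identifier type or another mapping route.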

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Thank you, Kevin. I ran a DAVID analysis with the identifier type set to Refseq_mRNA. Even though the matches are very few (out of a list of 360, only 123 gave some kind of functional annotation result in DAVID), I guess that will have to do. If you know of other functional annotation tools that would be better than DAVID, please do let me know. If you put up a brief answer, I could select it as the accepted answer and close this question.

ADD REPLY • link 6.4 years ago by The Last Word ▴ 230