Question

Which sequence should I choose?

0

21 months ago

David • 0

Hi all,

I've collected sequences from 5 species for 10 different genes. My method was to find the gene RefSeq numbers from my reference genome (Drosophila melanogaster) and type them into the search bar in the genome browser for the other species (other Drosophila species).

This has returned more than one sequence for each gene per species (e.g. if I'm looking for the gene HDAC4 in Drosophila simulans, it returns 3-4 sequences instead of the expected 1) which makes me wonder, which sequence should I pick? Is there an optimal method for doing this, or do you have any advice?

I'd really appreciate any help on this one!

Best wishes,

David

UCSC browser genome sequence alignment • 1.2k views

score 2 · Accepted Answer · 2022-08-01

2

21 months ago

GenoMax 142k

Are you doing these searches at flybase.org?

I see only one sequence (from RefSeq) if I do the search like this at NCBI: https://www.ncbi.nlm.nih.gov/search/all/?term=HDAC4+%5BGENE%5D+AND+Drosophila+simulans+%5BORGN%5D

Drosophila melanogaster on the other hand has 8 known RefSeq transcripts: https://www.ncbi.nlm.nih.gov/search/all/?term=HDAC4+[GENE]+AND+Drosophila+melanogaster+[ORGN]

Are you looking for the sequence of the gene or of a transcript? If you want the sequence of the gene, then you will need to click on the Gene database link to get to that page. From there, look for a FASTA link in the Genomic regions section.

D. simulans does not seem to have a similar gene entry. For D. melanogaster: https://www.ncbi.nlm.nih.gov/gene/?term=HDAC4%20%5BGENE%5D%20AND%20Drosophila%20melanogaster%20%5BORGN%5D
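For anyone who wants to repeat this kind of search for several genes and species, here is a minimal sketch that builds the equivalent query for NCBI's E-utilities `esearch` endpoint. This is an assumption on my part: the links above use the NCBI web search page, not E-utilities, and the helper name `gene_search_url` is made up for illustration. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (a programmatic route, not the web
# search page linked in the post above).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism, db="gene"):
    """Build an esearch URL using the same [GENE] and [ORGN] field tags as the post."""
    term = f"{symbol}[GENE] AND {organism}[ORGN]"
    params = {"db": db, "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical loop over the species of interest.
    for species in ("Drosophila melanogaster", "Drosophila simulans"):
        print(gene_search_url("HDAC4", species))
```

Fetching each URL and comparing the result counts per species would be one way to spot genes that return more than one record.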

0

Hi GenoMax,

Thanks for the reply!

HDAC4 was a random gene from the top of my head, apologies for the confusion.

I'm looking for the gene, so when I search for this using the melanogaster RefSeq number in the UCSC Genome Browser, I get multiple genes on different chromosomes in simulans, for instance.

I tried NCBI and the gene database but it seemed to only export the coding regions, and I'm interested in the full sequence. I also want to take 1000 bases upstream and downstream of the gene, which I don't think you can do in the gene database?

Anyway yeah, this is a bit far from my original question, which is: if you have multiple sequences for one gene, how do you decide which to choose? Is there an example of someone doing this? I haven't been able to find anything.
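On the 1000 bases upstream and downstream: once the gene coordinates and the chromosome or contig sequence are in hand, the window is easy to compute directly. Below is a minimal sketch, assuming 0-based, half-open coordinates (as in BED files) and a sequence already loaded as a plain string. The function names are made up for illustration.

```python
def flanked_interval(start, end, seq_length, flank=1000):
    """Widen a 0-based, half-open [start, end) interval by `flank` bases on
    each side, clamped so it never runs off either end of the sequence."""
    if not 0 <= start < end <= seq_length:
        raise ValueError("gene interval must lie within the sequence")
    return max(0, start - flank), min(seq_length, end + flank)


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N in either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def extract_with_flanks(chrom_seq, start, end, strand="+", flank=1000):
    """Return the gene plus flanks, reverse-complemented for minus-strand
    genes so that upstream always comes first in the output."""
    lo, hi = flanked_interval(start, end, len(chrom_seq), flank)
    region = chrom_seq[lo:hi]
    return reverse_complement(region) if strand == "-" else region


if __name__ == "__main__":
    toy = "AAAAACCCGTTTTT"  # toy sequence, gene at [5, 9)
    print(extract_with_flanks(toy, 5, 9, flank=2))
    print(extract_with_flanks(toy, 5, 9, strand="-", flank=2))
```

The clamping matters on short contigs, where a gene can sit closer than 1000 bases to the contig end and the flank comes out shorter than requested.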

ADD REPLY • link 21 months ago by David • 0

0

Unfortunately NCBI and Ensembl only carry the melanogaster genome. So you are going to be limited to UCSC or flybase.org (no longer free I think) for this.

I think your best bet is to grab the genes you need from melanogaster and then identify homologous regions from genome files you can download from UCSC: https://hgdownload.soe.ucsc.edu/downloads.html

ADD REPLY • link 21 months ago by GenoMax 142k

0

Yeah this is what I've already done, my question was: if you have multiple sequences for one gene in the same species and in different locations (i.e. different chromosomes), how do you decide which to choose?

ADD REPLY • link 21 months ago by David • 0

1

You will have to do a careful analysis of them by doing sequence alignments to make sure they are real orthologs/paralogs. Other Drosophila genomes are probably nowhere near as complete as melanogaster, and that can prove a challenge.

What is the ultimate goal here? Has this analysis not been done by other fly people over the years?
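One common way to semi-automate the ortholog check is a reciprocal best hit test: keep a candidate only if it is the best hit of the melanogaster gene and the melanogaster gene is also its best hit when searching back. This is a heuristic brought in here for illustration, not something this thread prescribes. The sketch assumes the alignment hits have already been parsed into (query, subject, bit score) tuples, and all gene and locus names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return {query: subject} pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


if __name__ == "__main__":
    # Hypothetical scores: melanogaster genes searched against simulans ...
    forward = [
        ("mel_geneA", "sim_locus1", 900.0),
        ("mel_geneA", "sim_locus2", 400.0),
        ("mel_geneB", "sim_locus2", 690.0),
    ]
    # ... and the simulans loci searched back against melanogaster.
    reverse = [
        ("sim_locus1", "mel_geneA", 880.0),
        ("sim_locus2", "mel_geneB", 700.0),
        ("sim_locus2", "mel_geneA", 390.0),
    ]
    print(reciprocal_best_hits(forward, reverse))
```

In this toy data, sim_locus2 also hits mel_geneA but pairs reciprocally with mel_geneB, so it would be treated as a likely paralog of mel_geneA and not as its ortholog. Borderline cases still need the manual alignment check.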

ADD REPLY • link 21 months ago by GenoMax 142k

0

Ok that makes sense, this is what I thought but I wasn't sure if there was a more automated method. Yes you're right, but they seem to align fairly well.

We're looking at conservation in promoter and intronic regions of a particular set of genes regulated by a particular DNA binding protein. The data are supplementary to some expression observations we've made - I hope this makes sense.

ADD REPLY • link 21 months ago by David • 0

1

If they align well then that should make your job a bit easier. Go for hits on the longest contigs since those are likely to be of better quality than hits to small contigs.
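The longest-contig rule can be sketched as a simple ranking. This assumes each hit has been recorded with the name and length of the contig it landed on plus a percent identity. The `Hit` record and the identity tie-break are assumptions added for illustration, and the contig names and numbers are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str              # contig or chromosome the hit landed on
    contig_length: int       # total length of that contig in bases
    percent_identity: float  # alignment identity to the melanogaster gene


def pick_hit(hits):
    """Prefer the hit on the longest contig; break ties on percent identity."""
    if not hits:
        raise ValueError("no hits to choose from")
    return max(hits, key=lambda h: (h.contig_length, h.percent_identity))


if __name__ == "__main__":
    # Hypothetical candidates for one gene in one species.
    candidates = [
        Hit("contig_small", 12_000, 97.5),
        Hit("chr_long", 23_000_000, 95.1),
        Hit("contig_mid", 450_000, 96.0),
    ]
    print(pick_hit(candidates).contig)
```

If a hit on a small contig has clearly higher identity than the one on the long contig, that is worth a manual look before applying the rule blindly.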