Question: Genomic Coordinate Annotation
17 months ago by ari_sh70 (0)
ari_sh70 wrote:

Hi everyone!

I have a question and would really appreciate your help, please!

I have a list of genomic segments and would like to find out the names of the genes located in them. The dataset is a text file with the following format:

Segment               Count                           First                    End
 0                     258                           1_1960674            1_2013259
 1                     85                            1_3057480            1_3257840
 2                     185                           1_3340901            1_3783903
 3                     215                           1_209363247          1_209995470

In this dataset, the first column is the segment number, the second column is the number of SNPs per segment, and the third and fourth columns give the smallest and largest SNP position in each segment. I should note that the values in the third and fourth columns combine the chromosome number and the position, joined by an underscore (for example, 1_1960674 is position 1960674 on chromosome 1). Now, how can I understand the gene names?

Thank you very much

Tags: sequencing, snp, gene, genome
modified 17 months ago by lieven.sterck (8.7k) • written 17 months ago by ari_sh70 (0)

Hi ari_sh70, I've changed the 'tag' of your post to Question, as the 'Tutorial' one is reserved for tutorials where people explain or showcase the use of a tool or pipeline.

written 17 months ago by lieven.sterck (8.7k)

Dear ari_sh70,

there are a few shortcomings in your question:

  1. Your question is about gene names and/or annotation, but you don't show any
  2. administrative: use the code button for pasting a few lines of your dataset

modified 17 months ago • written 17 months ago by Carambakaracho (2.2k)

Ok sure, thanks for the tips

modified 17 months ago • written 17 months ago by ari_sh70 (0)

My apologies, I just realized you do show a few "lines" within that one line, but it's really hard to read...

written 17 months ago by Carambakaracho (2.2k)

You are totally right! This is my first time writing a post here. Thanks again for telling me how to do that!

written 17 months ago by ari_sh70 (0)

Now, how can I understand the gene names?

What exactly do you mean by that? Are you looking for the genes that are located in those regions?

written 17 months ago by lieven.sterck (8.7k)

Yes, that is exactly what I am looking for...

written 17 months ago by ari_sh70 (0)

BEDTools (more specifically bedtools intersect) will be your friend.

With a little reformatting of those columns, and given that you have a GFF (or BED) file of the annotation, this should be pretty straightforward.

written 17 months ago by lieven.sterck (8.7k)

Thank you very much for your answer. Could you tell me a bit more about it, please?

written 17 months ago by ari_sh70 (0)

Sure, but can you first confirm that you have an annotation of the genes in GFF or BED format?

written 17 months ago by lieven.sterck (8.7k)

Thank you. First, I would like to apologize for my delayed response: I am a new user and the system did not let me reply any more yesterday. I want to annotate the SNPs based on their location, using the data I showed in my post.

written 17 months ago by ari_sh70 (0)

No worries.

see my answer below

written 17 months ago by lieven.sterck (8.7k)

17 months ago by lieven.sterck (8.7k), VIB, Ghent, Belgium
lieven.sterck wrote:

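OK, first you will need to transform your list into BED format, e.g. by using the following Linux command line (your_file.txt stands for your input file; the awk step writes tab-separated columns, as BED requires):

sed '1d' your_file.txt | sed 's/_/\t/g' | awk 'BEGIN { OFS="\t" } { print $3, $4, $6 }' > your_file.bed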

Then you run bedtools intersect, providing the file created above and the BED (or GFF) file of your annotation. Depending on your settings, this will give you the list of genes that overlap your SNP interval regions.
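
As a minimal sketch of both steps, assuming the input looks exactly like the sample in the question: the file names segments.txt, segments.bed and annotation.gff are placeholders, not files from this thread, and the intersect step only runs if bedtools and an annotation file are actually present. Note that BED start coordinates are 0-based, so if your SNP positions are 1-based you may want to subtract 1 from the start column.

```shell
# Sample input copied from the question (whitespace-separated, one header line).
cat > segments.txt <<'EOF'
Segment               Count                           First                    End
 0                     258                           1_1960674            1_2013259
 1                     85                            1_3057480            1_3257840
 2                     185                           1_3340901            1_3783903
 3                     215                           1_209363247          1_209995470
EOF

# Skip the header, split "chrom_pos" on "_" and write tab-separated chrom/start/end.
awk 'BEGIN { OFS = "\t" }
     NR > 1 { split($3, first, "_"); split($4, last, "_")
              print first[1], first[2], last[2] }' segments.txt > segments.bed

# annotation.gff is a placeholder for an annotation file you would have to obtain.
if command -v bedtools >/dev/null 2>&1 && [ -f annotation.gff ]; then
    # -wa -wb reports each segment together with the annotation feature it overlaps.
    bedtools intersect -a segments.bed -b annotation.gff -wa -wb > segments_with_genes.tsv
fi
```

The split on "_" assumes that chromosome names themselves contain no underscore; scaffold-style names would need a different split.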

Word of caution: it's critical that you use the same sequence name IDs in both files, so you may need to modify them slightly so that they correspond to each other.
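
One way to check this, sketched with tiny stand-in files so it can be run as-is (the gene ID, the source column and the "chr" prefix mismatch are all made up for illustration): list the sequence names in each file, see which ones from the BED file are missing from the annotation, and rename if needed.

```shell
# Tiny stand-in files; replace them with your own BED and GFF files.
printf '1\t1960674\t2013259\n1\t3057480\t3257840\n' > segments.bed
printf '##gff-version 3\nchr1\tsrc\tgene\t1950000\t2000000\t.\t+\t.\tID=geneA\n' > annotation.gff

# List the sequence names used in each file (GFF comment lines start with "#").
cut -f 1 segments.bed | sort -u > bed_names.txt
grep -v '^#' annotation.gff | cut -f 1 | sort -u > gff_names.txt

# Names present in the BED file but absent from the annotation.
comm -23 bed_names.txt gff_names.txt > missing_names.txt

# If the only difference is a "chr" prefix (an assumption), add it to the BED file.
awk 'BEGIN { OFS = "\t" } { $1 = "chr" $1; print }' segments.bed > segments.chr.bed
```

If missing_names.txt comes out empty, the names already match and no renaming is needed.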

written 17 months ago by lieven.sterck (8.7k)

Thank you very much for your answer. My problem here is the annotation. I don't have any annotation ....

written 17 months ago by ari_sh70 (0)

Well... this is when things get more complicated. This is why lieven.sterck asked you specifically for annotation.

Based on the limited information you disclosed, I see the following options:

  • Find annotation in a public database
  • Get annotation from a colleague
  • Annotate yourself

modified 17 months ago • written 17 months ago by Carambakaracho (2.2k)

Thank you for your reply. Can you please tell me what kind of information you are looking for? If you read my previous comments precisely, I emphasised that I am looking for annotation and the gene's name. That's what I am here and I put this post to know that otherwise I'm not looking for wasting my time here....

written 17 months ago by ari_sh70 (0)

Let's take a step backward: which organism are you working on? We need to know what is available for this organism. If nobody has generated annotation then there is not much we can do. Annotation would tell us which genes are where.

That's what I am here and I put this post to know that otherwise I'm not looking for wasting my time here....

Excuse me, wasting your time? Right now you are using the time of a bunch of volunteers who are trying to help you. If you don't feel like wasting your time, then you don't have to post here and are free to solve your problems on your own. Please be as complete as possible when asking questions.

written 17 months ago by WouterDeCoster (44k)

Well, I've read all your posts carefully, but this was not clear at all. Moreover, there's 'annotation' and 'annotation': the term can be interpreted in a very broad sense.

So basically your question is how to annotate a genome (in the first place)? If so, we would need much more info on your data at hand to advise you. What data do you have, what kind of genome, ...

written 17 months ago by lieven.sterck (8.7k)

I acknowledge your politeness (and I believe so does everybody who replied).

When you don't have gene model annotation and you don't disclose the organism you're working on, how do you suggest anyone can help you?

This morning I processed 48 Propionibacterium freudenreichii genomes, yesterday I annotated a dozen Acinetobacter baumannii genomes, and this afternoon I will continue to work on a data analysis pipeline based on the Cricetulus griseus genome. It's unlikely you're interested in any of these, and I cannot guess your genome.

modified 17 months ago • written 17 months ago by Carambakaracho (2.2k)

exactly!

Perhaps (though it's not really the advised way), to reduce your workload a little, you could restrict your annotation (in case you have to do it yourself) to the regions you are really interested in.

What kind of setting are we talking about here? small genome <-> large genome?
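
If it comes to that, a rough sketch of pulling out just those regions could look like this. Everything here is an assumption for illustration: the stand-in BED lines, the 10 kb flank (an arbitrary value, added so genes straddling a segment boundary are not cut off) and the placeholder genome.fa for your assembly. The bedtools getfasta step only runs if bedtools and the FASTA file are present.

```shell
# Stand-in BED lines taken from the question's first two segments.
printf '1\t1960674\t2013259\n1\t3057480\t3257840\n' > segments.bed

# Widen each region by a flank on both sides, clamping the start at 0.
FLANK=10000   # arbitrary illustrative value
awk -v flank="$FLANK" 'BEGIN { OFS = "\t" }
     { start = $2 - flank; if (start < 0) start = 0
       print $1, start, $3 + flank }' segments.bed > segments.padded.bed

# genome.fa is a placeholder for your genome assembly in FASTA format.
if command -v bedtools >/dev/null 2>&1 && [ -f genome.fa ]; then
    # Extract the sequence of each padded region for a focused annotation run.
    bedtools getfasta -fi genome.fa -bed segments.padded.bed -fo regions.fa
fi
```

The padded end is not clamped to the chromosome length here, so regions near the end of a sequence may need trimming.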

written 17 months ago by lieven.sterck (8.7k)