How To Get A Unique Line Of Annotation For Each Specific Genomic Position Of A Plant Genome
2
1
Entering edit mode
10.9 years ago
Jianfengmao ▴ 310

I am new to genomics. And, I am working on plant species which are different with human. We usually have fewer available tools to use and fewer example to follow. Please give me some direction.

I have genomic variations (SNP, indel, CNV) coordinated by chromosome:start:end in GFF/BED/VCF format. One genomic variation is defined a specific genomic position (in base pair).

for example:

# SNPs,chr,start,end
SNP_1,1,43,43
SNP_2,2,56,56


I would like to get such genomic variations annotated by various gen/protein/passway centric annotations (as listed in BioMart databases). I tried R/bioconductor biomaRt package. But, I failed to get a unique line of annotation for a specific genomic position. Could you please give any directions on that?

################################################code I used as an
example###########################
library(biomaRt)
listMarts()
plant = useMart("plant_mart_7")
alyr=useDataset("alyrata_eg_gene", mart=plant)
atha = useDataset ("athaliana_eg_gene",mart=plant)

listAttributes(alyr)
listFilters(alyr)

chr<-c(rep(1, 10))
start<-c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
end<-c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)

getBM(attributes =
c("chromosome_name","start_position","ensembl_gene_id",
"start", "end"), values = list(chr, start, end), mart=alyr, uniqueRows
= TRUE)
###########################################################################################

annotation plant • 2.8k views
1
Entering edit mode

What do you mean by failed to get a unique line of annotation, what is the result of the query? I think your problem is that your example regions dont overlap genes.

0
Entering edit mode

To Michael,

I used 10 genomic positions to query annotations (as shown in my forward example). But it return me more than 10 lines of annotations. I do not which line of genomic position do the results correspond to. Could you give me a direction on that?

0
Entering edit mode

Steve solved this over on the bioconductor mailing list: http://thread.gmane.org/gmane.science.biology.informatics.conductor/33240

0
Entering edit mode

Jian-Feng, I'd recommend picking one location to ask your questions at. It can be frustrating to try and help someone, only to feel like you wasted your time because it's being worked on elsewhere. Pick a mailing list or forum you believe will help you best, and try and become part of the community.

0
Entering edit mode

yes, I will follow your advice, picking one location to ask questions. I am so anxious at times I am facing problems, and usually I can not get any helps around me.

0
Entering edit mode

Brad, It has not been answered, though I expected it. Steve can not understand my question. May he do not know what I need. Sorry about that.

2
Entering edit mode
10.9 years ago

Instead of querying single genomic positions for genes, which possibly don't map to any gene, get a complete list of transcripts/genes first, then try to overlap the hypothetical SNPs with the genes or try to find genes closest to your SNPs.

Edit: Problem possibly: You are starting out with the most complicated example, arabidopsis lyrata, which doesn't have a finished single genome sequence but: "The 206.7Mbp Araly1 assembly contains 695 nuclear scaffolds..."

So it's quite clear that your query might hit multiple scaffolds because a genomic position might be present in several scaffolds (contigs).

Note that your output is in fact unique, the unique filter guarantees that each line contains a unique hit, in the sense that each line is different, but it doesn't have anything to do with a unique genomic position.

For now, look for the IRanges GenomicRanges packages in R to compute the overlaps. I currently don't have any scripts to share for that.

0
Entering edit mode

I came from biology, and am not good at computer and programming, though I am analyzing genomic data. Could you please give any example (lines of commands/scripts) of your methods? Or, any software can I turn to?

0
Entering edit mode
10.9 years ago
Jianfengmao ▴ 310

To Michael,

I used 10 genomic positions to query annotations (as shown in my forward example). But it return me more than 10 lines of annotations. I do not which line of genomic position do the results correspond to. Could you give me a direction on that?

following is a example, also:

#### code

getBM(attributes =
c("chromosome_name","start_position","ensembl_gene_id",
"start", "end"), values = list(chr, start, end), mart=alyr, uniqueRows
= TRUE)


#### result

chromosome_name start_position                ensembl_gene_id
1                1          48875            Al_scaffold_0001_16
2                1          72255            Al_scaffold_0001_21
3                1          10652             Al_scaffold_0001_4
4                1          82573 fgenesh1_pg.C_scaffold_1000018
5                1          87206 fgenesh1_pg.C_scaffold_1000020
6                1          29681 fgenesh1_pm.C_scaffold_1000009
7                1          51526 fgenesh1_pm.C_scaffold_1000016
8                1          78367 fgenesh1_pm.C_scaffold_1000020
9                1          35461 fgenesh2_kg.1__12__AT1G02120.1
10               1          39949 fgenesh2_kg.1__13__AT1G02110.1
11               1          46396 fgenesh2_kg.1__19__AT1G02090.1
12               1          55814 fgenesh2_kg.1__20__AT1G02070.1
13               1          74785 fgenesh2_kg.1__23__AT1G02065.1
14               1          80941 fgenesh2_kg.1__25__AT1G02050.1
15               1          80941 fgenesh2_kg.1__25__AT1G02050.1
16               1          90714 fgenesh2_kg.1__28__AT1G02010.1
17               1          90714 fgenesh2_kg.1__28__AT1G02010.1
18               1           3311  fgenesh2_kg.1__2__AT1G02190.2
19               1           3311  fgenesh2_kg.1__2__AT1G02190.2
20               1           9512  fgenesh2_kg.1__3__AT1G02180.1
21               1          12552  fgenesh2_kg.1__5__AT1G02160.2
22               1             47              scaffold_100001.1
23               1             47              scaffold_100001.1
24               1           7429              scaffold_100003.1
25               1          13702              scaffold_100007.1
26               1          15665              scaffold_100008.1
27               1          19692              scaffold_100009.1
28               1          24515              scaffold_100010.1
29               1          33055              scaffold_100013.1
30               1          33055              scaffold_100013.1
31               1          33055              scaffold_100013.1
32               1          33055              scaffold_100013.1
33               1          33055              scaffold_100013.1
34               1          33055              scaffold_100013.1
35               1          43130              scaffold_100016.1
36               1          49553              scaffold_100018.1
37               1          49553              scaffold_100018.1
38               1          57579              scaffold_100022.1
39               1          58865              scaffold_100023.1
1
2
3                                 IEA
4
5
6
7
8
9
10
11
12
13
14                                IEA
15                                IEA
16                                IEA
17                                IEA
18                                IEA
19                                IEA
20
21
22                                IEA
23                                IEA
24
25
26                                IEA
27
28
29                                IEA
30                                IEA
31                                IEA
32                                IEA
33                                IEA
34                                IEA
35
36                                IEA
37                                IEA
38
39