Retrieving gene details
7.1 years ago
Mk ▴ 10

Hi, I have a list of genes and want to create a .tsv file containing each gene's name, its chromosome, and the start and end positions of the gene. I saw this post [link] on ResearchGate for the same task, but I cannot find the option to select datasets that it mentions. Can anyone suggest an alternative way of achieving this?

The list looks like:

TP53

BRCA1

The expected output should look like:

Gene chr start end

BRCA1 chr17 43044295 43170245

TP53 chr17 7661779 7687550

genes biomart • 1.6k views

any options to select datasets as mentioned in there

What do you mean by that? Are you not finding the genome you are interested in? You may want to watch this tutorial on BioMart to see if that answers your question.


Thank you genomax2, I was able to follow the tutorial you provided and find the required solutions.

7.1 years ago

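You could use the UCSC Goldenpath downloads; e.g. for hg19, the knownGene table lists the chromosome, strand, and start and end coordinates of each transcript: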

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/knownGene.txt.gz | gunzip -c > knownGene.txt

knownGene is keyed by UCSC transcript IDs, so you could use the kgXref table to map those IDs to gene symbols, e.g. with grep:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/kgXref.txt.gz | gunzip -c > kgXref.txt

Or you could do the mapping via the web interface in the UCSC Genome Browser.
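As one way to finish the job from the shell, the two downloaded tables can be joined with awk. This is only a sketch and not part of the original answer: the function name genes_to_tsv and the input file genes.txt (one symbol per line) are made up for illustration. It assumes the hg19 column layout in which kgXref.txt holds the UCSC transcript ID in column 1 and the gene symbol in column 5, while knownGene.txt holds transcript ID, chromosome, strand, txStart and txEnd in columns 1 to 5. For each gene it reports the smallest txStart and the largest txEnd over all of its transcripts.

```shell
# Sketch: build "Gene chr start end" from a gene list plus the UCSC tables.
# Assumed columns: kgXref $1 = transcript ID, $5 = gene symbol;
#                  knownGene $1 = transcript ID, $2 = chrom, $4 = txStart, $5 = txEnd.
genes_to_tsv() {
  # arguments: gene list, kgXref.txt, knownGene.txt
  printf 'Gene\tchr\tstart\tend\n'
  awk -F'\t' -v OFS='\t' '
    # 1st file: remember the wanted gene symbols (skip blank lines)
    FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
    # 2nd file: map transcript ID to symbol, for wanted symbols only
    FILENAME == ARGV[2] { if ($5 in want) sym[$1] = $5; next }
    # 3rd file: track min start and max end per (symbol, chromosome)
    ($1 in sym) {
      k = sym[$1] OFS $2
      s = $4 + 0; e = $5 + 0
      if (!(k in lo) || s < lo[k]) lo[k] = s
      if (!(k in hi) || e > hi[k]) hi[k] = e
    }
    END { for (k in lo) print k, lo[k], hi[k] }
  ' "$1" "$2" "$3" | LC_ALL=C sort
}

# Usage, once knownGene.txt and kgXref.txt have been downloaded:
# genes_to_tsv genes.txt kgXref.txt knownGene.txt > genes.tsv
```

Rows are keyed by gene and chromosome, so a gene that also appears on an alternate haplotype gets one row per sequence. UCSC start coordinates are 0-based; add 1 to the start column if you need 1-based positions. Coordinates also differ between assemblies, so hg19 output will not match values taken from another genome build.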

7.1 years ago
zjhzwang ▴ 180

You can use the biomaRt package in R to get this information easily.

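library(biomaRt)
# Connect to the Ensembl BioMart and list the datasets it offers
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
# Select the human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", mart)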

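Use getBM() to retrieve the attributes you want. Called without filters, as here, it returns every transcript in the dataset; to restrict it to your gene list, pass the filters and values arguments (e.g. filters = "hgnc_symbol", values = c("TP53", "BRCA1")), and request start_position and end_position instead if you want gene-level rather than transcript-level coordinates.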

filter_data <- getBM(
  attributes = c("external_gene_name", "chromosome_name", "transcript_start", "transcript_end"),
  mart = mart
)

And the result you get:

> head(filter_data)
  external_gene_name  chromosome_name transcript_start transcript_end
1          RNU6-280P CHR_HG2128_PATCH         67546651       67546754
2              Y_RNA CHR_HG2128_PATCH         67631019       67631127
3       RP11-222G7.2 CHR_HG2191_PATCH         74823667       74824187
4    Clostridiales-1 CHR_HG2022_PATCH         91356877       91357036
5      RP11-654C22.2  CHR_HG126_PATCH         72505075       72550889
6      RP11-315F22.1 CHR_HG2233_PATCH        239734956      239735538
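For readers who would rather not use R, the same kind of BioMart query can be sent from the shell as an XML document. The sketch below is not from the original answer and rests on several assumptions: the function name biomart_query_xml and the input file genes.txt are made up for illustration, the filter and attribute names (hgnc_symbol, external_gene_name, chromosome_name, start_position, end_position) are assumed to be available in the hsapiens_gene_ensembl dataset, and the endpoint in the usage comment is assumed to be the public Ensembl martservice URL.

```shell
# Sketch: print a BioMart XML query for the gene symbols listed in a file
# (one symbol per line, blank lines ignored).
biomart_query_xml() {
  # Join the symbols into a comma-separated list, e.g. TP53,BRCA1
  symbols=$(grep -v '^[[:space:]]*$' "$1" | paste -sd, -)
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="hgnc_symbol" value="${symbols}"/>
    <Attribute name="external_gene_name"/>
    <Attribute name="chromosome_name"/>
    <Attribute name="start_position"/>
    <Attribute name="end_position"/>
  </Dataset>
</Query>
EOF
}

# Usage (needs network access; the endpoint is an assumption):
# curl -s --get --data-urlencode "query=$(biomart_query_xml genes.txt)" \
#   "https://www.ensembl.org/biomart/martservice" > genes.tsv
```

Ensembl reports chromosome names without the "chr" prefix (17 rather than chr17), so the prefix would have to be added afterwards if the output needs it.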

I do not know R programming, but thank you zjhzwang for the suggestion.
