Correct genome version for NCBI build 37
8 months ago

I have a paper (PMID 24429703) that aligned its data to NCBI build 37. The methods state:

Paired-end sequencing reads were aligned to the human genome (NCBI build37) using the BWA algorithm

The authors provided an Excel file listing some variants of interest.

I am now trying to fetch the protein sequence that corresponds exactly to each protein change, for example the 10th variant:

    CHR  Hugo_Symbol  cDNA      Protein_Change
    3    RPL22L1      c.362A>G  p.E121G
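Before querying anything, the two notations can be cross-checked against each other: coding position n falls in codon ceiling(n / 3), so c.362 should land in codon 121, at base 2. A Glu codon (GAA/GAG) with A>G at its second base becomes GGA/GGG, i.e. Gly, which agrees with p.E121G. The R sketch below assumes simple single-substitution HGVS strings with one-letter amino-acid codes; the helper names are hypothetical.

```r
# Hypothetical helper: parse a simple protein substitution such as "p.E121G"
# (one-letter amino-acid codes only).
parse_protein_change <- function(x) {
  m <- regmatches(x, regexec("^p\\.([A-Z])([0-9]+)([A-Z])$", x))[[1]]
  if (length(m) == 0) stop("unsupported protein change: ", x)
  list(ref = m[2], pos = as.integer(m[3]), alt = m[4])
}

# Hypothetical helper: parse a simple coding-DNA substitution such as "c.362A>G".
parse_cdna_change <- function(x) {
  m <- regmatches(x, regexec("^c\\.([0-9]+)([ACGT])>([ACGT])$", x))[[1]]
  if (length(m) == 0) stop("unsupported cDNA change: ", x)
  list(pos = as.integer(m[2]), ref = m[3], alt = m[4])
}

prot <- parse_protein_change("p.E121G")
cdna <- parse_cdna_change("c.362A>G")

# Coding position n lies in codon ceiling(n / 3), at base ((n - 1) %% 3) + 1.
codon <- ceiling(cdna$pos / 3)
base_in_codon <- ((cdna$pos - 1) %% 3) + 1

# The cDNA change must fall in the codon named by the protein change.
stopifnot(codon == prot$pos)
```

If the check fails for some row, the cDNA and protein coordinates were probably computed on different transcripts.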

Which Ensembl BioMart build should I use for this? Currently I am trying:

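    library(biomaRt)

    # Ensembl's GRCh37 archive server
    genome <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                      host = "grch37.ensembl.org",
                      path = "/biomart/martservice",
                      dataset = "hsapiens_gene_ensembl")

    # "peptide" is a sequence attribute; request Ensembl transcript/protein IDs
    # alongside it so each sequence can be tied to a specific isoform
    sequence <- getBM(attributes = c("peptide", "hgnc_symbol",
                                     "ensembl_transcript_id", "ensembl_peptide_id"),
                      filters = "hgnc_symbol",
                      values = "RPL22L1",
                      mart = genome)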

I am querying by hgnc_symbol, but I don't think this is right: the gene may have multiple isoforms, which I would normally fetch as specific transcripts. Is that correct, i.e. do I need more information than the gene symbol? Any suggestions?
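One way to narrow the isoforms returned for a gene symbol is to keep only those whose residue at the reported position matches the reference amino acid, then apply the substitution. This is a sketch under assumptions: the input stands in for the `peptide` column of the query result, the transcript IDs and sequences in the example are hypothetical toy data, and isoforms that share the same residue at that position remain ambiguous. A transcript ID in the paper's supplementary table, if present, would settle the choice more reliably.

```r
# Keep isoforms whose residue at `pos` equals `ref`, then substitute `alt`.
# `peptides` is a named character vector: names = transcript IDs,
# values = protein sequences.
select_and_mutate <- function(peptides, ref, pos, alt) {
  # BioMart peptide sequences may end with a "*" stop symbol; drop it.
  peptides <- sub("\\*$", "", peptides)
  # substr() returns "" for sequences shorter than `pos`, so they are dropped.
  residue <- substr(peptides, pos, pos)
  keep <- residue == ref
  mutated <- peptides[keep]
  substr(mutated, pos, pos) <- alt
  data.frame(transcript = names(peptides)[keep],
             reference = unname(peptides[keep]),
             mutant = unname(mutated),
             stringsAsFactors = FALSE)
}

# Hypothetical toy isoforms (not real RPL22L1 sequences).
peptides <- c(TX_A = paste0(strrep("A", 120), "E", "KL"),
              TX_B = paste0(strrep("A", 120), "D", "KL*"),
              TX_C = "MAE")

res <- select_and_mutate(peptides, ref = "E", pos = 121, alt = "G")
print(res$transcript)
```

Only TX_A has Glu at position 121, so it is the only isoform kept; TX_B carries Asp there and TX_C is too short.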
