Question

Correct genome version for NCBI build 37

8 months ago

ramiro.barrantes • 0

I have a paper (PMID 24429703) that aligned its data to NCBI build 37. It says:

Paired-end sequencing reads were aligned to the human genome (NCBI build37) using the BWA algorithm

They provided an Excel file listing some variants of interest.

Now I am trying to fetch the sequence that exactly corresponds to each protein change, for example the 10th variant in the file:

    CHR  Hugo_Symbol  cDNA      Protein_Change
    3    RPL22L1      c.362A>G  p.E121G

Which Ensembl BioMart build is the right one for this? I am trying:

    library(biomaRt)

    # Connect to the GRCh37 archive of Ensembl BioMart
    genome <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                      host    = "grch37.ensembl.org",
                      path    = "/biomart/martservice",
                      dataset = "hsapiens_gene_ensembl")

    # Fetch the peptide sequences annotated to the gene symbol
    sequence <- getBM(attributes = c("peptide", "hgnc_symbol", "protein_id"),
                      filters    = "hgnc_symbol",
                      values     = "RPL22L1",
                      mart       = genome)

I am fetching by hgnc_symbol, but I don't think this is right: the gene might have multiple isoforms, which I would normally fetch as specific transcripts. Is this approach correct, or do I need more information (for example, the transcript ID each variant was called against)? Do you have any suggestions?
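To make the isoform concern concrete, here is a rough sketch of a check I could imagine running. It is not something from the paper. Since p.E121G implies that the reference protein has E at position 121, I could keep only the isoforms whose peptide has that residue there. The helper name `matches_reference_residue` and the toy sequences are made up for illustration. The real input would be the per-transcript peptides returned by getBM, named by transcript or protein ID.

```r
# Hypothetical helper: return the names of the peptides that carry the
# expected reference residue at a given 1-based position.
matches_reference_residue <- function(peptides, position, ref_aa) {
  # substr() yields "" when the position lies past the end of a sequence,
  # so isoforms that are too short never match.
  residues <- substr(peptides, position, position)
  names(peptides)[residues == ref_aa]
}

# Toy sequences, NOT real RPL22L1 isoforms. A change written as p.E3G
# would imply that the reference residue at position 3 is E.
toy_peptides <- c(tx_A = "MKEAL", tx_B = "MKQAL", tx_C = "MK")
matches_reference_residue(toy_peptides, position = 3, ref_aa = "E")
```

For the real variant the call would use position = 121 and ref_aa = "E". If more than one isoform passes, this check alone cannot tell them apart, and the cDNA change (A at coding position 362) would be a second thing to compare.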

ensembl biomart • 265 views
