I am working from a paper (PMID 24429703) that aligned its data to NCBI build 37. The methods say:
Paired-end sequencing reads were aligned to the human genome (NCBI build37) using the BWA algorithm
The authors provide an Excel file with some variants of interest.
I am now trying to fetch the exact protein sequence that each protein change refers to. Take the 10th variant as an example:
CHR  Hugo_Symbol  cDNA      Protein_Change
3    RPL22L1      c.362A>G  p.E121G
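As a quick sanity check on that row, here is a minimal base R sketch showing that the cDNA and protein annotations agree with each other. It assumes that c.362 is counted from the first base of the start codon, as in HGVS coding notation. The helper `cds_to_codon` and the four-entry codon lookup are my own illustrative names, not something from the paper.

```r
# Hypothetical helper: map a 1-based CDS position to its codon number
# and to the position inside that codon (1, 2 or 3).
cds_to_codon <- function(cds_pos) {
  list(codon  = (cds_pos - 1) %/% 3 + 1,
       offset = (cds_pos - 1) %% 3 + 1)
}

hit <- cds_to_codon(362)   # c.362A>G from the table row

# Minimal slice of the standard genetic code, only the codons needed here.
codons <- c(GAA = "E", GAG = "E", GGA = "G", GGG = "G")

# Apply the A>G change at the computed offset to every Glu (E) codon.
ref_codons <- names(codons)[codons == "E"]
alt_codons <- ref_codons
substr(alt_codons, hit$offset, hit$offset) <- "G"

# Amino acids encoded by the mutated codons.
alt_aa <- unname(codons[alt_codons])
```

Base 362 falls in codon 121, at the second position, and an A>G change there turns either glutamate codon into a glycine codon. So the row is internally consistent with p.E121G, but this says nothing about which transcript the numbering refers to.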
Which Ensembl BioMart build is the right one to use for this? This is what I am trying:
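library(biomaRt)

# Connect to the GRCh37 archive of Ensembl BioMart
genome <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                  host    = "grch37.ensembl.org",
                  path    = "/biomart/martservice",
                  dataset = "hsapiens_gene_ensembl")

# Fetch the peptide sequences for the gene, filtering by HGNC symbol
sequence <- getBM(attributes = c("peptide", "hgnc_symbol", "protein_id"),
                  filters    = "hgnc_symbol",
                  values     = "RPL22L1",
                  mart       = genome)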
I am filtering by hgnc_symbol, but I don't think this is right: the gene may have multiple isoforms, and I would normally fetch a specific transcript instead. Is that correct, i.e. do I need more information? Do you have any suggestions?
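To make the isoform concern concrete, here is a rough sketch of the check I have in mind: look at each transcript's peptide separately and keep only those with the expected reference residue at the reported position. The function `has_ref_residue` and the toy peptides are hypothetical and only illustrate the logic. The commented biomaRt calls are how I assume the real per-transcript peptides could be obtained; I have not verified them against the GRCh37 mart.

```r
# Hypothetical helper: for a named character vector of peptide sequences,
# report which ones carry the reference amino acid `ref` at position `pos`.
# Peptides shorter than `pos` yield FALSE, because substr() returns "".
has_ref_residue <- function(peptides, pos, ref) {
  hits <- substr(unname(peptides), pos, pos) == ref
  stats::setNames(hits, names(peptides))
}

# Toy data, NOT real RPL22L1 isoforms: three made-up peptides.
toy <- c(tx_a = "MKTEL", tx_b = "MKTGL", tx_c = "MK")
toy_hits <- has_ref_residue(toy, pos = 4, ref = "E")

# Assumed real-data workflow (untested here, needs network access and
# the `genome` mart object from the code above):
# tx  <- getBM(attributes = c("ensembl_transcript_id", "hgnc_symbol"),
#              filters = "hgnc_symbol", values = "RPL22L1", mart = genome)
# pep <- getSequence(id = tx$ensembl_transcript_id,
#                    type = "ensembl_transcript_id",
#                    seqType = "peptide", mart = genome)
# has_ref_residue(setNames(pep$peptide, pep$ensembl_transcript_id),
#                 pos = 121, ref = "E")
```

In the toy example only `tx_a` passes. With real data, more than one isoform could have E at position 121, so this check narrows the candidates but would not by itself tell me which transcript the authors annotated against.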