Question: Finding RNA sequences for a list of genes
2.1 years ago, c_u (United States) wrote:

I have data from an RNA-seq experiment, which gave me a list of genes with their corresponding read counts. After some filtering, I have ended up with a list of 200 genes. For this list, I want to run MEME to see whether there is any common motif in their RNA sequences, say in their 5' UTR regions.

My question is - Is there a way I can automatically get the whole RNA sequences for all these genes (including the 5' and 3' UTR regions)? I searched on Google, and found this (https://www.ncbi.nlm.nih.gov/guide/howto/find-transcript-gene/) but it will give me results one gene at a time.

EDIT - A friend of mine told me that since I have the RNA-seq data, I can also use samtools view to find the RNA sequences, by using the bed files. I am not sure I understand this. If anyone has any ideas related to this method, it would be awesome.

Thanks!

rna-seq rna genes list • 781 views
2.1 years ago, Emily_Ensembl (EMBL-EBI) wrote:

You can do this with BioMart. There is a help video to get you started. Filter by your list of gene names, then select the cDNA (or just the 5' UTR) sequence as the attribute.
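For anyone who prefers to script this rather than click through the web interface, here is a minimal Python sketch of the same BioMart query. Several things in it are assumptions rather than part of the answer above: the dataset name `hsapiens_gene_ensembl` (the question does not state the species), the filter name `external_gene_name`, the sequence attribute name `5utr`, and the `martservice` URL. Check each against what the BioMart web interface shows for your species before relying on it. The helper names are made up for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed public Ensembl BioMart endpoint; a mirror or archive site may differ.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(gene_names,
                        dataset="hsapiens_gene_ensembl",  # assumption: human
                        sequence_attribute="5utr"):       # assumption: 5' UTR attribute name
    """Build the <Query> XML that filters by gene name and requests a sequence."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # BioMart takes a list of filter values as one comma-separated string.
    ET.SubElement(ds, "Filter", {"name": "external_gene_name",
                                 "value": ",".join(gene_names)})
    # The ID attributes end up in each FASTA header; the last one is the sequence.
    for attr in ("ensembl_gene_id", "ensembl_transcript_id", sequence_attribute):
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


def fetch_sequences(gene_names, **kwargs):
    """Send the query to BioMart and return the FASTA text (needs network access)."""
    xml = ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
           + build_biomart_query(gene_names, **kwargs))
    url = MARTSERVICE_URL + "?" + urllib.parse.urlencode({"query": xml})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_sequences(my_gene_list)` should return one FASTA record per transcript, which can be written to a file and passed to MEME. For whole transcripts instead of 5' UTRs, try `sequence_attribute="cdna"` (again, an attribute name to verify in the interface).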

2.1 years ago, kristoffer.vittingseerup (European Union) wrote:

It is a two-step process: 1) extract the genomic coordinates, 2) extract the corresponding sequences.

R solution: If you are an R user, you can extract the genomic coordinates from one of the Bioconductor TxDb annotation packages. The TxDb can be used in conjunction with the getSeq() function to extract the sequences of specific parts of your selected genes.

Non-R solution: The easiest approach is probably to download the GTF file for your genes of interest (remember to use the same annotation database that your gene list came from). From there you can use gffread, a utility from the Cufflinks suite, to get the sequences.

I would recommend the R version, since it directly supports extraction of, e.g., UTR regions, which can be a bit more tricky from GTF files.
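To make the two steps concrete, here is a small pure-Python sketch that runs on toy data. It is an illustration only, not a replacement for TxDb/getSeq() or gffread. It assumes the GTF marks 5' UTRs with the feature type `five_prime_utr` and carries a `gene_name` attribute (as Ensembl GTFs do; other sources may differ), and that the genome is already loaded as a dictionary of chromosome name to sequence. The function names are made up for this example.

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature_sequences(gtf_lines, genome, gene_names,
                              feature="five_prime_utr"):  # assumed feature type
    """Step 1: pick coordinates from the GTF. Step 2: slice the sequence."""
    wanted = set(gene_names)
    records = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        (chrom, _source, ftype, start, end,
         _score, strand, _frame, attrs) = line.rstrip("\n").split("\t")
        if ftype != feature:
            continue
        match = re.search(r'gene_name "([^"]+)"', attrs)
        if match is None or match.group(1) not in wanted:
            continue
        # GTF coordinates are 1-based and inclusive; Python slices are 0-based.
        seq = genome[chrom][int(start) - 1:int(end)]
        if strand == "-":
            seq = reverse_complement(seq)
        records.append((match.group(1), chrom, int(start), int(end), strand, seq))
    return records


# Toy data, purely for illustration.
toy_genome = {"chr1": "AAACCCGGGTTT"}
toy_gtf = [
    'chr1\ttoy\tfive_prime_utr\t4\t6\t.\t+\t.\tgene_id "G1"; gene_name "geneA";',
    'chr1\ttoy\tfive_prime_utr\t7\t10\t.\t-\t.\tgene_id "G2"; gene_name "geneB";',
    'chr1\ttoy\texon\t1\t12\t.\t+\t.\tgene_id "G1"; gene_name "geneA";',
]
toy_records = extract_feature_sequences(toy_gtf, toy_genome, ["geneA", "geneB"])
```

Note that this returns one sequence per GTF line. A UTR that is split across several exons appears as several lines, and those pieces would still need to be joined per transcript in transcript order, which is part of why UTR extraction from GTF files is trickier than it first looks.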

Good luck
