Question: Finding RNA sequences for a list of genes
18 months ago by
chahat_u110
United States
chahat_u110 wrote:

I have data from an RNA-seq experiment, which gave me a list of genes with their corresponding counts. After some filtering, I now have a list of 200 genes. For this list, I want to run MEME to see whether there is any common motif in their RNA sequences, say in their 5' UTRs.

My question is - Is there a way I can automatically get the whole RNA sequences for all these genes (including the 5' and 3' UTR regions)? I searched on Google, and found this (https://www.ncbi.nlm.nih.gov/guide/howto/find-transcript-gene/) but it will give me results one gene at a time.

EDIT - A friend of mine told me that since I have the RNA-seq data, I can also use samtools view to find the RNA sequences, by using the bed files. I am not sure I understand this. If anyone has any ideas related to this method, it would be awesome.

Thanks!

rna-seq rna genes list • 642 views
18 months ago by
Emily_Ensembl18k
EMBL-EBI
Emily_Ensembl18k wrote:

You can do this with BioMart; there is a help video to get you started. Filter by your list of gene names, then select the cDNA (or even the 5' UTR) sequence as attributes.
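For anyone who prefers to script this rather than use the web interface, here is a minimal Python sketch that builds the kind of XML query BioMart accepts. It only constructs the query and the request URL; it does not contact the server. The endpoint URL, the human dataset name (hsapiens_gene_ensembl), the filter name (external_gene_name) and the attribute names (5utr, ensembl_gene_id, ensembl_transcript_id) are assumptions on my part, so check them against the BioMart interface for your species and release before relying on them.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of the public Ensembl BioMart service.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(gene_names, sequence_type="5utr",
                        dataset="hsapiens_gene_ensembl"):
    """Return BioMart query XML requesting FASTA sequences for a gene list."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filter: restrict results to the gene list (comma-separated values).
    ET.SubElement(ds, "Filter", {"name": "external_gene_name",
                                 "value": ",".join(gene_names)})
    # Attributes: the IDs end up in the FASTA header, the sequence in the body.
    for attr in ("ensembl_gene_id", "ensembl_transcript_id", sequence_type):
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


def build_request_url(xml_query):
    """URL-encode the XML query as the 'query' parameter of the service."""
    return MART_URL + "?" + urllib.parse.urlencode({"query": xml_query})


genes = ["TP53", "BRCA1"]  # placeholder gene names; use your own list of 200
xml_query = build_biomart_query(genes)
url = build_request_url(xml_query)
print(xml_query)
# To actually fetch the FASTA text (needs network access):
#   import urllib.request
#   fasta = urllib.request.urlopen(url).read().decode()
```

Passing sequence_type="cdna" instead should request the full spliced transcript sequence rather than only the 5' UTR, again assuming that attribute name is available in your dataset.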

18 months ago by
European Union
kristoffer.vittingseerup1.7k wrote:

It is a two-step process: 1) extract the genomic coordinates, 2) extract the corresponding sequence.

R solution: If you are an R user, you can extract the genomic coordinates from one of the Bioconductor TxDb annotation packages. The TxDb object can be used in conjunction with the getSeq() function to extract the sequences of specific parts of your selected genes.

Non-R solution: The easiest approach is probably to download the GTF file for your genes of interest (remember to use the same annotation database that your gene list came from). From there you can use gffread, a utility distributed with Cufflinks, to get the sequences.
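To make the two steps concrete, here is a small Python sketch of the idea: collect the exon coordinates per transcript from GTF lines, then cut those ranges out of the genome and join them. This is not how gffread is implemented, just an illustration of the principle on made-up toy data; the function names and the toy genome are hypothetical. For real data you would read the GTF and the genome FASTA from files, or simply use gffread.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(text):
    """Parse FASTA text into {sequence_name: sequence}."""
    chunks, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        else:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def exons_by_transcript(gtf_text):
    """Step 1: collect (chrom, start, end, strand) exons per transcript_id."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[2] != "exon":
            continue
        attrs = {}
        # Attribute column looks like: gene_id "G1"; transcript_id "T1";
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, value = item.split(" ", 1)
                attrs[key] = value.strip('"')
        exons[attrs["transcript_id"]].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return exons


def spliced_sequence(exon_list, genome):
    """Step 2: join exon sequences; reverse-complement minus-strand ones."""
    exon_list = sorted(exon_list, key=lambda e: e[1])
    chrom, strand = exon_list[0][0], exon_list[0][3]
    # GTF coordinates are 1-based and inclusive, hence the start - 1.
    seq = "".join(genome[chrom][start - 1:end]
                  for _, start, end, _ in exon_list)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy data, invented for illustration only.
GENOME = parse_fasta(">chr1 toy\nAAACCCGGGT\nTTACGTACGT\n")
GTF = "\n".join([
    'chr1\ttoy\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\ttoy\texon\t13\t16\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
])

seqs = {tid: spliced_sequence(ex, GENOME)
        for tid, ex in exons_by_transcript(GTF).items()}
for tid, seq in seqs.items():
    print(">" + tid)
    print(seq)
```

The printed FASTA records can be written to a file and given to MEME as input.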

I would recommend the R version, since it directly supports extraction of e.g. UTR regions, which can be a bit trickier from GTF files.
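As a rough illustration of why UTRs take extra work with a GTF: unless the file annotates them explicitly, the 5' UTR has to be derived as the part of the exons that lies upstream of the coding sequence, and "upstream" depends on the strand. A minimal Python sketch, assuming you have already collected the exon and CDS intervals of one transcript as 1-based inclusive (start, end) pairs; the function name is hypothetical.

```python
def five_prime_utr(exons, cds, strand):
    """Return the exon segments lying 5' of the CDS for one transcript.

    exons, cds: lists of (start, end) tuples, 1-based inclusive genomic
    coordinates. strand: "+" or "-".
    """
    if strand == "+":
        # Plus strand: the 5' UTR is everything before the first CDS base.
        cds_start = min(start for start, _ in cds)
        return [(start, min(end, cds_start - 1))
                for start, end in sorted(exons) if start < cds_start]
    # Minus strand: the 5' end is at the highest genomic coordinate.
    cds_end = max(end for _, end in cds)
    return [(max(start, cds_end + 1), end)
            for start, end in sorted(exons) if end > cds_end]


exons = [(100, 200), (300, 400)]
cds = [(150, 200), (300, 350)]
plus_utr = five_prime_utr(exons, cds, "+")
minus_utr = five_prime_utr(exons, cds, "-")
print(plus_utr)
print(minus_utr)
```

The resulting intervals still need to be cut from the genome and, for minus-strand transcripts, reverse-complemented. Transcripts without any CDS (non-coding RNAs) have no UTR in this sense and would need to be skipped before calling the function.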

Good luck
