I have RNA-seq data that I have already aligned. I am looking for a way to get only the 5'UTR of each gene and then search it for a motif. Does anyone know how to get the 5'UTRs?
Provided your genome is available there, you can get these from Ensembl (using BioMart) or UCSC (using the Table Browser).
retrieval of upstream non-coding sequences
Which organism are you working with?
A simple solution would be to download HOMER and then install the genome of your interest, which comes with annotations such as exons, introns, 3'UTRs, 5'UTRs, etc. With a simple grep you could extract all the 5'UTRs, and you could then use HOMER for the motif analysis.
As @genomax2 suggested, you can easily get the required sequences from Ensembl. Follow the tutorial here.
A GTF file for your genome should contain the coordinates of the 5'UTRs. Ensembl provides GTF files for many organisms.
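To illustrate the GTF route, here is a minimal Python sketch. It assumes an Ensembl-style GTF in which 5'UTR rows carry the feature type `five_prime_utr` in the third column (other sources, e.g. GENCODE, may label both UTRs simply as `UTR`, so check your file first). The function name `five_prime_utrs` and the tiny `example_gtf` records are made up for the example; in practice you would pass an open file handle instead.

```python
def five_prime_utrs(gtf_lines):
    """Yield (chrom, start0, end, gene_id, strand) for every five_prime_utr row.

    Coordinates are converted from GTF (1-based, inclusive) to
    BED-style (0-based start, exclusive end).
    """
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "five_prime_utr":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Pull gene_id out of the attribute column (key "value"; pairs).
        gene_id = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
        yield chrom, start - 1, end, gene_id, strand


# Hypothetical records, only to show the expected input shape.
example_gtf = [
    "#!genome-build demo\n",
    '1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    '1\tdemo\tfive_prime_utr\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tdemo\tthree_prime_utr\t800\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]

# Print BED6-like lines: chrom, start, end, name, score, strand.
for chrom, start0, end, gene_id, strand in five_prime_utrs(example_gtf):
    print(chrom, start0, end, gene_id, ".", strand, sep="\t")
```

The resulting BED-like intervals could then be handed to a sequence extraction tool (for example `bedtools getfasta` with a genome FASTA) to get the actual 5'UTR sequences for the motif search. Note that a gene with several transcripts will usually produce several, possibly overlapping, 5'UTR rows.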
This has been answered here:
How Can We Find The Info For 3'Utr And 5'Utr In Gencode Gtf File?
Genomic Positions Of 3'Utrs Of Refseq-Genes