Question: Find 3'UTRs for species
29 days ago by
Palgrave20
Singapore
Palgrave20 wrote:

I have a FASTA assembly for a fish species whose UTRs are not well characterized. Using this assembly, I would like to find putative 3' UTR sequences by aligning the 3' UTRs of a closely related, well-annotated species, zebrafish, against it.

How would you approach this to get a set of 3'UTRs that are conserved in zebrafish?

alignment forum assembly • 199 views

Go to the UCSC Genome Browser (Table Browser) here

Choose track = "Ensembl genes"

Region = "genome"

Output Format = "BED"

Then click "get output". On the next page you can select 3' UTRs.

As previously suggested here, you can then use bedtools to convert the BED file to FASTA format with 'bedtools getfasta'.
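For illustration, the bedtools step could be wrapped roughly as below. This is only a sketch: the file names (`danRer11.fa`, `zebrafish_3utr.bed`, `zebrafish_3utr.fa`) are placeholders, not files mentioned in this thread, and the choice of `-s` (respect strand) and `-name` (use the BED name column as the FASTA header) is an assumption about what is wanted here.

```python
import subprocess

def getfasta_command(genome_fasta, utr_bed, out_fasta):
    """Build the argument list for a `bedtools getfasta` call."""
    return [
        "bedtools", "getfasta",
        "-fi", genome_fasta,  # genome FASTA to pull sequence from
        "-bed", utr_bed,      # BED intervals (here: 3' UTR exons)
        "-s",                 # reverse-complement minus-strand intervals
        "-name",              # use the BED name column in the FASTA headers
        "-fo", out_fasta,     # output FASTA file
    ]

def run_getfasta(genome_fasta, utr_bed, out_fasta):
    """Run bedtools; requires bedtools on PATH and the input files to exist."""
    subprocess.run(getfasta_command(genome_fasta, utr_bed, out_fasta), check=True)

# Placeholder file names, for illustration only.
cmd = getfasta_command("danRer11.fa", "zebrafish_3utr.bed", "zebrafish_3utr.fa")
print(" ".join(cmd))
```

Calling `run_getfasta(...)` with real paths would execute the command; the snippet itself only prints it.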

written 29 days ago by b.d237170

Hi, I am not analyzing a human sample, but a rare fish species.

written 29 days ago by Palgrave20

Well yes, the first link provided will give you the 3' UTRs in zebrafish. My understanding of the question is that you would like to align the 3' UTRs of zebrafish to your poorly annotated species in order to identify putative 3' UTRs?
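One possible way to turn such alignments into candidate regions, sketched under assumptions: suppose the zebrafish 3' UTR FASTA has been aligned to the new assembly with a tool that writes BLAST tabular output (the 12-column `-outfmt 6` layout). The function name `hits_to_bed` and the thresholds (70% identity, 100 aligned bases) are arbitrary illustrations, not recommendations from this thread.

```python
def hits_to_bed(lines, min_identity=70.0, min_length=100):
    """Filter BLAST tabular (outfmt 6) hits and convert them to BED-like tuples.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    """
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length = float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])
        bitscore = float(f[11])
        if pident < min_identity or length < min_length:
            continue
        # In blastn output, sstart > send means the hit is on the minus strand.
        strand = "+" if sstart <= send else "-"
        # BLAST coordinates are 1-based inclusive; BED is 0-based half-open.
        start = min(sstart, send) - 1
        end = max(sstart, send)
        bed.append((sseqid, start, end, qseqid, bitscore, strand))
    return bed

# Made-up hits, for illustration only.
example = [
    "utrA\tscaffold_1\t85.0\t200\t30\t0\t1\t200\t1001\t1200\t1e-50\t250",
    "utrB\tscaffold_2\t90.0\t150\t15\t0\t1\t150\t500\t351\t1e-40\t200",
    "utrC\tscaffold_3\t60.0\t300\t120\t0\t1\t300\t10\t309\t1e-5\t50",
]
candidates = hits_to_bed(example)
for row in candidates:
    print("\t".join(str(x) for x in row))
```

The resulting intervals are only candidate 3' UTRs on the new assembly; the low-identity `utrC` hit is dropped by the filter.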

written 29 days ago by b.d237170

Sorry, I did not see that. So can I also get the 3' UTR sequence by choosing Output format = "sequence"?

written 29 days ago by Palgrave20

You have to select Output format = "BED". Selecting other formats will not let you select 3' UTR.

Once you have downloaded the BED file, download the genome FASTA file for zebrafish. bedtools works by pulling out the sequences in the zebrafish FASTA file that correspond to the coordinates in your BED file. Here is a link on how to run it: bedtools getfasta. I'd appreciate it if you accepted the first comment as an answer (cough cough @ATPoint, who moved my original answer to a comment?).
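To make the coordinate lookup concrete, here is a small pure-Python sketch of the same idea. It is not bedtools itself, just an illustration of slicing a genome by BED intervals, with a toy genome and made-up intervals. It assumes standard BED conventions (0-based start, end not included) and reverse-complements minus-strand features.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_bed(genome, bed_lines):
    """Return (name, sequence) for each BED interval."""
    out = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.split()
        chrom, start, end = f[0], int(f[1]), int(f[2])
        name = f[3] if len(f) > 3 else f"{chrom}:{start}-{end}"
        strand = f[5] if len(f) > 5 else "+"
        seq = genome[chrom][start:end]  # BED: 0-based, half-open
        if strand == "-":
            seq = revcomp(seq)
        out.append((name, seq))
    return out

# Toy data, for illustration only.
genome = read_fasta([">chr1 toy", "ACGTACGTAC", "GGGTTTAAAC"])
bed = ["chr1\t2\t6\tutr_plus\t0\t+", "chr1\t10\t14\tutr_minus\t0\t-"]
records = extract_bed(genome, bed)
for name, seq in records:
    print(f">{name}\n{seq}")
```

For a real genome, bedtools is the better choice, since it uses an indexed FASTA rather than loading every sequence into memory.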

modified 29 days ago • written 29 days ago by b.d237170

Are you sure?

 Ensembl Genes Genomic Sequence

Sequence Retrieval Region Options:
Promoter/Upstream by bases
5' UTR Exons
CDS Exons
3' UTR Exons
Introns
Downstream by bases
One FASTA record per gene.
One FASTA record per region (exon, intron, etc.) with extra bases upstream (5') and extra downstream (3')
    Split UTR and CDS parts of an exon into separate FASTA records
Note: if a feature is close to the beginning or end of a chromosome and upstream/downstream bases are added, they may be truncated in order to avoid extending past the edge of the chromosome.

Sequence Formatting Options:
Exons in upper case, everything else in lower case.
CDS in upper case, UTR in lower case.
All upper case.
All lower case.
Mask repeats: to lower case / to N
modified 29 days ago • written 29 days ago by Palgrave20

Hey presto! Looks correct to me.

written 29 days ago by b.d237170