Find 3'UTRs for species
3.1 years ago
Palgrave ▴ 90

I have a FASTA assembly for a fish species whose UTRs are not well characterized. Using this assembly, I would like to find putative 3' UTR sequences by aligning the 3' UTRs of a closely related, well-annotated fish species, zebrafish, against it.

How would you approach this to get a set of 3'UTRs that are conserved in zebrafish?

Assembly alignment Forum • 936 views

Go to the UCSC Table Browser (part of the UCSC Genome Browser) with the zebrafish assembly selected.

Choose track = "Ensembl genes"

Region = "genome"

Output Format = "BED"

Then click "get output". On the next page you can select 3' UTR Exons.

As previously suggested here, you can use bedtools to convert the BED file to FASTA format with 'bedtools getfasta'.


Hi, I am not analyzing a human sample, but a rare fish species.


Well yes, the first link provided will give you 3' UTRs in zebrafish. My understanding of the question is that you would like to align your poorly annotated species to the 3' UTRs of zebrafish in order to identify putative 3' UTRs. Is that right?


Sorry, I did not see that. So can I also get the 3' UTR sequence by choosing Output format = "sequence"?


You have to select Output format = "BED". Selecting other formats will not let you select 3' UTR.

Once you have downloaded the BED file, download the genome FASTA file for zebrafish as well. bedtools works by pulling out the sequences in the zebrafish FASTA file that correspond to the coordinates in your BED file; the command to run is 'bedtools getfasta'. I'd appreciate it if you accepted the first comment as an answer, cough cough @ATPoint (who moved my original answer to a comment?)
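To make the coordinate lookup concrete, here is a minimal pure-Python sketch of the idea behind 'bedtools getfasta'. It is not the bedtools implementation: the function names, the toy chromosome and the two BED records are all made up for illustration, and the header label is just a convenient format. It relies on BED coordinates being 0-based and half-open, and it reverse-complements minus-strand features so that every UTR reads 5' to 3'.

```python
# Illustrative sketch only: mimics the idea of `bedtools getfasta`,
# not its actual code. All names and data below are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def extract_bed_regions(genome, bed_lines, stranded=True):
    """Yield (label, sequence) for each BED interval.

    BED start/end are 0-based and half-open, so they map directly
    onto a Python slice. Minus-strand features are reverse-complemented
    when stranded=True.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "+"
        seq = genome[chrom][start:end]
        if stranded and strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        yield f"{chrom}:{start}-{end}({strand})", seq


# Toy data standing in for the zebrafish genome and the downloaded BED file.
toy_fasta = [">chr1 toy chromosome", "ACGTACGTAC", "GGGTTTAAAC"]
toy_bed = ["chr1\t2\t8\tutrA\t0\t+", "chr1\t10\t16\tutrB\t0\t-"]

genome = read_fasta(toy_fasta)
records = list(extract_bed_regions(genome, toy_bed))
for label, seq in records:
    print(f">{label}\n{seq}")
```

For a real genome, bedtools itself is the better choice, since it indexes the FASTA file rather than loading it all into memory.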


Are you sure?

Ensembl Genes Genomic Sequence

Sequence Retrieval Region Options:
Promoter/Upstream by bases
5' UTR Exons
CDS Exons
3' UTR Exons
Introns
Downstream by bases
One FASTA record per gene.
One FASTA record per region (exon, intron, etc.) with extra bases upstream (5') and extra downstream (3')
Split UTR and CDS parts of an exon into separate FASTA records
Note: if a feature is close to the beginning or end of a chromosome and upstream/downstream bases are added, they may be truncated in order to avoid extending past the edge of the chromosome.

Sequence Formatting Options:
Exons in upper case, everything else in lower case.
CDS in upper case, UTR in lower case.
All upper case.
All lower case.
Mask repeats: to lower case / to N


Hey presto! Looks correct to me.
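Once the zebrafish 3' UTR FASTA is in hand, the alignment step from the original question could be sketched roughly as follows. Nobody in this thread names an aligner, so using BLAST+ 'blastn' here is an assumption, as are the file names, the helper names and the identity/length cutoffs. The sketch builds a blastn command with tabular output (-outfmt 6) and then keeps the best-scoring hit per zebrafish UTR; the sample hit table is invented.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastn_command(query_fasta, subject_fasta, out_tsv, evalue="1e-5"):
    """Build a blastn command line as an argument list (not executed here).

    The file names are placeholders. For a large assembly, building a
    database with makeblastdb and using -db instead of -subject is
    likely to be faster.
    """
    return [
        "blastn", "-query", query_fasta, "-subject", subject_fasta,
        "-outfmt", "6", "-evalue", evalue, "-out", out_tsv,
    ]


def best_hits(tabular_text, min_identity=70.0, min_length=50):
    """Return the highest-bitscore hit per query from -outfmt 6 text.

    The cutoffs are arbitrary illustrative values, not recommendations.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["sstart"], hit["send"] = int(hit["sstart"]), int(hit["send"])
        hit["bitscore"] = float(hit["bitscore"])
        # In BLAST tabular output, sstart > send marks a minus-strand hit.
        hit["strand"] = "+" if hit["sstart"] <= hit["send"] else "-"
        if hit["pident"] < min_identity or hit["length"] < min_length:
            continue
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Invented example of what a blastn result table might look like.
sample_hits = (
    "utrA\tscaffold_1\t85.0\t200\t30\t0\t1\t200\t1000\t1199\t1e-40\t150\n"
    "utrA\tscaffold_7\t90.0\t120\t12\t0\t1\t120\t500\t381\t1e-30\t180\n"
    "utrB\tscaffold_2\t65.0\t300\t105\t0\t1\t300\t10\t309\t1e-10\t90\n"
)

command = blastn_command("zebrafish_3utr.fa", "my_fish_assembly.fa", "hits.tsv")
hits = best_hits(sample_hits)
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["sstart"], hit["send"], hit["strand"])
```

The subject coordinates of the retained hits can then be written out as a BED file for the new assembly. Note that 3' UTRs tend to be less conserved than coding sequence, so many zebrafish UTRs may return no hit at all.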