collecting ultraconserved elements from RNA-seq data
7.7 years ago
Farbod ★ 3.4k

Dear friends, hi. (I'm not a native English speaker, so please be ready for some possible language flaws.)

As you know, there are 481 ultraconserved elements (longer than 200 bp) that are identical among human, rat and mouse, and many of them are also conserved in some fishes (http://www.ncbi.nlm.nih.gov/pubmed/15131266 ).

I have RNA-seq data and a de novo transcriptome assembly for a non-model vertebrate (so there is no reference genome for it), and I want to check:

1- first, whether these ultraconserved 200 bp elements also exist in this species;

2- second, what the percent similarity is between the related sequence in my species and each element in human.

Can RNA-seq data (and a de novo assembly) be used for these purposes? How?

Thank you in advance


You would not be able to get the ones that are located in introns (and a cursory glance at the paper shows there are some). You can align your data to the genomes of those species, but be ready for a lot of legwork.

It depends on what you want to achieve (you will find something, since the original paper does say that they found conservation to some extent with fish) and whether you have the time to invest in this.


Hi dear genomax2,

I want to check whether they exist in my species and how conserved they are.


I am not trying to be rude, but a valid question is then: what will you do next?

Based on the paper, we know there are going to be some, but their significance will be harder to tackle in your genome, since all you have to go on is the transcriptome.

When you extract the sequences from that paper, remember that the coordinates are going to be from a 2004 genome build. The ECR browser may be a better option if you can get at the data directly without having to deal with that browser UI.

If you click on the ECR link at the top left, you get a new window with the sequences of the ECRs. Base Genome allows you to select the genome you are looking at.


Dear genomax2,

Maybe I did not get the point correctly, but first I want to check the existence of these elements in my species, as it is much older from an evolutionary perspective than the fishes and other vertebrates that have been used in this paper (and other similar papers).

Then, if I find some exact matches, since I am using transcript data, they will be the ultraconserved elements that are transcribed (I will miss the intronic ones, but maybe they are not so important for me).

If I find some that are similar but carry SNPs or mismatches, then maybe I can investigate the cause of those mismatches.

Is this what you meant by "a valid question is then: what will you do next?"? Please correct me if my assumption is not correct.


All that is great. Sounds like you have funding to do some basic research without having to justify the end first :-)

This page lists the conserved elements from the paper you originally linked above. They refer to the hg16 genome build, but you should be able to get the sequences and liftOver them to the current assembly, if needed.


Dear genomax2,

thank you for your valuable time and especially for your ultra-helpful link!


Dear genomax2, Hi

I used the sequences you provided (from here) and ran blastn with them against my transcriptome assembly.

There are about 81 hits (out of 481 ultraconserved elements), and interestingly, about 19 of them are "non-exonic" elements!

Do you have any explanation for this?

Thanks
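A minimal Python sketch of how counts like these can be tallied from BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeping only the best hit per UCE so that one element matching several Trinity isoforms is counted once. The e-value cutoff and the `uce_class` mapping (element name to "exonic"/"non-exonic", which would have to be filled in from the element table linked above) are assumptions for illustration, not part of the original analysis.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-10):
    """Return the highest-bitscore hit per query (UCE) that passes the e-value cutoff."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or bits > best[query]["bitscore"]:
            best[query] = {"sseqid": rec["sseqid"], "pident": float(rec["pident"]),
                           "evalue": evalue, "bitscore": bits}
    return best


def count_by_class(best, uce_class):
    """Tally hit UCEs by annotation label; names missing from uce_class count as 'unknown'."""
    counts = {}
    for query in best:
        label = uce_class.get(query, "unknown")
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Usage, with a hypothetical file name: `with open("uce_vs_trinity.tsv") as fh: best = best_hits(fh)`; then `len(best)` is the number of UCEs with at least one hit.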


How was your RNA-seq library prepared? Ribodepletion or poly-A selection? If the former, some nascent (unprocessed/unspliced) RNA containing introns might be present.

Alternatively, elements annotated as "non-exonic" might actually be exonic but not properly annotated as such. You should have a look at what's there; perhaps they are long non-coding RNAs.


Dear WouterDeCoster, Hi (nice picture you have for your profile!)

It was sequenced on an Illumina HiSeq 2000 about 3 years ago, and I think it was "poly-A selection".

Can you suggest any websites where I can check selected sequences from my transcriptome in this regard?

Here is one of my sequences that shows a hit with a non-exonic UCE:

>TRINITY_DN76988
GAAAAGTCCAGTCCTCCTAGCTTCAGAAAATCTATTTTTCCCATTTTAATACCCCGCGTA
ACAGTCTTCATAATTCATTCGAGTGTGTTAAGCGTAGTTTTATTAGATCTGAAACAAATT
TTGGTGGGAGATCCTATAGGTCATTAACCATGGAGTAATTTTATCCTTGTTTCCCTAATG
ATGCCATAATGGCGAGTGAATTTCTTAACTAAAGACCAAAGAACATTTTGAAGGTCAGCT
TCATCTGCAAGCTCCTTCAAGCGCTTCTCAGAGAGATTGGAAAAGTCGGAGATTTTTGAA
GAGTCATTAATAACGTTAAAGCTGAAAGCCTATTTGCGTTCTCGCTTTCTACCTTTTAAT
TTCATACTCTTTTTTTCACTTTCTCTCTCCTTCCTTCTCCTCTGTTCTGCAGTTGCCCTC
ATGCAGAAAGAATGGAGTGCCGAGCGGGAGGGCAAAAATGGCAGCGTAGTGACATACAGA
TCCCAGATGTGATGCTGCAATAATTTAAATTTTATGCCTTTGTTATCACTTTAATCATTT
TCTTTATTCGTTTTGTTTCAGCGATCAGAGAGAGACACCTGATAGGGCGAAATACCAGGG
GAACAATTTTTATTTGGAATGTGGAATCTACTTCCCCATTGGCTTGTCTCTCGCTGTAAT
TGAAAAAATAAGATAGA

I didn't really need feedback on my profile pic, but thanks, I guess.

I checked the fragment you provided in the mouse and human genomes, and there is no trace of anything coding. But there are indeed blocks of conserved sequence. Perhaps you are onto something new ;)

Was your RNA treated with DNase, or is genomic contamination possible? It's important to know the background of the library prep if you are working on the data years later.


"Perhaps you are onto something new ;)" was very valuable for me!

Yes, we used DNase treatment, and the TruSeq kit was used for cDNA library construction.

And this is the e-value of the BLAST hit: 1.14e-136


What was non-exonic in the hg16 build could have changed since. Did you check in the current assembly whether those sequences are still non-exonic?
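One way to run that check, sketched in Python under stated assumptions: the hg16 UCE coordinates have already been lifted over (e.g. with UCSC liftOver) to the current assembly, and exon intervals have been exported as BED (0-based, half-open) from a current gene annotation. An element is called exonic if it shares at least one base with any exon; the function names are illustrative.

```python
from collections import defaultdict


def load_exons(bed_lines):
    """Group exon intervals (BED chrom, start, end; 0-based, half-open) by chromosome."""
    exons = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        chrom, start, end = line.split("\t")[:3]
        exons[chrom].append((int(start), int(end)))
    return exons


def overlapping_exons(exons, chrom, start, end):
    """Return exon intervals sharing at least one base with [start, end)."""
    return [(s, e) for s, e in exons.get(chrom, []) if s < end and start < e]


def classify(exons, chrom, start, end):
    """Label a lifted-over UCE interval as exonic or non-exonic."""
    return "exonic" if overlapping_exons(exons, chrom, start, end) else "non-exonic"
```

For a few hundred elements a linear scan like this is fine; for larger interval sets, `bedtools intersect` does the same job much faster.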


Hi, no!

I only have the sequences of the UCEs from the link you provided (the original sequences) and the sequences from my fish transcriptome that show blastn hits to them.

And I do not know how to check whether an old UCE sequence has become exonic in the new human genome.

This is the original UCE sequence for the non-exonic example I provided previously:

>uc.294+
CGAGATGAAATTGAGACATGGAAGAATTTATTGCCCAGAAAATTCCATTCTGCTATCTGATTCAAAAAGTCCAGTCCCCCCAGCTTCGGAAAATCTATTTTCCACATTTTAATACCCTGCAGAACAGTCCTCATAACTCATCCGAGTGTGTTAAGCACAGTTTTATTAGATCTGAAACAAATTTTGGTGGGGAGATACTATAGGTCATTAACCATGGAGTAATTTTATCCTTGTTTCCCTAATGATGCCATAATGGCGAGCGAATTTCTTAACTAAAGACCAAAGAACATTTTGAAGGTCAGTTTCATCTGTGAGCTCCTTCAAGCGCTTCTCAGAGAAGATTGGAAAACTCGCCGATTTTTTGAAGAGTCATTAATAATGTGAAAGCTGAAAGCACCCTCCATTTGCGTTCCTGCTTTTTACCTTTTAATTTTATATCGTCCC

If you check it, please kindly teach me how, too. Thanks
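For question 2 of the original post (percent similarity), a local alignment suits this pair because TRINITY_DN76988 and uc.294+ only partly overlap. Below is a plain Python Smith-Waterman sketch with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions, and BLAST or EMBOSS `water` would be the practical tools for real use. Identity is reported BLAST-style, as identical columns divided by alignment length including gaps.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    a, b = a.upper(), b.upper()

    def sub(x, y):
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_identity(x, y):
    """Percent of alignment columns (gaps included) holding identical bases."""
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)
```

Usage: paste the two sequences above into strings `trinity` and `uce`, then `score, ta, tu = smith_waterman(trinity, uce)` and `alignment_identity(ta, tu)`. The quadratic table is small enough for a few hundred bases per sequence.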


This UCE still appears to be non-exonic (intergenic) and highly conserved in many species (including zebrafish). I am not sure if you can see this link; it may only last for a few days.

The reason I brought that up was that the sequence you posted above had a Trinity ID, and I thought you had pulled out a sequence from your transcriptome using a non-exonic human sequence.

Your data could have some trace DNA contamination. If you ever align your own data to zebrafish, you may be able to see the reads that hit this UCE.


Hi genomax2,

I have posted both my "Trinity transcriptome sequence" and the "UCE original sequence" of the non-exonic element for you.

I have not aligned my data to the zebrafish genome yet, but I think it is possible to just align this "TRINITY_DN76988" sequence to the zebrafish genome and check what is what.

Am I right?


Here is the alignment of the TRINITY piece to Zebrafish genome. It is in intergenic/non-exonic region. Zoom out to get a broader view.

This link has both the UCE and trinity piece. The hits overlap.

Edit: The links have expired.


Based on that, it's intronic, but it's not impossible that it's an alternative exon. Only lab work can tell us what is really going on.


I really appreciate all the time and effort you have spent on me :)

So there is a sequence that is non-exonic (it is intronic) in human and zebrafish, BUT it is present in my RNA-seq assembly (so it is expressed as mRNA, i.e. a transcript).

What hypotheses can we offer in this regard (without lab work, of course)?

(My species is evolutionarily much older than zebrafish.)


It could be genomic contamination, a gene that was lost in evolution, or an alternative exon that is very rarely present or present only in a specific tissue type.


What do you mean by genomic contamination?

1- contamination with the DNA of the fish individuals themselves?

2- or genomic contamination from the people who prepared the samples and libraries?


There may still be a bit of DNA contamination (hopefully from your fish and not humans) left in your RNA prep that went into the library.

This is where you can go back to the alignments of your original reads to the transcriptome you built and check how many reads support (align to) this TRINITY transcript. If there are a lot then ...
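A rough way to do that count in Python, assuming the reads were aligned back to the Trinity assembly with a short-read aligner (Bowtie2, for example) and saved as SAM text (`samtools view -h` converts BAM). Trinity usually names contigs like `TRINITY_DN<number>_c0_g1_i1`, and the posted header looks shortened, so matching on the `TRINITY_DN76988` prefix is an assumption. Only primary, mapped alignments are counted; with paired-end data each mate is counted separately.

```python
def count_supporting_reads(sam_lines, contig_prefix, min_mapq=0):
    """Count primary, mapped alignments on contigs named contig_prefix or contig_prefix_*."""
    UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        on_target = rname == contig_prefix or rname.startswith(contig_prefix + "_")
        if on_target and mapq >= min_mapq:
            n += 1
    return n
```

Usage, with a hypothetical file name: `with open("reads_vs_trinity.sam") as fh: print(count_supporting_reads(fh, "TRINITY_DN76988"))`.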


They are also present in Fugu and minke whale (to some extent). Unless the UCE has a known function, this is just an observation (without any specific hypothesis).


Yes! I guess these ultraconserved elements are conserved in all vertebrates (and maybe invertebrates), so they are also present in Fugu and minke whale.

And is there nothing important in the fact that this non-exonic element is present in the transcriptomic data?

7.7 years ago

Your URL contains the bracket, which causes a problem :p (but that should be easy to solve). Based on the abstract:

These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development.

Your RNA-seq data will not contain information about introns, nor about elements 'nearby' genes. Obviously only coding elements can be found; furthermore, these need to be expressed in the tissue you sequenced.

With regard to the 'how', the easiest approach would probably be to get these conserved sequences and map them to your assembly (or map your reads to the elements).
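The "map the elements to your assembly" route can be sketched in Python with BLAST+: index the Trinity assembly with `makeblastdb`, then search the UCE FASTA against it with `blastn`, writing tabular output. The file and database names are placeholders, and `-task blastn` is a choice rather than a requirement: the default task, megablast, is tuned for near-identical sequences, while the `blastn` task is more sensitive for cross-species hits.

```python
import shutil
import subprocess


def blast_commands(assembly_fasta, uce_fasta, db_name="trinity_db",
                   out_tsv="uce_vs_trinity.tsv", evalue=1e-10):
    """Build argv lists for indexing the assembly and searching the UCEs against it."""
    make_db = ["makeblastdb", "-in", assembly_fasta, "-dbtype", "nucl",
               "-out", db_name]
    search = ["blastn", "-task", "blastn", "-query", uce_fasta, "-db", db_name,
              "-outfmt", "6", "-evalue", str(evalue), "-out", out_tsv]
    return make_db, search


def run_blast(assembly_fasta, uce_fasta, **kwargs):
    """Run both steps in order, failing early if a BLAST+ program is not on PATH."""
    for cmd in blast_commands(assembly_fasta, uce_fasta, **kwargs):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH; install BLAST+")
        subprocess.run(cmd, check=True)
```

Usage: `run_blast("Trinity.fasta", "uces.fa")` once the element sequences have been saved into `uces.fa`.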


Dear WouterDeCoster, hi and thanks.

1- What do you think about collecting the sequences of these elements from the "NCBI Nucleotide" section into one file, making my de novo assembly a BLAST database, and then running "blastn" with that collection file against my transcriptome? Is this strategy OK? Or is a database other than NCBI Nucleotide preferred, for example?

2- If I find some BLAST hits, and thus similarity, what is the best way to check the percent similarity of the two sequences? (I usually use the NCBI "Align two sequences" tool.)


Your strategy sounds okay to me; you can always give it a try and check whether the results are reasonable. But given the limitations of only having RNA-seq, your analysis is already incomplete from the beginning. With regard to your second question, you need to look at how similar 'ultraconserved' should be, perhaps how it is defined in the paper or what you find reasonable. You can probably calculate a distance between your sequence and the consensus element, e.g. the Hamming distance, but I'm not sure what's appropriate.
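A minimal sketch of the Hamming-distance idea in Python, assuming the two sequences are already aligned to the same length (for example, the aligned strings from a BLAST hit). It also checks the paper's definition of ultraconserved, 100% identity with no insertions or deletions over a length threshold (200 bp in the paper); the function names are illustrative.

```python
def hamming(x, y):
    """Number of positions at which two equal-length sequences differ (case-insensitive)."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for p, q in zip(x.upper(), y.upper()) if p != q)


def percent_identity(x, y):
    """Identical positions as a percentage of the aligned length."""
    if not x:
        return 0.0
    return 100.0 * (len(x) - hamming(x, y)) / len(x)


def is_ultraconserved(x, y, min_length=200):
    """True for an ungapped, 100%-identical alignment of at least min_length bp."""
    return len(x) >= min_length and "-" not in x + y and hamming(x, y) == 0
```

Note that a gap character counts as a mismatch in `hamming`, so for gapped alignments the number reflects both substitutions and indels.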

7.7 years ago
BioinfGuru ★ 1.7k

Have you tried the genome alignment tool in the ECR browser? https://ecrbrowser.dcode.org/

Instructions here: https://ecrbrowser.dcode.org/ecrInstructions/ecrInstructions.html (Ctrl+F: genome alignment)


Hi, there is no genome for my species yet.


Not sure who designed the UI for this browser, but it is confusing. Go here and then click on Genome Alignment (at the top right). You can paste your own sequence in to search against the selected genome.

This would of course not be the way to do it with a few hundred candidates!


I think I must first collect the transcripts of mine that have blastn hits, right?


Or get the ECRs from the browser (I don't see any way to bulk download them). Instructions are in my post above or in the help page here; find "Looking closer at ECR's".
