Question: How can I find out which gene each Trinity ID corresponds to?
2
tiago211287 (1.1k), USA, 3.8 years ago, wrote:

I performed de novo assembly with Trinity using reads from mouse heart RNA-seq. Then I mapped the assembled transcriptome back to the reference genome with BLAT. I also used Kallisto to quantify transcript abundance in each sample. Now I want to know which Trinity IDs correspond to already-annotated genes, what their names are, and which are not annotated. How can I do that?

modified 3.8 years ago by cyril-cros (890) • written 3.8 years ago by tiago211287 (1.1k)

Though I have never done this, I guess bedtools / bedops can provide overlaps between the transcriptome mapping and the mouse annotation.

written 3.8 years ago by h.mon (27k)
2
cyril-cros (890), France, 3.8 years ago, wrote:

Now, for a correct answer.

You are working with mouse, which is a really well-annotated organism. Trinity will be imprecise by comparison. You would be much better off using Cufflinks with the reference genome and annotation; if you are looking for novel isoforms or things like that, it will find them for you. Are you trying to achieve something in particular?

Cufflinks will also give you the correspondence between its gene names and the official ones.

modified 3.8 years ago • written 3.8 years ago by cyril-cros (890)

I already made a pipeline using a reference, with STAR aligner.

To find new stuff and study alternative splicing, I am performing de novo assembly because it is independent of the reference. I found a tool in bedops that converts PSL files to BED (psl2bed), so I thought I could follow your previous ideas with it.

written 3.8 years ago by tiago211287 (1.1k)
1

Just be careful: BLAT shows you similar segments. Paralogous genes (similar copies elsewhere in the same genome) may be a problem here...

written 3.8 years ago by cyril-cros (890)

I also used a tool called pslReps, which filters BLAT output down to the best hit for each query.
link: https://github.com/ENCODE-DCC/kentUtils/tree/master/src/hg/pslReps
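
To illustrate the idea of keeping one hit per query, here is a rough Python sketch. It is not how pslReps works internally (pslReps has its own scoring and thresholds); the score used here, matches + repMatches - misMatches, is a simplifying assumption, and `psl_row` is a hypothetical helper that only builds toy records.

```python
def best_hits(psl_lines):
    """Keep the highest-scoring PSL record per query name (simplified)."""
    best = {}
    for line in psl_lines:
        record = line.rstrip("\n")
        f = record.split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip header or malformed lines
        matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
        q_name = f[9]
        # Assumption: a deliberately simple score, not the pslReps one.
        score = matches + rep_matches - mismatches
        if q_name not in best or score > best[q_name][0]:
            best[q_name] = (score, record)
    return {q: record for q, (score, record) in best.items()}


def psl_row(matches, mismatches, q_name, t_name, t_start, t_end):
    """Hypothetical helper: build a toy 21-column PSL record."""
    q_size = matches + mismatches
    fields = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+",
              q_name, q_size, 0, q_size,
              t_name, 1000000, t_start, t_end,
              1, f"{q_size},", "0,", f"{t_start},"]
    return "\t".join(str(x) for x in fields)


rows = [
    psl_row(90, 10, "TRINITY_DN1", "chr1", 100, 200),
    psl_row(99, 1, "TRINITY_DN1", "chr5", 400, 500),
    psl_row(50, 0, "TRINITY_DN2", "chr2", 10, 60),
]
best = best_hits(rows)
for q_name, record in best.items():
    print(q_name, record.split("\t")[13])
```

Ties keep the first record seen; with near-identical hits at different loci it may be safer to flag the query as ambiguous instead of picking one.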

written 3.8 years ago by tiago211287 (1.1k)
0
cyril-cros (890), France, 3.8 years ago, wrote:

Disclaimer: I forgot that Trinity outputs a FASTA file and not a GTF or BED file. Bad answer, but it might be useful to someone.

I had that same issue with another tool (https://github.com/shenkers/isoscm ). Your best bet is to use bedtools/bedops. My scripts are not really portable (working on it; who knows, it could become a small methods article) but:

  • I use bedtools merge to merge transcripts of the same de novo gene into a single maximum-length transcript with no introns (minimum start position, maximum end position).
  • I do the same with the official annotation.
  • I use bedtools intersect to get a (hopefully) one-to-one correspondence.
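
The three steps above can be sketched in plain Python. This is a stand-in for the logic, not a wrapper around bedtools and not my actual scripts: it collapses transcripts to one span per gene, then reports overlapping pairs. All IDs and coordinates are made up, and the `same_strand` switch is a hypothetical parameter playing the role of a strand-specific comparison.

```python
def gene_spans(transcripts):
    """Collapse transcripts to one (min start, max end) span per gene.

    transcripts: iterable of (gene_id, chrom, start, end, strand),
    with 0-based, half-open coordinates as in BED.
    """
    spans = {}
    for gene, chrom, start, end, strand in transcripts:
        key = (gene, chrom, strand)
        if key in spans:
            s, e = spans[key]
            spans[key] = (min(s, start), max(e, end))
        else:
            spans[key] = (start, end)
    return [(g, c, s, e, st) for (g, c, st), (s, e) in sorted(spans.items())]


def overlapping_pairs(denovo, annotated, same_strand=True):
    """Return (de novo gene, annotated gene) pairs whose spans overlap."""
    pairs = []
    for dg, dc, ds, de, dst in denovo:
        for ag, ac, a_start, a_end, ast in annotated:
            if dc != ac:
                continue
            if same_strand and dst != ast:
                continue
            if ds < a_end and a_start < de:  # half-open interval overlap
                pairs.append((dg, ag))
    return pairs


# Toy data (hypothetical IDs and coordinates).
denovo_tx = [
    ("TRINITY_DN1_c0_g1", "chr1", 100, 500, "+"),
    ("TRINITY_DN1_c0_g1", "chr1", 300, 900, "+"),
    ("TRINITY_DN2_c0_g1", "chr2", 50, 200, "-"),
]
annot_tx = [
    ("GeneA", "chr1", 150, 850, "+"),
    ("GeneA", "chr1", 150, 700, "+"),
    ("GeneB", "chr2", 1000, 2000, "-"),
]
pairs = overlapping_pairs(gene_spans(denovo_tx), gene_spans(annot_tx))
print(pairs)
```

The nested loop is quadratic, which is fine for a toy example; on a real annotation you would sort by chromosome and start, or simply let bedtools do the work.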

Caveats:

  • You need to use the -s (strand-specific) flag.
  • I check whether I have a true one-to-one correspondence: are there unassigned transcripts and, more importantly, do I have several genes overlapping the same transcript? If you are unlucky and have very similar sequences close by, you may get fused transcripts where your alignment software misplaces one half of a pair of reads. The assembly software then outputs a single really long gene with lots of introns instead of separate genes. The alignment software should have an option for maximum intron size that you can adjust (conversely, if it is too short, you split a gene with a large intron into two genes).
  • You have different transcripts for each gene due to alternative splicing, polyadenylation, and alternative TSSs. Merging transcripts resolves this issue for me.
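
The one-to-one check from the second caveat can be sketched like this. It is only an illustration of the bookkeeping, assuming you already have (de novo gene, annotated gene) overlap pairs; the category names and the toy IDs are my own invention.

```python
from collections import defaultdict


def classify(denovo_ids, pairs):
    """Sort de novo genes into correspondence categories.

    pairs: iterable of (denovo_gene, annotated_gene) overlaps.
    """
    by_denovo = defaultdict(set)
    by_annot = defaultdict(set)
    for d, a in pairs:
        by_denovo[d].add(a)
        by_annot[a].add(d)

    report = {"unassigned": [], "one_to_one": [],
              "possible_fusion": [], "possible_split": []}
    for d in denovo_ids:
        hits = by_denovo.get(d, set())
        if not hits:
            report["unassigned"].append(d)       # no annotated gene overlaps
        elif len(hits) > 1:
            report["possible_fusion"].append(d)  # spans several annotated genes
        elif len(by_annot[next(iter(hits))]) > 1:
            report["possible_split"].append(d)   # shares its gene with another
        else:
            report["one_to_one"].append(d)
    return report


# Toy input (hypothetical IDs).
ids = ["g1", "g2", "g3", "g4", "g5"]
toy_pairs = [("g1", "A"), ("g2", "B"), ("g2", "C"), ("g3", "D"), ("g4", "D")]
report = classify(ids, toy_pairs)
print(report)
```

Anything landing in the fusion or split categories is worth a look in a genome browser before trusting the assignment.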

I would first take a look at what Cufflinks does, since it is pretty good at assembly with a reference. Its 3' UTRs are often screwy though. In all cases I often use IGV to look at my reads, and I have good depth to start with after pooling several biological replicates.

modified 3.8 years ago • written 3.8 years ago by cyril-cros (890)

Questions:
My data is not strand-specific. Must I still use the -s flag? Will this be a problem?

After the BLAT alignment I got PSL files. Is there a tool to convert them to BED?

written 3.8 years ago by tiago211287 (1.1k)

Same with mine. Transcripts generated by Trinity are strand-specific. However, you are including reads that may be the product of antisense transcription.

EDIT: I made a mistake, Trinity outputs a FASTA file and not a GTF or BED file...

written 3.8 years ago by cyril-cros (890)