Aligning Reads To A Reference Transcriptome
7.9 years ago
Prakki Rama ★ 2.5k

Hi all,

Is it possible to align reads to a reference transcriptome instead of a reference genome, and then assemble a transcriptome using TopHat/Cufflinks? Are there any potential advantages or disadvantages to doing so? Any thoughts on using BWA (a non-spliced aligner) for this task?

Please bear with me if I have not phrased this well. Thanks in advance for your suggestions.

transcriptome cufflinks bwa • 15k views

For TopHat, check the manual page: http://tophat.cbcb.umd.edu/manual.shtml

-G/--GTF <GTF/GFF3 file>

Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.

-T/--transcriptome-only

Only align the reads to the transcriptome and report only those mappings as genomic mappings.

So you can choose whether to align to the genome only, to the transcriptome plus the genome, or to the transcriptome only.

There are some other TopHat options related to transcriptome mapping; I recommend checking them too.
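The three choices above can be sketched as a small shell helper. This is only an illustration: it prints the TopHat command for each mode rather than running it, and the index name, GTF name, read file names, and output directories are all hypothetical placeholders. Only -p, -o, -G, and -T are taken from the TopHat options discussed here.

```shell
#!/bin/sh
# Sketch only: prints the TopHat command for each mapping mode instead of running it.
# All names below are hypothetical placeholders, not files from this thread.
INDEX=genome_index      # Bowtie index basename (assumed)
GTF=annotation.gtf      # gene model annotation (assumed)
R1=reads_1.fastq        # first mate (assumed)
R2=reads_2.fastq        # second mate (assumed)

tophat_cmd() {
    # $1 = mode: genome | both | transcriptome
    case "$1" in
        genome)        echo "tophat -p 8 -o out_genome $INDEX $R1 $R2" ;;
        both)          echo "tophat -p 8 -G $GTF -o out_both $INDEX $R1 $R2" ;;
        transcriptome) echo "tophat -p 8 -G $GTF -T -o out_transcriptome $INDEX $R1 $R2" ;;
        *)             echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Show all three variants side by side.
for mode in genome both transcriptome; do
    tophat_cmd "$mode"
done
```

Printing the commands first makes it easy to check that the only difference between the "both" and "transcriptome" runs is the -T flag.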


you might wanna post this as an answer...


Thank you jockbanan. I would consider it.

6.6 years ago
archie ▴ 130

Hello everyone,

I learned some important things from this post. I have a few questions that I would like to discuss here.

I am running an RNA-seq analysis with TopHat, using the default mapping parameters plus a GTF file supplied via -G. No reference is available for my species (Gossypium hirsutum), so I picked a closely related species, Gossypium arboreum. The percentage of multimapped reads is rather high and there are few uniquely mapped reads. Because this cotton is a polyploid species, I can't discard the multimapped reads. I end up with poor results. My working command is as follows:

python tophat.py -p 8   -G  jsn.gff   -o LIB_SG323_FJSN_Trans refernece.fa  1_fastq_1   1_fastq_2

Now I am working on another strategy, where I want to map to gene models rather than to the whole reference genome. Will providing -T (transcriptome only) map against the gene models only, or does it do something else? For transcriptome mapping, the command would be:

python tophat.py -p 8 -T  -G  jsn.gff   -o LIB_SG323_FJSN_Trans refernece.fa  1_fastq_1   1_fastq_2

Please correct me if I am wrong anywhere.

Thank you in advance.


"As this cotton is polyploidy species, Therefore i cant discard the multimapped reads."

Even though it is polyploid, only one copy of each chromosome will be present in the FASTA file. Hence, multimapped reads are not related to the ploidy of the genome.

And what do you mean by "map to gene models"? Do you want to map to the transcriptome of the closely related species?

6.6 years ago
archie ▴ 130
-T/--transcriptome-only: Only align the reads to the transcriptome and report only those mappings as genomic mappings.

Yes, you are right, there will be one copy of each chromosome in the FASTA file. But the reason for not filtering out reads that multimap against the genome is that it contains an extremely high number of repeats.
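To put a number on how many alignments are affected, the unique and multimapped counts can be read from the NH tag in the SAM output. The following is a minimal sketch, assuming the aligner writes NH (as TopHat's accepted_hits.bam normally does); the function name count_nh is hypothetical. It counts primary alignment records, so each mate of a pair is counted separately.

```shell
count_nh() {
    # Reads SAM text on stdin; prints "unique=<n> multi=<n>".
    # Skips header lines, unmapped records (flag bit 0x4), and
    # secondary alignments (flag bit 0x100), then classifies by the NH tag.
    awk -F'\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 1 { next }       # unmapped
        int($2 / 256) % 2 == 1 { next }     # secondary alignment
        {
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NH:i:/) {
                    nh = substr($i, 6) + 0  # number of reported alignments
                    if (nh == 1) unique++; else multi++
                }
        }
        END { printf "unique=%d multi=%d\n", unique, multi }
    '
}
```

A typical call would be `samtools view accepted_hits.bam | count_nh`, run once for the genome mapping and once for the -T mapping so the two can be compared.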

The TopHat manual says that providing a GTF file leads to a --transcriptome-index. Does "transcriptome" here mean the genes provided in the GTF file, or something else? Am I right?

TopHat mapping without -T:

python tophat.py -p 8   -G  jsn.gff   -o LIB_SG323_FJSN_Trans refernece.fa  1_fastq_1   1_fastq_2

and with -T:

python tophat.py -p 8  -T -G  jsn.gff   -o LIB_SG323_FJSN_Trans refernece.fa  1_fastq_1   1_fastq_2

I am getting different FPKM values. Why is that?

How does running TopHat with the first command differ from running it with the second?


The results might differ slightly but should not differ significantly.
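One way to check how large the differences actually are is to join the two Cufflinks outputs by gene and look at the ratio. This is a rough sketch, assuming both runs produced a tab-separated genes.fpkm_tracking file with a single header line, tracking_id in column 1, and FPKM in column 10; the function name compare_fpkm and the pseudocount of 1 are illustrative choices, not part of any tool.

```shell
compare_fpkm() {
    # $1, $2 = two genes.fpkm_tracking files (tab-separated, one header line each).
    # Assumes tracking_id is column 1 and FPKM is column 10.
    # Prints: tracking_id, FPKM in file 1, FPKM in file 2, log2 ratio (pseudocount 1).
    awk -F'\t' '
        FNR == 1 { next }                   # skip the header of each file
        NR == FNR { a[$1] = $10; next }     # first file: remember FPKM per gene
        ($1 in a) {
            lfc = log(($10 + 1) / (a[$1] + 1)) / log(2)
            printf "%s\t%s\t%s\t%.3f\n", $1, a[$1], $10, lfc
        }
    ' "$1" "$2"
}
```

Genes whose log2 ratio is close to 0 agree between the run with -T and the run without it; a long tail of large values would suggest the difference is more than slight.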
