Question: Suggestions for correctly annotating predicted genes and discovering their isoforms, if any, with RNA-seq
5.5 years ago by
mjoyraj60 wrote:

A few genes are poorly annotated in the database, and their status is marked as predicted. I want to check the actual annotation of these genes. I also want to find out whether any of them have isoforms.

To address the above points, I have generated RNA-seq data from the developing tissues where the genes are expressed. Let us call the RNA-seq read files

                        1_R1, 1_R2

                        2_R1, 2_R2    

                        3_R1, 3_R2

                        4_R1, 4_R2

                        5_R1, 5_R2

I have downloaded the genome file and the gtf file which are say



I have manually added the predicted annotation of the genes of interest to the gtf file in the proper format.

Next, I want to do the mapping and assembly with TopHat and Cufflinks to address the above points.

My TopHat command will be:

tophat -p 40 -G genes.gtf -o <tophat_output_file> <indexed-genome.fa> 1_R1 1_R2

(Same for the other four)

My Cufflinks command will be:

cufflinks -o <cufflink_output_file> -p 12 -g <genes.gtf> -b <genome_file> --max-bundle-frags 1000000000000 --multi-read-correct <tophat_output_.file-accepted.bam>

I will use the IGV browser to visualize the mapped reads (tophat_output_.file-accepted.bam), and again to compare the original gtf file with the Cufflinks gtf file. I will use the TopHat ‘junctions.bed’ file to visualize the exon-exon junctions. I hope that comparing the original gtf with the Cufflinks gtf will help me annotate the genes correctly, and that the junctions will give clues to any isoforms of the genes. Expression of the genes will be confirmed with the Cufflinks ‘genes.fpkm_tracking’ file, and expression of any isoforms with the ‘isoforms.fpkm_tracking’ file.

Are the TopHat and Cufflinks commands above suitable for correctly annotating the genes and finding their isoforms? Any suggestions will be highly appreciated.

5.5 years ago by
Manvendra Singh2.1k
Berlin, Germany
Manvendra Singh2.1k wrote:

tophat -p 40 -G genes.gtf -o <tophat_output_file> <indexed-genome.fa> 1_R1 1_R2

Here you give TopHat an output folder, not a file; TopHat creates its output files inside that folder.

You also do not give genome.fa itself, only the prefix of the index; TopHat will look for the matching .fa on its own.
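With both points applied, the calls for the five samples might look like the sketch below. The index prefix `genome_index`, the output folder names `tophat_out_<n>` and the `.fastq` extensions are assumptions made for illustration, not names taken from the question. The function only prints the commands (a dry run), so they can be checked before anything is executed.

```shell
#!/bin/sh
# Print (dry run) one TopHat command for a given sample number.
# Assumed, hypothetical names:
#   genome_index          prefix of the genome index (not the .fa file itself)
#   tophat_out_<n>        output FOLDER that TopHat fills with its files
#   <n>_R1.fastq / <n>_R2.fastq   paired read files
tophat_cmd() {
    n="$1"
    echo "tophat -p 40 -G genes.gtf -o tophat_out_${n} genome_index ${n}_R1.fastq ${n}_R2.fastq"
}

# One command per sample; nothing is run here, the lines are only printed.
for n in 1 2 3 4 5; do
    tophat_cmd "$n"
done
```

Once the printed lines look right, they can be piped to `sh` or pasted into a job script.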

After Cufflinks has finished, run cuffcompare on all the transcripts.gtf files.

In the output, transcripts with class code "j" are the potential isoforms; just do a little filtering on the FPKM and the length of the detected isoforms.
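A minimal sketch of that filtering step, assuming the per-sample `.tmap` table written by cuffcompare is used and that its tab-separated header contains the columns `class_code`, `FPKM` and `len`. The columns are looked up by name instead of by position. The cutoffs (FPKM of at least 1, length of at least 200) and the folder names in the comment are placeholders, not recommended values.

```shell
#!/bin/sh
# cuffcompare would first be run along these lines (folder names are hypothetical):
#   cuffcompare -r genes.gtf -o cuffcmp cufflinks_out_1/transcripts.gtf cufflinks_out_2/transcripts.gtf
#
# filter_j TMAP MIN_FPKM MIN_LEN
# Prints the header plus every row with class code "j" whose FPKM and length
# reach the given cutoffs.
filter_j() {
    tmap="$1"; min_fpkm="$2"; min_len="$3"
    awk -F'\t' -v min_fpkm="$min_fpkm" -v min_len="$min_len" '
        NR == 1 {
            # Map column names to positions so the filter does not depend on column order.
            for (i = 1; i <= NF; i++) col[$i] = i
            print
            next
        }
        $col["class_code"] == "j" &&
        $col["FPKM"] + 0 >= min_fpkm + 0 &&
        $col["len"] + 0 >= min_len + 0
    ' "$tmap"
}

# Optional command-line use: sh filter_j.sh sample.tmap [min_fpkm] [min_len]
if [ "$#" -ge 1 ]; then
    filter_j "$1" "${2:-1}" "${3:-200}"
fi
```

The surviving rows are only candidates; it is still worth checking them in IGV against the junctions before treating them as real isoforms.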

This is my suggestion.


Thanks. Do you have any idea how to filter out the false junctions from the junctions.bed file produced by TopHat?

written 5.5 years ago by mjoyraj60