A few genes are poorly annotated in the database, and their status is marked as predicted. I want to check the actual annotation of these genes, and also whether any of them have isoforms.
To address these points, I have generated RNA-seq data from the developing tissues where the genes are expressed. Let us consider the RNA-seq data as
I have downloaded the genome file and the GTF file, which are, say
I have manually added the predicted annotation of the genes of interest to the GTF file in the proper format.
Next, I want to do mapping and assembly with TopHat and Cufflinks to address the above questions.
My TopHat command will be:
tophat -p 40 -G genes.gtf -o <tophat_output_file> <indexed-genome.fa> 1_R1 1_R2
(Same for the other four)
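A minimal sketch of how the same command could be generated for every sample, so that the options stay identical across runs. The sample prefixes passed in, the output directory pattern tophat_out_<sample>, and the function name print_tophat_cmds are assumptions for illustration; only the tophat options come from the command above. The function prints the commands (a dry run) rather than executing them.

```shell
#!/usr/bin/env bash
# Dry run: print one tophat command per sample instead of executing it.
# Arguments: <annotation.gtf> <genome index> <sample prefix> [<sample prefix> ...]
# Read files are assumed to be named <prefix>_R1 and <prefix>_R2, as in the post;
# the output directory name tophat_out_<prefix> is hypothetical.
print_tophat_cmds() {
    local gtf="$1" index="$2"
    shift 2
    local s
    for s in "$@"; do
        printf 'tophat -p 40 -G %s -o tophat_out_%s %s %s_R1 %s_R2\n' \
            "$gtf" "$s" "$index" "$s" "$s"
    done
}
```

For example, print_tophat_cmds genes.gtf genome_index 1 2 3 4 5 would list five commands; after checking them, the output can be piped to bash to run them one after another.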
My Cufflinks command will be:
cufflinks -o <cufflink_output_file> -p 12 -g <genes.gtf> -b <genome_file> --max-bundle-frags 1000000000000 --multi-read-correct <tophat_output_.file-accepted.bam>
I will use the IGV browser to visualize the mapped reads (tophat_output_.file-accepted.bam). I will use it again to visualize and compare the original GTF file and the Cufflinks GTF file, and I will load the TopHat 'junctions.bed' file to visualize the exon-exon junctions. I hope that comparing the original GTF with the Cufflinks GTF will help me annotate the genes correctly, and that the junctions will give a clue to any isoforms of the genes. The expression of the genes will be confirmed from the Cufflinks 'genes.fpkm_tracking' file, and the expression of any isoforms from the 'isoforms.fpkm_tracking' file.
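Alongside the visual check in IGV, the assembled GTF could be screened for genes that carry more than one transcript. The sketch below assumes a tab-separated GTF with "transcript" feature lines and a gene_id "..." attribute in column 9; the function name count_isoforms is hypothetical. Because the assembler may split or merge loci, the counts are only a starting point for inspection, not a confirmation of isoforms.

```shell
#!/usr/bin/env bash
# List gene_ids that have more than one transcript record in a GTF file.
# Argument: path to a GTF file (for example, the assembled transcripts).
# Output: one line per multi-transcript gene, "<gene_id><TAB><transcript count>".
count_isoforms() {
    awk -F '\t' '
        # Only transcript-level records are counted, not exons.
        $3 == "transcript" && match($9, /gene_id "[^"]+"/) {
            # Strip the leading: gene_id "  (9 characters) and the closing quote.
            gene = substr($9, RSTART + 9, RLENGTH - 10)
            n[gene]++
        }
        END { for (g in n) if (n[g] > 1) print g "\t" n[g] }
    ' "$1" | sort
}
```

Genes reported here are the ones worth opening in IGV together with the junction track.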
Are these TopHat and Cufflinks commands suitable for correctly annotating the genes and finding their isoforms? Any suggestions will be highly appreciated.