I have data from PacBio full-length cDNA sequencing and from Illumina sequencing, and I want to calculate gene expression levels (FPKM). There is no genome available for my species. With only Illumina data I could do this using Trinity, but I don't know how to do it when adding the PacBio full-length cDNA reads. Thanks.
Since you don't have a reference genome or transcriptome, you would need to generate sample-specific gene models, i.e. build your own reference transcriptome from your PacBio data. You can then quantify expression from your Illumina reads against that PacBio-derived reference transcriptome with your preferred short-read gene expression pipeline.
To get started on making your own reference transcriptome from your PacBio data, I would use PacBio's Iso-Seq pipeline. The link below outlines the major steps of the pipeline, its dependencies, and other tertiary PacBio Iso-Seq data analysis tools that may be useful.
-Iso-Seq Command Line Module from SMRT Link v4 https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki
"It includes three major steps:
- CCS: Getting CCS (circular consensus sequence) reads out of subreads BAM file.
- Classify: Identifying full-length CCS reads based on cDNA primers and polyA tail signal.
- Cluster: Isoform-level clustering and polishing to generate high-quality, full-length, transcript isoform sequences."
Once you have completed step 3 (Cluster), you should have the output file hq_isoforms.fastq
This output includes the transcript sequences that are:
- "Full-length (as indicated by presence of cDNA primers)
- High-quality (predicted accuracy by default is >= 99%)
- Supported by 2 or more FL reads (unless you changed the default or are using older versions)."
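Most short-read aligners and quantifiers build their index from FASTA rather than FASTQ, so you may need to convert hq_isoforms.fastq first. Below is a minimal sketch in plain Python, assuming the file uses strict 4-line FASTQ records (no wrapped sequence lines). The function names `read_fastq` and `fastq_to_fasta` are made up for illustration and are not part of the Iso-Seq pipeline.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, assuming strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError(f"truncated record: {header!r}")
        yield header[1:], seq.rstrip("\n")


def fastq_to_fasta(lines: Iterable[str], width: int = 60) -> str:
    """Return FASTA text, wrapping each sequence at `width` characters."""
    out = []
    for name, seq in read_fastq(lines):
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

For a real file you could call it as `fastq_to_fasta(open("hq_isoforms.fastq"))` and write the returned text to a FASTA file of your choosing.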
From there you have several options. An overview is available here: https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki/What-to-do-after-Iso-Seq-Cluster%3F
For your goal of combining PacBio with Illumina data to calculate gene expression levels without a reference, you would likely want to collapse the high-quality isoforms into a single non-redundant set of unique isoforms. To do that you could use Cogent or CD-Hit: https://github.com/Magdoll/cDNA_Cupcake/wiki/Tutorial:-Collapse-redundant-isoforms-without-genome
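To illustrate what "collapsing redundant isoforms" means, here is a toy sketch in Python. It is not what Cogent or CD-Hit do: those tools tolerate sequence differences, whereas this only drops exact duplicates and sequences that are exact substrings of a longer kept sequence. The function name `collapse_contained` is hypothetical, and the quadratic scan is only suitable for small inputs.

```python
from typing import Dict


def collapse_contained(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep only sequences not exactly contained in a longer kept sequence.

    Sequences are visited longest first (ties broken by name), so an exact
    duplicate is dropped in favour of the first copy seen.
    """
    kept: Dict[str, str] = {}
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        if any(seq in longer for longer in kept.values()):
            continue  # redundant: identical to, or a substring of, a kept one
        kept[name] = seq
    return kept
```

For real data, use the tools from the tutorial linked above; this sketch only shows the idea of reducing the isoform set before quantification.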
From there you should have a reference transcriptome against which you can run your preferred Illumina expression pipeline.
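Since the question asks for FPKM specifically, here is a minimal sketch of the calculation once you have per-transcript fragment counts from your short-read pipeline. It assumes the standard definition, FPKM = fragments * 10^9 / (transcript length in bp * total mapped fragments), and uses the full transcript length; quantification tools often use an effective length instead, so their numbers may differ slightly. The function name `fpkm` and the input dictionaries are made up for illustration.

```python
from typing import Dict


def fpkm(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Compute FPKM per transcript.

    counts:  fragments assigned to each transcript
    lengths: transcript length in bp (must cover every key in counts)
    """
    total = sum(counts.values())  # total mapped fragments in the sample
    if total == 0:
        return {name: 0.0 for name in counts}
    return {
        name: count * 1e9 / (lengths[name] * total)
        for name, count in counts.items()
    }
```

For example, with 400 mapped fragments in total, a 1000 bp transcript with 100 fragments gets 100 * 10^9 / (1000 * 400) = 250000 FPKM.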