Question

running tophat+cufflinks but result in 0 FPKMs

0

Entering edit mode

9.3 years ago

Pei ▴ 170

Hi:

I am running tophat+cufflinks to analyze a RNA-seq dataset downloaded from GEO[1]

Below is my procedure. (I run fastq-dump to convert the sra file into fastq.) but I found the FPKM values are all 0 in my resulting transcripts.gtf file.

Could you please give me some tips/advices on how to calculate the FPKM values ?

Thanks a lot!

[1] ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR306/SRR306758/SRR306758.sra

tophat mm9 SRR306758.fastq
[Mon Jan  5 14:23:52 2015] Beginning TopHat run (v1.3.1)
-----------------------------------------------
[Mon Jan  5 14:23:52 2015] Preparing output location ./tophat_out/
[Mon Jan  5 14:23:52 2015] Checking for Bowtie index files
[Mon Jan  5 14:23:52 2015] Checking for reference FASTA file
Warning: Could not find FASTA file mm9.fa
[Mon Jan  5 14:23:52 2015] Reconstituting reference FASTA file from Bowtie index
Executing: /genome/bin/bowtie-inspect mm9 > ./tophat_out/tmp/mm9.fa
[Mon Jan  5 14:26:32 2015] Checking for Bowtie
Bowtie version: 0.12.7.0
[Mon Jan  5 14:26:32 2015] Checking for Samtools
Samtools Version: 0.1.19
[Mon Jan  5 14:26:32 2015] Generating SAM header for mm9
[Mon Jan  5 14:27:03 2015] Preparing reads
format: fastq
quality scale: phred33 (default)
Left  reads: min. length=76, count=18238434
[Mon Jan  5 14:33:01 2015] Mapping left_kept_reads against mm9 with Bowtie 
[Mon Jan  5 15:37:36 2015] Processing bowtie hits
[Mon Jan  5 15:50:07 2015] Mapping left_kept_reads_seg1 against mm9 with Bowtie (1/3)
[Mon Jan  5 16:59:49 2015] Mapping left_kept_reads_seg2 against mm9 with Bowtie (2/3)
[Mon Jan  5 18:15:05 2015] Mapping left_kept_reads_seg3 against mm9 with Bowtie (3/3)
[Mon Jan  5 19:11:27 2015] Searching for junctions via segment mapping
[Mon Jan  5 19:17:48 2015] Retrieving sequences for splices
[Mon Jan  5 19:21:26 2015] Indexing splices
[Mon Jan  5 19:21:44 2015] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie (1/3)
[Mon Jan  5 19:31:38 2015] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie (2/3)
[Mon Jan  5 19:41:48 2015] Mapping left_kept_reads_seg3 against segment_juncs with Bowtie (3/3)
[Mon Jan  5 19:50:39 2015] Joining segment hits
[Mon Jan  5 19:59:59 2015] Reporting output tracks
-----------------------------------------------
Run complete [05:46:13 elapsed]

cufflinks accepted_hits.bam -G /disk4/linp/brawand_Data/Mus_musculus.NCBIM37.60.gtf -o SRR306758_trial_1

You are using Cufflinks v2.2.1, which is the most recent release.
[09:45:41] Loading reference annotation.
[09:45:55] Inspecting reads and determining fragment length distribution.
> Processed 27910 loci.                        [*************************] 100%
> Map Properties:
> Normalized Map Mass: 13631307.57
> Raw Map Mass: 13631307.57
> Fragment Length Distribution: Truncated Gaussian (default)
>               Default Mean: 200
>            Default Std Dev: 80
[09:46:29] Estimating transcript abundances.
> Processing Locus 8:122386172-122400888       [************************ ]  99%
8:122434259-126869624 Warning: Skipping large bundle.
> Processed 27909 loci.                        [*************************] 100%

tophat FPKM cufflinks • 2.9k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Pei ▴ 170

0

Entering edit mode

Can you check the alignment statistic? E.g. how many reads are mapped to the genome? Have you try to ask cufflink to also output the raw read counts?

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Sam ★ 4.7k

0

Entering edit mode

Can you check accepted_hits.bam file and see that you mapped a reasonable number of reads?

Have you tried using cufflinks -g? Have you checked that the names in your reference gtf file are EXACTLY the same of those you used for mapping on? Otherwise you will not map anything (beacuse you are using -G)

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Fabio Marroni ★ 3.0k

Ram · Answer 1 · 2015-02-09

Hi:

These days I was suffering from calculation of gene expression values (FPKM) based on RNA-seq data.

Now I figure out the very basic procedure which I wish to share with you here:

GTF file was downloaded from ensembl: ftp://ftp.ensembl.org/pub/release-62/gtf/drosophila_melanogaster/

Genome sequence data was downloaded from: ftp://ftp.ensembl.org/pub/release-62/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP5.25.62.dna_rm.toplevel.fa.gz

After decompressing, run:

bowtie-build Drosophila_melanogaster.BDGP5.25.62.dna_rm.toplevel.fa dmelanogasterBDGP5_25_62

then:

tophat -G Drosophila_melanogaster.BDGP5.25.62.gtf -o C1_R1_thout -r 200 /disk4/linp/DataSeq/Drosophila_melanogaster/dmelanogasterBDGP5_25_62 GSM794484_C1_R1_1.fq GSM794484_C1_R2_2.fq
cufflinks -G Drosophila_melanogaster.BDGP5.25.62.gtf -o C1_R1_clout C1_R1_thout/accepted_hits.bam

These commands will calculate FPKM values for each transcripts(genes), using transcripts IDs as identifiers, rather than something like "CUFF.2.2".

The parameter -r may be tested by different values before making the final choice. (I have not tried it yet)

Thank you!