No Coverage Information In *.Fpkm_Tracking Files Obtained From Cuffdiff Output
0
0
Entering edit mode
10.4 years ago
komal.rathi ★ 4.1k

Hi everyone,

I am trying to identify differential gene expression in 64 Human Heart samples (21 normal & 43 heart-failure). I used GSNAP for aligning my reads to the reference genome (filtered, sorted & indexed the bam files):

gsnap -d <genome_file> --gunzip --batch=5 --nthreads=2 --distant-splice-penalty=10000 --clip-overlap -s <splice_file> -N 1 --npaths=100 -Q -v <snp_file> --read-group-id=sample1 --read-group-name=sample1 --read-group-library=sample1 --read-group-platform='Illumina' --format=sam sample1_R1.fastq.gz sample1_R2.fastq.gz  2> sample1.out | samtools view -bS - > sample1.bam

Following this, I filtered, sorted & indexed my bam files for all 64 samples. Then I ran cufflinks on the resulting bam files:

cufflinks -p 2 -o sample1-cufflinks-output sample1_filter_sort.bam 2> sample1.out

Then, I used cuffmerge to get the merged.gtf file:

cuffmerge -g <GTF-file> -s hg19.fa -p 25 <assembly_list.txt>

Following this, I used cuffdiff to get differentially expressed genes using:

cuffdiff -o heart_cuffdiff -L Control,HF --multi-read-correct --upper-quartile-norm -b hg19.fa -p 20 ./cuffmerge/merged.gtf <comma-separated-Normals> <comma-separated-HeartFailure>

Everything ran perfectly and I am interested in the following files in the cuffdiff output directory:

$ls *.diff
cds.diff  cds_exp.diff  gene_exp.diff  isoform_exp.diff  promoters.diff  splicing.diff  tss_group_exp.diff

$ls *.fpkm_tracking
cds.fpkm_tracking  genes.fpkm_tracking  isoforms.fpkm_tracking  tss_groups.fpkm_tracking

The *.diff files will tell me how many genes are found to be differentially expressed between normals & heartfailing samples. And the *.fpkm_tracking files will give me metrics such as coverage and length etc. But I found no coverage information in the fpkm_tracking files. In fact, the class_code, nearest_ref_id, length, and coverage fields all have '-' value. All the *.fpkm_tracking files look like:

$head genes.fpkm_tracking
tracking_id    class_code    nearest_ref_id    gene_id    gene_short_name    tss_id    locus    length    coverage    Control_FPKM    Control_conf_lo    Control_conf_hi    Control_status    HF_FPKM    HF_conf_lo    HF_conf_hi    HF_status
XLOC_000001    -    -    XLOC_000001    -    TSS1    chr1:12658-29348    -    -    0.747812    0    2.09809    OK    0.900845    0    2.3693    OK

How can I get the these missing field values especially the coverage information? Just an FYI, in the cuffdiff commandline, I am supplying the merged GTF output from cuffmerge and not the cuffcompare output.

cuffdiff fpkm coverage • 3.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 2926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6