Hi everyone,
I am trying to identify differential gene expression in 64 Human Heart samples (21 normal & 43 heart-failure). I used GSNAP for aligning my reads to the reference genome (filtered, sorted & indexed the bam files):
gsnap -d <genome_file> --gunzip --batch=5 --nthreads=2 --distant-splice-penalty=10000 --clip-overlap -s <splice_file> -N 1 --npaths=100 -Q -v <snp_file> --read-group-id=sample1 --read-group-name=sample1 --read-group-library=sample1 --read-group-platform='Illumina' --format=sam sample1_R1.fastq.gz sample1_R2.fastq.gz 2> sample1.out | samtools view -bS - > sample1.bam
Following this, I filtered, sorted & indexed my bam files for all 64 samples. Then I ran cufflinks on the resulting bam files:
cufflinks -p 2 -o sample1-cufflinks-output sample1_filter_sort.bam 2> sample1.out
Then, I used cuffmerge to get the merged.gtf file:
cuffmerge -g <GTF-file> -s hg19.fa -p 25 <assembly_list.txt>
Following this, I used cuffdiff to get differentially expressed genes using:
cuffdiff -o heart_cuffdiff -L Control,HF --multi-read-correct --upper-quartile-norm -b hg19.fa -p 20 ./cuffmerge/merged.gtf <comma-separated-Normals> <comma-separated-HeartFailure>
Everything ran perfectly and I am interested in the following files in the cuffdiff output directory:
$ls *.diff
cds.diff cds_exp.diff gene_exp.diff isoform_exp.diff promoters.diff splicing.diff tss_group_exp.diff
$ls *.fpkm_tracking
cds.fpkm_tracking genes.fpkm_tracking isoforms.fpkm_tracking tss_groups.fpkm_tracking
The *.diff files will tell me how many genes are found to be differentially expressed between normals & heartfailing samples. And the *.fpkm_tracking files will give me metrics such as coverage and length etc. But I found no coverage information in the fpkm_tracking files. In fact, the class_code, nearest_ref_id, length, and coverage fields all have '-' value. All the *.fpkm_tracking files look like:
$head genes.fpkm_tracking
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage Control_FPKM Control_conf_lo Control_conf_hi Control_status HF_FPKM HF_conf_lo HF_conf_hi HF_status
XLOC_000001 - - XLOC_000001 - TSS1 chr1:12658-29348 - - 0.747812 0 2.09809 OK 0.900845 0 2.3693 OK
How can I get the these missing field values especially the coverage information? Just an FYI, in the cuffdiff commandline, I am supplying the merged GTF output from cuffmerge and not the cuffcompare output.