Cufflinks: How to calculate the total number of reads per class code (=, j, c, r, etc.)

Hi all,

My question is: does anyone know how I can combine the output files of the Tuxedo pipeline to find out how many reads mapped to transfrags of each class code reported by Cuffcompare (=, c, j, i, etc.)?

I have tried calculating it from the coverage values reported in the .tmap files, but these calculations add up to more than the total number of mapped reads. So perhaps estimating the number of reads from the "average coverage" is not the best approach. But then where, or how, can I obtain these values directly, without "reverse engineering" them?
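For reference, the coverage-based "reverse engineering" described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes a tab-separated .tmap with a header row containing columns named `class_code`, `cov` (average depth) and `len` (transfrag length in bp), so check the header of your own file, since column names can differ between versions. It also assumes reads per transfrag can be approximated as `cov * len / read_length`, where `read_length` is a value you supply and the right conversion factor depends on how coverage is defined for your data (e.g. single vs paired-end). The function name `estimate_reads_per_class` and the toy table are hypothetical. Because the absolute estimates may not match the mapped-read total, the sketch also reports each class code's share of the estimate.

```python
import csv
import io
from collections import defaultdict


def estimate_reads_per_class(tmap_text, read_length):
    """Roughly estimate reads per class code from .tmap-style text.

    Assumption: tab-separated, with a header containing 'class_code',
    'cov' and 'len'. Reads per transfrag ~ cov * len / read_length.
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(tmap_text), delimiter="\t")
    for row in reader:
        estimate = float(row["cov"]) * int(row["len"]) / read_length
        totals[row["class_code"]] += estimate
    grand_total = sum(totals.values())
    # Relative shares sum to 100% even if the absolute estimates are inflated.
    shares = {code: 100.0 * n / grand_total for code, n in totals.items()}
    return dict(totals), shares


# Hypothetical toy table with only the columns the sketch reads.
toy_tmap = (
    "class_code\tcov\tlen\n"
    "=\t10.0\t1000\n"
    "=\t5.0\t2000\n"
    "j\t2.0\t500\n"
    "u\t1.0\t1000\n"
)
totals, shares = estimate_reads_per_class(toy_tmap, read_length=100)
print(totals)
print(shares)
```

With a real file you would pass `open(path).read()` instead of the toy string. The shares only describe how the coverage estimate is distributed across class codes; they do not resolve why the absolute totals exceed the number of mapped reads.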

My purpose:

I would like to reproduce, for my dataset, the analysis reported in Table 2 (transcribed at the end of this post) of the supplementary material of the 2010 Nature Biotechnology paper describing Cufflinks (http://www.nature.com/nbt/journal/v28/n5/full/nbt.1621.html). However, I have not found an elegant and straightforward way of calculating the "Assembled reads (%)" column (column 4).

Belisa

Table 2 from the supplementary material of PMID: 20436464

Table 2. Classification of all transfrags produced at any time point with respect to annotated gene models and masked repeats in the mouse genome. Transfrags that are present in multiple time point assemblies are multiply counted to preserve the relative distribution of transfrags among the categories across the full experiment.

Category                      Transfrags      % of total transfrags      Assembled reads (%)
Match to known isoform        39,857          13.5                       76.7
Novel isoform of known gene   18,565          6.3                        11.3
Contained in known isoform    71,029          24.1                       4.6
Repeat                        41,906          14.2                       0.6
Intronic                      32,658          11.1                       0.6
Polymerase run-on             18,522          6.3                        0.5
Intergenic                    48,604          16.5                       1.2
Other artifacts               22,483          7.7                        4.5
Total transfrags              293,624         100                        100
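Once you have a class code and a read count for each transfrag (however those counts are obtained), building a table in the shape of Table 2 is mostly bookkeeping. The sketch below assumes a hypothetical mapping from Cuffcompare class codes to the Table 2 categories (`=` to match, `j` to novel isoform, `c` to contained, `r` to repeat, `i` to intronic, `p` to polymerase run-on, `u` to intergenic, and every other code to "Other artifacts"). The paper's exact classification rules may differ. The names `CATEGORY`, `summarize` and the toy data are illustrative only.

```python
from collections import Counter

# Hypothetical class-code -> category mapping; the paper's rules may differ.
CATEGORY = {
    "=": "Match to known isoform",
    "j": "Novel isoform of known gene",
    "c": "Contained in known isoform",
    "r": "Repeat",
    "i": "Intronic",
    "p": "Polymerase run-on",
    "u": "Intergenic",
}
OTHER = "Other artifacts"  # assumed bucket for e, o, x, s, '.', etc.


def summarize(transfrags):
    """transfrags: non-empty iterable of (class_code, read_count), one per transfrag.

    Returns rows of (category, n_transfrags, % of transfrags, % of reads).
    """
    n_frags = Counter()
    n_reads = Counter()
    for code, reads in transfrags:
        category = CATEGORY.get(code, OTHER)
        n_frags[category] += 1
        n_reads[category] += reads
    total_frags = sum(n_frags.values())
    total_reads = sum(n_reads.values())
    rows = []
    for category in list(CATEGORY.values()) + [OTHER]:
        rows.append((
            category,
            n_frags[category],
            100.0 * n_frags[category] / total_frags,
            100.0 * n_reads[category] / total_reads,
        ))
    return rows


# Made-up example: (class_code, reads assigned to that transfrag).
toy_transfrags = [("=", 80), ("=", 10), ("j", 5), ("c", 3), ("u", 1), ("x", 1)]
rows = summarize(toy_transfrags)
for category, n, pct_frags, pct_reads in rows:
    print(f"{category:<30}{n:>8}{pct_frags:>8.1f}{pct_reads:>8.1f}")
```

The "Assembled reads (%)" column produced this way is only as meaningful as the per-transfrag read counts fed into it, so the hard part of the question remains how those counts are obtained.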

