The question was initially "TPM (Transcripts Per Million): can gene length, transcript length, or both be used?"
I changed the title because I found the answers to the first questions by myself (so proud ^^). The post got a lot of views in a short time but few answers, and I still don't have the last word on the subject, so I keep updating it. I think it can be useful to others.
My post was initially about:
Can we calculate TPM directly from raw read counts (from STAR output, for example)?
"Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK). Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor. Divide the RPK values by the “per million” scaling factor. This gives you TPM."
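The three quoted steps can be sketched in Python as below. This is a minimal sketch, assuming you already have one raw count and one length (in bp) per gene; the function name `tpm_from_counts` and the numbers are hypothetical.

```python
def tpm_from_counts(counts, lengths_bp):
    """Compute TPM from raw read counts and feature lengths in base pairs."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same size")
    # Step 1: reads per kilobase (RPK) for each gene.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    # Step 2: sum of all RPK values divided by 1,000,000 -> "per million" scaling factor.
    per_million = sum(rpk) / 1_000_000
    if per_million == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Step 3: divide each RPK value by the scaling factor.
    return [r / per_million for r in rpk]


counts = [100, 300, 50]       # hypothetical raw counts per gene
lengths = [2000, 3000, 500]   # hypothetical gene lengths in bp
tpm = tpm_from_counts(counts, lengths)
print([round(x, 1) for x in tpm])  # prints [200000.0, 400000.0, 400000.0]
```

By construction the values of one sample always sum to 1,000,000, which is what makes TPM a within-sample proportion.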
Do you use the total length of the gene or just the sum of its exon lengths?
UPDATE: the sum of exon lengths.
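One detail matters for "sum of exon lengths": a gene's transcripts usually share exons, so adding up every exon row of the annotation counts shared bases several times. A common convention (assumed here, not the only one) is to merge overlapping exons first and sum the merged intervals. A minimal sketch, assuming a GTF with 1-based closed coordinates and all exons of a gene on one chromosome/strand; the helper names and the toy annotation are hypothetical.

```python
import re
from collections import defaultdict


def union_exon_length(exons):
    """Length in bp of the union of (start, end) intervals, 1-based and closed."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_exon_lengths(gtf_lines):
    """Map gene_id -> merged exonic length, reading only 'exon' rows of a GTF."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        m = re.search(r'gene_id "([^"]+)"', fields[8])
        if m is None:
            continue
        exons[m.group(1)].append((int(fields[3]), int(fields[4])))
    return {gene: union_exon_length(iv) for gene, iv in exons.items()}


# Hypothetical toy annotation: gene G1 with two transcripts sharing exons.
rows = [("T1", 1, 100), ("T1", 201, 300), ("T2", 50, 150), ("T2", 201, 300)]
gtf = [
    "\t".join(["chr1", "toy", "exon", str(s), str(e), ".", "+", ".",
               f'gene_id "G1"; transcript_id "{t}";'])
    for t, s, e in rows
]
print(gene_exon_lengths(gtf))  # prints {'G1': 250}
```

Naively adding the four exon rows would give 401 bp instead of 250 bp, which would shrink that gene's TPM.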
I remember sometimes seeing transcript length used instead (here), but only when reads are aligned to the transcriptome. Is that true?
For example, Salmon and kallisto give TPM values using pseudo-alignment methods.
I don't know if it's correct to compute TPM from a genome alignment. The "Transcripts Per Million" unit makes more sense when you (pseudo-)align to the transcriptome, doesn't it?
In other words, TPM values from pseudo-alignment (kallisto, Salmon) can't be compared with those computed from a genome alignment. We need to know how someone produced their TPM values before comparing them.
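A small illustration of why the length definition matters: with identical counts, TPM computed from plain lengths differs from TPM computed from effective lengths (roughly what kallisto and Salmon use). The effective-length formula below (length minus mean fragment length plus 1) is a simplification I'm assuming; the tools actually use the whole fragment length distribution. The `tpm` helper folds the quoted recipe into one step, since the kilobase factor cancels out. All names and numbers are hypothetical.

```python
def tpm(counts, lengths):
    """TPM: count/length rates rescaled so that they sum to 1e6."""
    # Reads per base; a 1/1000 "per kilobase" factor would cancel in the ratio below.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def effective_length(length, mean_frag_len):
    """Simplified effective length: positions where a fragment of mean size can start.

    Assumption: kallisto/Salmon use the full fragment length distribution, not just the mean.
    """
    # Clamp to 1 so very short transcripts never get a zero or negative length (arbitrary choice).
    return max(length - mean_frag_len + 1, 1)


counts = [100, 100]      # hypothetical counts for a short and a long transcript
lengths = [300, 3000]    # hypothetical transcript lengths in bp
mean_frag = 200          # hypothetical mean fragment length in bp

tpm_plain = tpm(counts, lengths)
tpm_eff = tpm(counts, [effective_length(l, mean_frag) for l in lengths])
print([round(x) for x in tpm_plain])  # prints [909091, 90909]
print([round(x) for x in tpm_eff])    # prints [965196, 34804]
```

Same reads, noticeably different TPM for the short transcript, so values produced with different length definitions are not directly comparable.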
UPDATE: The transcriptome approach looks nicer. OK, I'll use Salmon and not try to recalculate TPM myself (by the way, the -g [ --geneMap ] argument will do the work for me).
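For reference, summing transcript TPMs per gene is one way to get gene-level TPM from a Salmon `quant.sf` file (columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). This is a sketch of the idea, assuming a transcript-to-gene dictionary; it is not necessarily exactly what `--geneMap` does internally (for example, how it reports gene lengths), and the function name and toy file are hypothetical.

```python
import csv
import io
from collections import defaultdict


def gene_tpm_from_quant(quant_sf_text, tx2gene):
    """Sum transcript TPM and NumReads per gene from quant.sf-formatted text."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    gene_tpm = defaultdict(float)
    gene_reads = defaultdict(float)
    for row in reader:
        # Transcripts missing from the map are kept under their own name.
        gene = tx2gene.get(row["Name"], row["Name"])
        gene_tpm[gene] += float(row["TPM"])
        gene_reads[gene] += float(row["NumReads"])
    return dict(gene_tpm), dict(gene_reads)


# Hypothetical quant.sf content (tab-separated) and transcript -> gene map.
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "T1\t1500\t1300.000\t400000.0\t520.0\n"
    "T2\t1200\t1000.000\t100000.0\t100.0\n"
    "T3\t800\t600.000\t500000.0\t300.0\n"
)
tx2gene = {"T1": "G1", "T2": "G1", "T3": "G2"}

gene_tpm, gene_reads = gene_tpm_from_quant(quant_sf, tx2gene)
print(gene_tpm)  # prints {'G1': 500000.0, 'G2': 500000.0}
```

Summing works because TPM is a proportion within one sample: the gene-level values still add up to 1,000,000.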
UPDATE - TPM normalization quest: how do I do that properly without using Sleuth?! :/