I have some paired-end RNA-seq data and am trying to use kallisto to get counts per gene. In the resulting abundance.tsv file, many genes, including housekeeping genes, have a TPM of 0, which is not normal. To test whether the data itself has a problem, I aligned the reads to the human genome with the STAR aligner and then counted with HTSeq. There the housekeeping genes have high counts. So my conclusion is that the data is fine and the problem may come from the kallisto command or from the reference file (the human transcriptome) that I used.
Here are the kallisto commands that I used:
kallisto index -i Homo_sapiens.GRCh38.cdna.all.idx Homo_sapiens.GRCh38.cdna.all.fasta.gz
kallisto quant -i Homo_sapiens.GRCh38.cdna.all.idx -o output -b 100 reads_1.fastq.gz reads_2.fastq.gz
I got the cDNA reference from Ensembl (Homo_sapiens.GRCh38.cdna.all.fasta) and used it to build the index file.
Do you know what the problem could be? Here are a few lines for problematic genes:
target_id length eff_length est_counts tpm
ENST00000415118.1 8 9 0 0
ENST00000448914.1 13 1 0 0
ENST00000434970.2 9 10 0 0
ENST00000631435.1 12 13 0 0
ENST00000632684.1 12 13 0 0
ENST00000710614.1 16 4 0 0
ENST00000605284.1 17 5 0 0
ENST00000604642.1 23 11 0 0
ENST00000603077.1 31 7.33333 0 0
ENST00000229239.10 1285 1110.69 0 0
ENST00000604102.1 31 7.33333 0 0
ENST00000603693.1 19 7 0 0
ENST00000604950.1 31 7.33333 0 0
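
Since abundance.tsv has one row per transcript (ENST IDs) and what I want is a value per gene, here is a minimal sketch of how the rows could be summed per gene before comparing with the HTSeq counts. It assumes a two-column, tab-separated transcript-to-gene map; the file name `t2g.tsv` and the function name `sum_per_gene` are hypothetical and only for illustration, and kallisto does not produce such a map itself.

```shell
# Hypothetical helper: sum est_counts and tpm per gene.
#   $1: tab-separated map, transcript ID in column 1, gene ID in column 2
#       (assumed to exist; not produced by kallisto)
#   $2: kallisto abundance.tsv
#       (columns: target_id, length, eff_length, est_counts, tpm)
sum_per_gene() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR    { gene[$1] = $2; next }   # first file: load the map
        FNR == 1     { next }                  # second file: skip the header
        ($1 in gene) {                         # add this transcript to its gene
            counts[gene[$1]] += $4
            tpm[gene[$1]]    += $5
        }
        END { for (g in counts) print g, counts[g], tpm[g] }
    ' "$1" "$2" | sort
}

# Example call (paths are placeholders):
# sum_per_gene t2g.tsv output/abundance.tsv > gene_level.tsv
```

Transcripts missing from the map are skipped silently, so if the IDs in the map lack the version suffix (for example `ENST00000229239` instead of `ENST00000229239.10`) nothing would match and the output would be empty.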