Hello,
I am trying to analyse some RNA-seq data and would like to get per-gene read counts. I am using sra-tools to download the data and STAR to align it, with hg19 as the reference. Unfortunately, about 90% of the values in my output file are zeros.
Here is my procedure:
1) Getting the data
prefetch SRR1797218
2) Converting to FASTQ (I tried with and without the trimming option --clip)
fastq-dump --clip SRR1797218.sra
3) Building the genome reference
STAR --runMode genomeGenerate \
     --genomeDir mypath/referenceIndex/ \
     --genomeFastaFiles mypath/referenceIndex/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
     --sjdbGTFfile mypath/referenceIndex/Homo_sapiens.GRCh37.75.gtf \
     --genomeChrBinNbits 10 \
     --genomeSAsparseD 2 \
     --outFileNamePrefix mypath/referenceIndex/genome_
4) Aligning and counting
STAR --runMode alignReads \
     --twopassMode None \
     --genomeDir mypath/referenceIndex/ \
     --readFilesIn mypath/inputFiles/SRR1797218/SRR1797218.fastq \
     --outFileNamePrefix mypath/star_out/test \
     --quantMode GeneCounts \
     --outSAMtype BAM Unsorted \
     --outSAMunmapped Within
Finally, the output file
testReadsPerGene.out.tab
is filled mainly with zeros:
N_unmapped       46568980  46568980  46568980
N_multimapping   47171     47171     47171
N_noFeature      6857      7249      7268
N_ambiguous      9700      168       31
ENSG00000223972  0         0         0
ENSG00000227232  0         0         0
ENSG00000243485  0         0         0
ENSG00000237613  0         0         0
ENSG00000268020  0         0         0
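In case it helps to put numbers on this, here is a minimal sketch of how the table could be summarised. The helper name summarize_counts is made up for illustration (it is not part of STAR), and it assumes that column 2 (the unstranded counts) is the one of interest and that the four N_* rows come first, as in the excerpt above.

```shell
# Hypothetical helper: summarise a STAR ReadsPerGene.out.tab file.
# Assumption: column 2 holds the unstranded counts we care about.
summarize_counts() {
    awk '
        # Keep the N_* summary rows aside; they are not genes.
        /^N_/ { summary[$1] = $2; next }
        # Every other row is one gene: count it and add up its reads.
        { genes++; assigned += $2; if ($2 == 0) zero++ }
        END {
            printf "genes\t%d\n", genes
            printf "genes_with_zero\t%d\n", zero
            printf "reads_assigned\t%d\n", assigned
            printf "reads_unmapped\t%d\n", summary["N_unmapped"]
        }
    ' "$1"
}
```

It would be called as summarize_counts mypath/star_out/testReadsPerGene.out.tab. Comparing reads_unmapped with reads_assigned shows whether the zeros come from reads that never aligned or from aligned reads that were not assigned to any gene.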
I am a bit lost, as I am completely new to this task and these tools. Do you see what I am missing here?
Best
B