I am currently trying to analyze RNA-seq data with DEXSeq and am running into a problem at the read-counting step. The count file it produces has very few exon counting bins with a count greater than 0 (around 20,000). The output looks like this:
ENSG00000000003:001	0
ENSG00000000003:002	0
ENSG00000000003:003	0
ENSG00000000003:004	0
ENSG00000000003:005	0
ENSG00000000003:006	0
ENSG00000000003:007	0
ENSG00000000003:008	0
ENSG00000000003:009	0
ENSG00000000003:010	0
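To put a number on "very few bins with counts", a small script like the one below could summarize a count file such as DEXSeq_C24_1.txt. It is a sketch under these assumptions: each line is a bin ID followed by a count, IDs may or may not be wrapped in double quotes, and lines whose ID starts with an underscore are the summary counters that dexseq_count.py appends (such as `_empty` and `_ambiguous`) rather than exon bins. The function name `summarize_counts` is hypothetical.

```python
"""Summarize a dexseq_count.py output file (illustrative sketch)."""
import sys
from typing import Dict, Iterable, Tuple


def summarize_counts(lines: Iterable[str]) -> Tuple[int, int, Dict[str, int]]:
    """Return (number of bins, bins with count > 0, summary counters)."""
    n_bins = 0
    n_nonzero = 0
    counters: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split off the last whitespace-separated field as the count.
        feature, count_str = line.rsplit(None, 1)
        feature = feature.strip('"')  # tolerate quoted IDs
        count = int(count_str)
        if feature.startswith("_"):
            counters[feature] = count  # summary counter, not an exon bin
            continue
        n_bins += 1
        if count > 0:
            n_nonzero += 1
    return n_bins, n_nonzero, counters


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_counts.py DEXSeq_C24_1.txt
    with open(sys.argv[1]) as handle:
        bins, nonzero, extra = summarize_counts(handle)
    print(f"{nonzero} of {bins} bins have count > 0")
    for name, value in sorted(extra.items()):
        print(f"{name}\t{value}")
```

Comparing the summary counters with the number of bins that have a nonzero count can show whether reads are being discarded rather than simply absent.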
The mapping was performed with STAR, and the resulting BAM files were sorted by coordinate. I used the following commands for genome indexing and mapping:
STAR --runThreadN 12 --runMode genomeGenerate --sjdbGTFfile Homo_sapiens.GRCh38.98.gtf --genomeDir /home/erpl/star/indexing --genomeFastaFiles Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa
STAR --runThreadN 20 --genomeDir /home/erpl/RNA-seq_Alignment_tools/star/indexing --sjdbGTFfile /home/erpl/RNA-seq_Alignment_tools/star/indexing/Homo_sapiens.GRCh38.98.gtf --readFilesIn C24_1_1.fq C24_1_2.fq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /home/erpl/RNA_seq_Novogene/RNA_Sequencing_Novogene_Results/output_17.10.19/C24_1
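One way to rule out a mapping problem before looking at counting is to check STAR's mapping summary. STAR writes it to a file named with the `--outFileNamePrefix` value followed by `Log.final.out`, so with the prefix above it would be `.../output_17.10.19/C24_1Log.final.out`. The sketch below assumes the usual layout of that file, with a label and a value separated by `|` and section headers that have no `|`; the names `parse_star_log` and `percent` are hypothetical.

```python
"""Read key/value pairs from a STAR Log.final.out file (illustrative sketch)."""
import sys
from typing import Dict, Iterable


def parse_star_log(lines: Iterable[str]) -> Dict[str, str]:
    """Map each label (e.g. 'Uniquely mapped reads %') to its raw value."""
    stats: Dict[str, str] = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no separator
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats: Dict[str, str], key: str) -> float:
    """Convert a value such as '88.50%' to the float 88.5."""
    return float(stats[key].rstrip("%"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_star_log.py C24_1Log.final.out
    with open(sys.argv[1]) as handle:
        log = parse_star_log(handle)
    print("Input reads:", log.get("Number of input reads"))
    print("Uniquely mapped %:", percent(log, "Uniquely mapped reads %"))
```

A reasonable unique-mapping rate would point away from the raw data and toward the annotation or counting step.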
For DEXSeq, we first have to prepare the annotation file and then count reads, using the two Python scripts provided with the R package. This is the code I used to reach this step:
python /home/erpl/R/x86_64-pc-linux-gnu-library/3.6/DEXSeq/python_scripts/dexseq_prepare_annotation.py Homo_sapiens.GRCh38.98.gtf Homo_sapiens.GRCh38.98.gff
python /home/erpl/R/x86_64-pc-linux-gnu-library/3.6/DEXSeq/python_scripts/dexseq_count.py Homo_sapiens.GRCh38.98.gff -p yes -f bam C24_1Aligned.sortedByCoord.out.bam DEXSeq_C24_1.txt
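Because the reads are counted as paired-end (`-p yes`), the order of records in the BAM may matter, so it could be worth confirming the sort order the BAM header actually records. The SAM specification stores it in the `SO` tag of the `@HD` header line. The sketch below assumes the header has first been dumped to a text file, for example with `samtools view -H C24_1Aligned.sortedByCoord.out.bam > header.txt`; the function name `sort_order` is hypothetical.

```python
"""Report the SO (sort order) tag from a SAM header dump (illustrative sketch)."""
import sys
from typing import Iterable, Optional


def sort_order(header_lines: Iterable[str]) -> Optional[str]:
    """Return the SO value of the @HD line, or None if it is absent."""
    for line in header_lines:
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.rstrip("\n").split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
            return None  # @HD line present but without an SO tag
    return None  # no @HD line at all


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sort_order.py header.txt
    with open(sys.argv[1]) as handle:
        print(sort_order(handle))
```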
So is there a problem with the raw data itself, or am I missing something here?