Entering edit mode
6.7 years ago
manish.nu
•
0
Hi all!
I've been teaching myself to do RNA-Seq.
I prepared my libraries using KAPA mRNA HyperPrep Kit and had them sequenced single read 50bp
To generate my counts' file for DESeq2, I
Downloaded the reference/GTF files from Genecode:
Gencode/GRCh38.p10.genome.fa
Homo_sapiens.GRCh38.91.gtf
Indexed them as:
STAR --runMode genomeGenerate -- genomeDir $genomeDir --genomeFastaFiles /hg/GRCh38.p10.genome.fa --sjdbGTFfile /hg/Homo_sapiens.GRCh38.91.gtf --sjdb Overhang 49 --runThreadN 20
Aligned the fastq files as:
STAR --runThreadN 20 --genomeDir $genome_dir --readFilesCommand zcat --readFilesIn /projects/FastQ/post/SK1-1_S1.fastq.gz --outSAMtype BAM Unsorted --outFileNamePrefix /projects/aligned/post/P1_pre_
created HTSeqCount files using:
htseq-count -f bam -q -m union -s yes -t exon -i gene_id /projects/aligned/P1_post_Aligned.out.bam /Homo_sapiens.GRCh38.91.gtf > /projects/counts/htseq/P1_pre_hts_counts
My counts file look like this:
ENSG00000284741 0
ENSG00000284742 0
ENSG00000284743 0
ENSG00000284744 0
ENSG00000284745 0
ENSG00000284746 0
ENSG00000284747 0
ENSG00000284748 0
__no_feature 22375495
__ambiguous 0
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 18649806
All of the genes are reflected = 0. Can someone please tell me what am I doing wrong?
Does your GTF contain exon lines, actually?
this is what my gtf file shows:
It looks like it has 'exon' but I wouldn't trust myself right now.
Hi, Two suggestions - 1) Plz check the
Log.final.out
created by STAR at the end of the run. Check the mapping % there.2) If the mapping% is ok, then as a check you could re-run STAR with
option. You wouldn't need to run a count generator separately.
Very stupidly deleted the log files but as far as I remember seeing them in the beginning, all files mappped >30% except one which was in 20-30%
I'm very tempted to run the alignment again with --quantMode GeneCounts
Are the chromosome identifiers in your fasta genome the same as in the gtf?
The standard Genecode fasta has a header like "chr1 1". As far as I remember, STAR uses the full header to name its chromosomes. Can you report the result of
samtools view -H /projects/aligned/P1_post_Aligned.out.bam | head
and compare it to the first column of/Homo_sapiens.GRCh38.91.gtf
?Hi WouterDeCoster, to me, it looks like they're not the same (see below please). I went by the assumption that if you download your genome file and GTF file from the same source (Gencode in my case), they will be 'compatible'. This makes me doubt my whole alignment step.
Would you recommend that I start from scratch (Indexing>Alignment) and could you please suggest a source for a GTF file? Can I still salvage this by switching the GTF file?
If you get the correct GTF file from Gencode (http://www.gencodegenes.org/releases/current.html ) then the counting will work.
It's working! Thanks so much!
Just using the right gtf for counting would be sufficient, no need to realign, unless you want to use the gtf in alignment as well for defining the known exon structure.
GTF File:
Does this information help? To me it looks like they're not the same.