Quantification of RNA-Seq bam files (STAR + HTSeq count)
2
1
Entering edit mode
8.3 years ago
ChIP ▴ 600

Hi All,

I am working with pig genome and I had received some FASTQ transcriptome files from my lab mates.

I have processed the FASTQ files using STAR Aligner

~/Software/STAR/STAR-STAR_2.5.0a/bin/Linux_x86_64/STAR --genomeDir ~/data/genomes/pig/susScr3/star/ --readFilesIn ~/Bureaublad/Mod.GC025680.151030.HiSeq2500.FCB.lane2.gcap_dev.R1.fastqgz.gz --readFilesCommand zcat --sjdbGTFfile ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf --quantMode TranscriptomeSAM GeneCounts --runThreadN 12 --outFilterMismatchNmax 8 --outFileNamePrefix starGC025680 --outFilterMultimapNmax 10

The GTF file was taken from UCSC

It looks like this

chr1    susScr3_refGene    start_codon    133903981    133903983    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133903981    133904133    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133903981    133904133    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133914113    133914267    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133914113    133914267    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133917281    133917449    0.000000    +    2    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133917281    133917449    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133919376    133919382    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133919376    133919382    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133920137    133920252    0.000000    +    0    gene_id "NM_214429"; transcript_id "NM_214429";

Then I am using HTSeq_count function like this:

python ~/Software/HTSeq/HTSeq-0.6.1/scripts/htseq-count -m intersection-nonempty -t exon -i gene_id -f bam starGC025680Aligned.toTranscriptome.out.bam ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf -o Test

It processes the file, but dishes out

NM_214311    0
NM_214312    0
NM_214313    0
NM_214314    0
NM_214315    0
NM_214316    0
NM_214317    0
NM_214318    0
NM_214319    0
NM_214320    0
NM_214321    0
NM_214322    0
NM_214323    0
NM_214324    0
NM_214325    0

NR_130444    0
NR_131172    0
NR_132429    0
__no_feature    2844434
__ambiguous    0
__too_low_aQual    0
__not_aligned    0
__alignment_not_unique    534450

I don't know, what I am doing wrong and how can I fix this.

Kindly help

Thank you

RNA-Seq HTSeq_count STAR • 7.5k views
ADD COMMENT
1
Entering edit mode
8.3 years ago
igor 13k

The problem may be that you are using --quantMode TranscriptomeSAM, which outputs alignments translated into transcript coordinates (not genomic coordinates that htseq-count expects).

Also, you are using --quantMode GeneCounts, so you don't need to run htseq-count as STAR should be generating the counts for you.

ADD COMMENT
0
Entering edit mode
8.3 years ago
Czh3 ▴ 190

Can you paste your genome reference sequence (fasta) to us. I guess the chromosome name in your .fasta file are "1 2 3 ...", but in the GTF file are "chr1 chr2 chr3 ..."

ADD COMMENT
0
Entering edit mode

Hi,

I did grep on my genome file and here are the results:

>chr1
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chrX
>chrY
>chrM

It is susScr3.fa

ADD REPLY

Login before adding your answer.

Traffic: 1462 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6