Question: Quantification of RNA-Seq bam files (STAR + HTSeq count)
1
gravatar for ChIP
3.8 years ago by
ChIP520
Netherlands
ChIP520 wrote:

Hi All,

I am working with pig genome and I had received some FASTQ transcriptome files from my lab mates.

 

I have processed the FASTQ files using STAR Aligner

~/Software/STAR/STAR-STAR_2.5.0a/bin/Linux_x86_64/STAR --genomeDir ~/data/genomes/pig/susScr3/star/ --readFilesIn ~/Bureaublad/Mod.GC025680.151030.HiSeq2500.FCB.lane2.gcap_dev.R1.fastqgz.gz --readFilesCommand zcat --sjdbGTFfile ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf --quantMode TranscriptomeSAM GeneCounts --runThreadN 12 --outFilterMismatchNmax 8 --outFileNamePrefix starGC025680 --outFilterMultimapNmax 10

The GTF file was taken from UCSC

it looks like this 

chr1    susScr3_refGene    start_codon    133903981    133903983    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133903981    133904133    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133903981    133904133    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133914113    133914267    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133914113    133914267    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133917281    133917449    0.000000    +    2    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133917281    133917449    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133919376    133919382    0.000000    +    1    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    exon    133919376    133919382    0.000000    +    .    gene_id "NM_214429"; transcript_id "NM_214429";
chr1    susScr3_refGene    CDS    133920137    133920252    0.000000    +    0    gene_id "NM_214429"; transcript_id "NM_214429";

Then I am using HTSeq_count function like this :

python ~/Software/HTSeq/HTSeq-0.6.1/scripts/htseq-count -m intersection-nonempty -t exon -i gene_id -f bam starGC025680Aligned.toTranscriptome.out.bam ~/data/genomes/pig/susScr3/star/susScr3.04012016.gtf -o Test

It processes the file , but dishes out

NM_214311    0
NM_214312    0
NM_214313    0
NM_214314    0
NM_214315    0
NM_214316    0
NM_214317    0
NM_214318    0
NM_214319    0
NM_214320    0
NM_214321    0
NM_214322    0
NM_214323    0
NM_214324    0
NM_214325    0

NR_130444    0
NR_131172    0
NR_132429    0
__no_feature    2844434
__ambiguous    0
__too_low_aQual    0
__not_aligned    0
__alignment_not_unique    534450

I don't know, what I am doing wrong and how can I fix this.

 

Kindly help

Thank you

 

rna-seq star htseq_count • 4.1k views
ADD COMMENTlink modified 3.8 years ago by igor8.6k • written 3.8 years ago by ChIP520
1
gravatar for igor
3.8 years ago by
igor8.6k
United States
igor8.6k wrote:

The problem may be that you are using --quantMode TranscriptomeSAM, which outputs alignments translated into transcript coordinates (not genomic coordinates that htseq-count expects).

Also, you are using --quantMode GeneCounts, so you don't need to run htseq-count as STAR should be generating the counts for you.

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by igor8.6k
0
gravatar for Czh3
3.8 years ago by
Czh3190
China
Czh3190 wrote:

Can you paste your genome reference sequence (fasta) to us. I guess the chromosome name in your .fasta file are "1 2 3 ...", but in the GTF file are "chr1 chr2 chr3 ..."

ADD COMMENTlink written 3.8 years ago by Czh3190

Hi,

I did grep on my genome file and here are the results:

 

>chr1
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chrX
>chrY
>chrM

It is susScr3.fa

ADD REPLYlink written 3.8 years ago by ChIP520
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 769 users visited in the last hour