Question: Counting No. of Reads per gene from RNA-Seq with STAR
0
gravatar for skevalkumar
17 months ago by
skevalkumar0 wrote:

I am trying to get no. of reads per gene with STAR --quantMode.

I have followed the

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir /media/rm313/SDC/genomeDir \
  --genomeFastaFiles Brassica_napus_v4.1.chromosomes.fa --sjdbGTFfile Brassica_napus.annotation_v5.gtf \
  --genomeChrBinNbits 18 --limitGenomeGenerateRAM 910000000000

step to generate Genome.

To get no. of reads, i used

STAR --runThreadN 16  --runMode alignReads --outFilterMultimapNmax 1 \
  --outFilterMatchNmin 35 --outSAMtype SAM --quantMode GeneCounts \
  --twopassMode Basic --outFileNamePrefix "Wes1" --genomeDir /media/rm313/SDC/genomeDir \
  --sjdbGTFfeatureExon /media/rm313/SDC/Brassica_napus.annotation_v5.cds.fa \
  --readFilesIn /media/rm313/SDC/W1F.fastq /media/rm313/SDC/W1R.fastq

My questions

1) ReadsperGene.out.tab does not contain No. of reads per gene. it shows

N_unmapped    453074  453074  453074
N_multimapping    258436  258436  258436
N_noFeature   738252  812120  865586
N_ambiguous   0   0   0
  

2) I need reads per gene. however, it shows on chromosome name in Log.out and SJ.out.tab file.

Can you please suggest, what's wrong with these commands?

Thank you

rna-seq star • 1.2k views
ADD COMMENTlink modified 17 months ago by h.mon31k • written 17 months ago by skevalkumar0
1

Your original mapping command had a space between the dashes and the option name (-- twopassMode Basic), but I believe your actual command doesn't have this space, otherwise STAR would error out.

ADD REPLYlink written 17 months ago by h.mon31k
1

after you run STAR and you do an ls -t command, which files do you see being generated and what's their content?

ADD REPLYlink written 17 months ago by Friederike6.2k

In the second STAR step, it generated Aligned.out.sam, Aligned.out.sam, Log.final.out, Log.progress.out, Log.out, ReadsPerGene.out.tab (already mentioned), SJ.out.tab

ADD REPLYlink written 17 months ago by skevalkumar0

it shows on chromosome name in Log.out and SJ.out.tab file

What did you mean by that?

ADD REPLYlink written 17 months ago by Friederike6.2k
2
gravatar for h.mon
17 months ago by
h.mon31k
Brazil
h.mon31k wrote:

Yiu are using incorrectly the --sjdbGTFfeatureExon option, you should not pass a fasta file to it, rather, you should name the feature type you want to quantify (e.g. exon, CDS, and so on).

--sjdbGTFfeatureExon
    default: exon
    string: feature type in GTF file to be used as exons for building transcripts
ADD COMMENTlink written 17 months ago by h.mon31k

I have followed your suggestions, still it gave the same output (--twopassMode Basic and --sjdbGTFfeatureExon CDS)

ADD REPLYlink written 17 months ago by skevalkumar0
1

Can you show the output of head Brassica_napus.annotation_v5.gtf?

ADD REPLYlink written 17 months ago by h.mon31k
chrA01  GazeA2  exon    831 1437    7.80    +   .   transcript_id "BnaA01g00010D";
chrA01  GazeA2  CDS 1070    1345    .   +   0   transcript_id "BnaA01g00010D";
chrA01  GazeA2  exon    1487    2124    2.15    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2256    2436    3.19    -   .   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 1645    2124    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  CDS 2256    2282    .   -   0   transcript_id "BnaA01g00020D";
chrA01  GazeA2  exon    2665    3177    4.45    -   .   transcript_id "BnaA01g00030D";
ADD REPLYlink modified 17 months ago by h.mon31k • written 17 months ago by skevalkumar0

Any suggestion/comment on GTF file? or modification in input file?

ADD REPLYlink written 17 months ago by skevalkumar0
1

It seems your gtf is missing "gene_id" tags, which is used by STAR. Where did you obtain this annotation from? Did you convert a gff annotation to gtf?

Probably you can set some STAR parameter to correctly handle this gtf, but you have to do some reading. Some posts that might be helpful:

read counts with gff file

Parent child relationship to use with GFF data. Are alignment to transcripts on exon or transcripts?

NCBI's GFF file: how to get gene counts with STAR

Read counts of STAR with gff file

ADD REPLYlink written 17 months ago by h.mon31k

I have downloaded it from http://brassicadb.org/brad/datasets/pub/BrassicaceaeGenome/Brassica_napus/ Yes, I have converted GFF3 to GTF (using gffread)

ADD REPLYlink written 17 months ago by skevalkumar0
1

What happens if you use --sjdbGTFfile Brassica_napus.annotation_v5.gtf in your command line?

ADD REPLYlink modified 17 months ago • written 17 months ago by swbarnes28.6k

It shows FATAL Error, could not open file pGe.sjdbGTFfile=Brassica_napus.annotation_v5.gtf

ADD REPLYlink written 17 months ago by skevalkumar0

Do you think you need that file? You certainly don't need the file of cDNA transcripts that you included for some reason.

ADD REPLYlink written 17 months ago by swbarnes28.6k

I have changed it with --sjdbGTFfeatureExon exon, but there is no change in the output.

ADD REPLYlink written 17 months ago by skevalkumar0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1772 users visited in the last hour