5.1 years ago
dq18
I use STAR + RSEM to align and quantify my raw RNA-seq data, but I got this error after running rsem-calculate-expression:
/gpfs/share/home/1701110265/soft/RSEM-1.3.1/rsem-calculate-expression --no-bam-output \
> --alignments -p 5 \
> -q /gpfs/share/home/1701110265/morus/04rsem/input_for_rsem.bam \
> /gpfs/share/home/1701110265/morus/04rsem/rsem \
> /gpfs/share/home/1701110265/morus/04rsem/test1
rsem-parse-alignments /gpfs/share/home/1701110265/morus/04rsem/rsem /gpfs/share/home/1701110265/morus/04rsem/test1.temp/test1 /gpfs/share/home/1701110265/morus/04rsem/test1.stat/test1 /gpfs/share/home/1701110265/morus/04rsem/input_for_rsem.bam 1 -tag XM -q
The SAM/BAM file declares more reference sequences (31302) than RSEM knows (29334)!
"rsem-parse-alignments /gpfs/share/home/1701110265/morus/04rsem/rsem /gpfs/share/home/1701110265/morus/04rsem/test1.temp/test1 /gpfs/share/home/1701110265/morus/04rsem/test1.stat/test1 /gpfs/share/home/1701110265/morus/04rsem/input_for_rsem.bam 1 -tag XM -q" failed! Plase check if you provide correct parameters/options for the pipeline!
I cannot find what is wrong with my commands. Here are the details: I use STAR-2.7.0e and RSEM-1.3.1. The genome has 30301 scaffolds and the annotation is a GFF3 file; I downloaded both from NCBI. The raw data were sequenced on an Illumina platform and are single-end (SE) 75 bp reads.
The STAR commands are:
/gpfs/share/home/1701110265/soft/STAR-2.7.0e/bin/Linux_x86_64/STAR --runThreadN 6 --runMode genomeGenerate \
--genomeDir /gpfs/share/home/1701110265/morus/00ref/star.genome.test02/ \
--genomeFastaFiles /gpfs/share/home/1701110265/morus/00ref/Morus.notabilis.genome.fa \
--genomeChrBinNbits 15 \
--sjdbGTFtagExonParentTranscript Parent \
--sjdbGTFfile /gpfs/share/home/1701110265/morus/00ref/Morus.notabilis.genome.gff \
--sjdbOverhang 74
/gpfs/share/home/1701110265/soft/STAR-2.7.0e/bin/Linux_x86_64/STAR --runThreadN 5 --genomeDir /gpfs/share/home/1701110265/morus/00ref/star.genome.index \
--readFilesIn /gpfs/share/home/1701110265/morus/02clean_data/test/1h_AGTCAA_L002_R1_001.clean.fastq \
--outFileNamePrefix /gpfs/share/home/1701110265/morus/03align_out/test4/1h_ \
--outSAMtype BAM SortedByCoordinate \
--outBAMsortingThreadN 5 \
--quantMode TranscriptomeSAM GeneCounts
The RSEM commands are:
/gpfs/share/home/1701110265/soft/RSEM-1.3.1/rsem-prepare-reference --gff3 /gpfs/share/home/1701110265/morus/00ref/Morus.notabilis.genome.gff \
/gpfs/share/home/1701110265/morus/00ref/Morus.notabilis.genome.fa \
/gpfs/share/home/1701110265/morus/04rsem/rsem
/gpfs/share/home/1701110265/soft/RSEM-1.3.1/convert-sam-for-rsem /gpfs/share/home/1701110265/morus/03align_out/test3/1h_Aligned.toTranscriptome.out.bam /gpfs/share/home/1701110265/morus/04rsem/input_for_rsem
/gpfs/share/home/1701110265/soft/RSEM-1.3.1/rsem-calculate-expression --no-bam-output \
--alignments -p 5 \
-q /gpfs/share/home/1701110265/morus/04rsem/input_for_rsem.bam \
/gpfs/share/home/1701110265/morus/04rsem/rsem \
/gpfs/share/home/1701110265/morus/04rsem/test1
Please help me.
It seems RSEM filters the GFF3 file differently than STAR does before making the transcriptome FASTA file, so the two end up with different sets of transcripts. You might make a transcriptome FASTA file yourself and use that with STAR/RSEM, since then you know it will match.
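To see which sequences account for the gap between 31302 and 29334, one option is to compare the reference names declared in the BAM header with the transcript names in the RSEM reference. The following is only a sketch under a few assumptions: the BAM header has first been dumped to a text file (for example with `samtools view -H input_for_rsem.bam > header.sam`), the RSEM reference FASTA is the `rsem.transcripts.fa` file that rsem-prepare-reference normally writes next to the reference prefix, and the function names `sq_names`, `fasta_names` and `only_in_header` are made up for this example.

```shell
# Sketch: list reference names that a SAM/BAM header declares but a FASTA lacks.
# Assumption: the header was dumped to plain text beforehand, e.g.
#   samtools view -H input_for_rsem.bam > header.sam

# Print the sorted, unique SN: names from the @SQ lines of a SAM header file.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

# Print the sorted, unique sequence names (first word after ">") of a FASTA file.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print names that appear in the SAM header ($1) but not in the FASTA file ($2).
only_in_header() {
    tmp=$(mktemp)
    fasta_names "$2" > "$tmp"
    sq_names "$1" | comm -23 - "$tmp"
    rm -f "$tmp"
}
```

With the paths from the question this might be run as `only_in_header header.sam /gpfs/share/home/1701110265/morus/04rsem/rsem.transcripts.fa | head`. If the extra names turn out to be a particular feature type in the GFF3, that would point at the filtering difference described above. It may also be worth checking that the BAM being quantified really comes from the index built with this annotation: as posted, the commands build `star.genome.test02` but align against `star.genome.index`, and they convert a BAM from `test3` while the alignment step writes to `test4`.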