rsem-calculate-expression error: SAM/BAM file declares more reference sequences than RSEM knows
2.4 years ago
sbhattach2 ▴ 10

Hi,

I am getting an error from rsem-calculate-expression when processing FASTQ files with RSEM's STAR alignment option. The alignment completes successfully, but when rsem-parse-alignments starts it fails, reporting that the SAM/BAM file declares more reference sequences than RSEM knows. The command and its output are below:

Input command

rsem-calculate-expression --star --star-path /home/sbhattach2/STAR-2.6.0a/bin/ \
    --star-gzipped-read-file -p 8 --paired-end --strandedness reverse \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz \
    /data/Suro/Fasta/Rsem_Human_Ref1 \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood

Output

/home/sbhattach2/STAR-2.6.0a/bin//STAR --genomeDir /data/Suro/Fasta --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 8 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD @HD VN:1.4 SO:unsorted --outFileNamePrefix /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood --readFilesCommand zcat --readFilesIn /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz
Nov 14 11:53:41 ..... started STAR run
Nov 14 11:53:41 ..... loading genome
Nov 14 11:55:02 ..... started mapping
Nov 14 12:10:05 ..... finished successfully

rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM
The SAM/BAM file declares more reference sequences (203798) than RSEM knows (196483)!
"rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

I previously used STAR 2.5.3a with the same command and obtained TPM counts without problems. However, when I re-ran the same scripts with STAR 2.5.3a and RSEM 1.3.0, I hit this error. Even after reinstalling STAR and RSEM and rebuilding the reference, the error persists. The FASTA file is Homo_sapiens.GRCh37.dna.primary_assembly.fa and the GTF is gencode.v19.annotation_mod.gtf.
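One way to narrow down the mismatch reported above (203798 vs 196483) might be to compare the reference names in the BAM header with the transcript names in the RSEM reference. The sketch below is only an illustration: the function names are hypothetical, it assumes samtools is on the PATH, and it assumes the RSEM reference prefix has a matching .transcripts.fa file next to it.

```python
import subprocess


def sq_names(header_lines):
    """Return the SN: value of every @SQ line in a SAM header."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fasta_names(fasta_lines):
    """Return the first whitespace-delimited token of each FASTA header."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def compare(bam_names, rsem_names):
    """Summarise totals and the names found on only one side."""
    bam_set, rsem_set = set(bam_names), set(rsem_names)
    return {
        "bam_total": len(bam_names),
        "rsem_total": len(rsem_names),
        "only_in_bam": sorted(bam_set - rsem_set),
        "only_in_rsem": sorted(rsem_set - bam_set),
    }


def compare_bam_to_rsem(bam_path, transcripts_fa):
    """Hypothetical helper: read the BAM header via samtools and compare."""
    header = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    with open(transcripts_fa) as handle:
        return compare(sq_names(header), fasta_names(handle))
```

Calling compare_bam_to_rsem with the .bam path passed to rsem-parse-alignments and /data/Suro/Fasta/Rsem_Human_Ref1.transcripts.fa would list the names that exist only in the BAM header. If that list is long, the STAR index in /data/Suro/Fasta may have been built from a different annotation than the RSEM reference, though that is a guess rather than a confirmed cause.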

Please let me know if you need any other information.

Thanks in advance for the help.

Surajit

RNA-Seq • 2.8k views

I have the same problem. Have you solved it? Thanks in advance for your help.


I'm going through the same problem. Did any of you solve it?

2.3 years ago
swbarnes2 9.7k

Are you sure that's the right GTF for that assembly? Maybe there is an incompatibility that STAR ignores but RSEM won't.
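A quick way to test for one common kind of incompatibility is to check whether every sequence name in the GTF also appears in the FASTA. GENCODE GTFs normally name chromosomes chr1 ... chrM, while Ensembl FASTA files use 1 ... MT, so a GENCODE annotation paired with an Ensembl assembly needs its names converted consistently. The sketch below uses hypothetical function names and works on any iterable of lines; it is not known to be the cause of the error in this thread.

```python
def gtf_seqnames(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect the first token of every FASTA header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return GTF seqnames that have no matching FASTA record, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines))
```

For the files named in the question, this could be run as missing_from_fasta(open("gencode.v19.annotation_mod.gtf"), open("Homo_sapiens.GRCh37.dna.primary_assembly.fa")). Any names it returns are annotated sequences the assembly does not contain, and tools may differ in how they treat transcripts on such sequences.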
