Question: rsem-calculate-expression error: SAM/BAM file declares more reference sequences than RSEM knows
sbhattach2 wrote (4 months ago):

Hi,

I am getting an error from rsem-calculate-expression when processing FASTQ files with RSEM's STAR alignment option (--star). The alignment completes successfully, but as soon as rsem-parse-alignments starts, it fails with an error saying that the SAM/BAM file declares more reference sequences than RSEM knows. The command and its output are below:

Input command

rsem-calculate-expression --star --star-path /home/sbhattach2/STAR-2.6.0a/bin/ \
    --star-gzipped-read-file -p 8 --paired-end --strandedness reverse \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz \
    /data/Suro/Fasta/Rsem_Human_Ref1 \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood

Output

/home/sbhattach2/STAR-2.6.0a/bin//STAR --genomeDir /data/Suro/Fasta --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 8 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD @HD VN:1.4 SO:unsorted --outFileNamePrefix /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood --readFilesCommand zcat --readFilesIn /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz
Nov 14 11:53:41 ..... started STAR run
Nov 14 11:53:41 ..... loading genome
Nov 14 11:55:02 ..... started mapping
Nov 14 12:10:05 ..... finished successfully

rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM
The SAM/BAM file declares more reference sequences (203798) than RSEM knows (196483)!
"rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
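One way to narrow this down is to check the two numbers in the error message directly: the @SQ lines in the BAM that rsem-parse-alignments reads (UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam in the log, i.e. the transcriptome alignments written by STAR) and the number of transcripts in the RSEM reference. This is a minimal sketch, assuming samtools is on PATH, that rsem-prepare-reference wrote Rsem_Human_Ref1.transcripts.fa next to the other reference files, and that the temporary BAM still exists (rsem-calculate-expression's --keep-intermediate-files option keeps it). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of reference sequences declared in the STAR
# transcriptome BAM header with the number of transcripts in the RSEM
# reference. Assumes samtools is on PATH and that rsem-prepare-reference
# wrote <reference_name>.transcripts.fa (an assumption about file layout).

# Count @SQ header lines in SAM header text read from stdin.
count_sq_lines() {
  grep -c '^@SQ'
}

# Count FASTA records (lines starting with '>') in a file.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical helper: print both counts; return 0 only if they match.
compare_reference_counts() {
  local bam="$1" ref="$2"
  local in_bam in_rsem
  in_bam=$(samtools view -H "$bam" | count_sq_lines)
  in_rsem=$(count_fasta_records "${ref}.transcripts.fa")
  echo "BAM @SQ lines: ${in_bam}; RSEM transcripts: ${in_rsem}"
  [ "$in_bam" -eq "$in_rsem" ]
}
```

For example, with the paths from the log: compare_reference_counts /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam /data/Suro/Fasta/Rsem_Human_Ref1. Because STAR takes the transcript list for --quantMode TranscriptomeSAM from its genome index (--genomeDir /data/Suro/Fasta in the log), differing counts suggest that the STAR index and the RSEM reference were not built from the same annotation.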

I previously ran the same command with STAR 2.5.3a and obtained TPM values without any problems. However, when I re-ran the same scripts with STAR 2.5.3a and RSEM 1.3.0, I got this error. Even after reinstalling STAR and RSEM and rebuilding the reference, the error persists. The reference was built from Homo_sapiens.GRCh37.dna.primary_assembly.fa (FASTA) and gencode.v19.annotation_mod.gtf (GTF).

Please let me know if you need any other information.

Thanks in advance for your help.

Surajit

swbarnes2 (United States) wrote (3 months ago):

Are you sure that's the right GTF for that assembly? There may be an incompatibility between the two that STAR ignores but RSEM does not.
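A quick way to test the GTF/assembly question is to check that every sequence name in column 1 of the GTF appears as a FASTA header. As an example of how a mismatch can arise, Ensembl FASTA files such as Homo_sapiens.GRCh37.dna.primary_assembly.fa name chromosomes 1, 2, ..., MT, whereas GENCODE GTFs use chr1, chr2, ..., chrM; the _mod GTF in the question may already have been adjusted for this, so treat this only as a check. The function below is a hypothetical helper written in POSIX awk; it prints each GTF sequence name that has no matching FASTA header, taking FASTA names up to the first whitespace.

```shell
#!/usr/bin/env bash
# Sketch: list sequence names from column 1 of a GTF that have no matching
# header in a genome FASTA. FASTA names are taken up to the first whitespace,
# e.g. ">1 dna:chromosome ..." gives "1".

# Hypothetical helper. Usage: missing_in_fasta annotation.gtf genome.fa
missing_in_fasta() {
  awk '
    # First file (the FASTA): remember every sequence name.
    NR == FNR {
      if (/^>/) seen[substr($1, 2)] = 1
      next
    }
    # Second file (the GTF): skip comments and blank lines, then print each
    # unknown sequence name once, in order of first appearance.
    !/^#/ && NF > 0 && !($1 in seen) && !printed[$1]++ { print $1 }
  ' "$2" "$1"
}
```

For example: missing_in_fasta gencode.v19.annotation_mod.gtf Homo_sapiens.GRCh37.dna.primary_assembly.fa. Empty output means every sequence name used in the GTF has a matching FASTA header.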
