Question: RSEM with prokaryotes
0
gravatar for csmatyi1
22 months ago by
csmatyi10
csmatyi10 wrote:

I'm trying to run rsem for S. aureus strain USA300 FPR3757.

Then I saved this file as a GenBank file, which I want to transform into a gtf file for the RSEM pipeline.

Does anyone know of a program which transforms Ganbank format files from NCBI directly into a gtf file?

I was not aware of any gb -> gtf tranformation program.

However, I did find this website:

https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

which transforms a gb file to a gff file. Once I got the gff file, I used the gffread utility to transform it into a gtf file.

This gtf file looks like this:

Filename: Staphylococcus_aureus_genome_USA300_FPR3757.gtf

CP000255        GenBank exon    544     1905    .       +       .       transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255        GenBank CDS     544     1905    .       +       0       transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255        GenBank exon    2183    3316    .       +       .       transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255        GenBank CDS     2183    3316    .       +       0       transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255        GenBank exon    3697    3942    .       +       .       transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255        GenBank CDS     3697    3942    .       +       0       transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255        GenBank exon    3939    5051    .       +       .       transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";
CP000255        GenBank CDS     3939    5051    .       +       0       transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";

Pretty simple-looking, but it's due to the fact that I'm analyzing a prokaryote.

QUESTION: does RSEM work well with prokaryotic genomes?

Now, I ran rsem-prepare-reference this way:

rsem-prepare-reference --gtf Staphylococcus_aureus_genome_USA300_FPR3757.gtf --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ Staphylococcus_aureus_genome_USA300_FPR3757.fasta Staphylococcus_aureus_genome_USA300_FPR3757

The log file said that rsem-prepare-reference finished successfully.

However, when I run rsem-calculate-expression , it gives an error. I don't know why, I have been working on it 2 days now.

The command:

rsem-calculate-expression -p 8 --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ --output-genome-bam /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq  /path/to/rsem/files/saureus_usa300_fpr3757/Staphylococcus_aureus_genome_USA300_FPR3757 output

It gives this error:

nohup: ignoring input
/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 8  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD \@HD VN:1.4 SO:unsorted  --outFileNamePrefix output.temp/output  --readFilesIn /path/to/fastq/1/1_S1_L001_R1_001.trimmed.fq
Aug 29 09:51:11 ..... started STAR run
Aug 29 09:51:11 ..... loading genome
Aug 29 09:51:13 ..... started mapping
"/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 8  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD \@HD VN:1.4 SO:unsorted  --outFileNamePrefix output.temp/output  --readFilesIn /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq " failed! Plase check if you provide correct parameters/options for the pipeline!

Any help is appreciated.

annotation rsem gff genome gtf • 917 views
ADD COMMENTlink modified 22 months ago by Devon Ryan91k • written 22 months ago by csmatyi10

Run the command in the error message directly and see what error message it returns. That will undoubtedly be more informative. I suspect that the lack of gene and transcript entries in you GTF file is causing issues, but the error message from STAR can likely confirm/deny that suspicion.

ADD REPLYlink written 22 months ago by Devon Ryan91k

I edited the gtf file so it looks like this:

1       GenBank gene    544     1905    .       +       .       gene_id "SAUSA300_0001";
1       GenBank transcript      544     1905    .       +       .       gene_id "SAUSA300_0001"; transcript_id "SAUSA300_0001.t01";
1       GenBank exon    544     1905    .       +       .       gene_id "SAUSA300_0001"; transcript_id "SAUSA300_0001.t01";
1       GenBank gene    2183    3316    .       +       .       gene_id "SAUSA300_0002";
1       GenBank transcript      2183    3316    .       +       .       gene_id "SAUSA300_0002"; transcript_id "SAUSA300_0002.t01";
1       GenBank exon    2183    3316    .       +       .       gene_id "SAUSA300_0002"; transcript_id "SAUSA300_0002.t01";
1       GenBank gene    3697    3942    .       +       .       gene_id "SAUSA300_0003";
1       GenBank transcript      3697    3942    .       +       .       gene_id "SAUSA300_0003"; transcript_id "SAUSA300_0003.t01";
1       GenBank exon    3697    3942    .       +       .       gene_id "SAUSA300_0003"; transcript_id "SAUSA300_0003.t01";
1       GenBank gene    3939    5051    .       +       .       gene_id "SAUSA300_0004";
ADD REPLYlink modified 22 months ago • written 22 months ago by csmatyi10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1495 users visited in the last hour