RSEM implementation
1
0
Entering edit mode
10 months ago
anasjamshed ▴ 120

I have the virus genome(fasta) and gff file and I am trying to prepare-reference through the following commands:

rsem-prepare-reference --gff3 KT992094.1.gff3 KT992094.1.fasta

or

rsem-prepare-reference --gff3 KT992094.1.gff \ --gff3-genes-as-transcripts \ --bowtie \ KT992094.1.fasta \ ref/virus

But it's saying:

Invalid number of arguments!

How can I solve this issue?

RSEM • 2.2k views
ADD COMMENT
0
Entering edit mode

Try removing the backslashes and extra spaces in your second command. Backslashes escape characters, so if it's all one one line you'll be escaping important characters in your arguments and flags.

Also make sure that you're using the correct command line arguments and flags, since your two commands vary greatly in what they theoretically are doing.

ADD REPLY
0
Entering edit mode
rsem-prepare-reference --gff3 KT992094.1.gff3 KT992094.1.fasta basename

basesname will be the name for resulting index etc. So provide that.

ADD REPLY
0
Entering edit mode

which index?

ADD REPLY
1
Entering edit mode

The index you're trying to generate. Ensure you're either in the directory you wish the output of rsem-prepare-reference to be in, or you give that output directory along with a sensible prefix as the basename.

ADD REPLY
0
Entering edit mode

while running this:

rsem-prepare-reference --gff3 KT992094.1.gff3 KT992094.1.fasta out

I got this error:

rsem-gff3-to-gtf KT992094.1.gff3 out.gtf
Line 25 does not have 9 fields:

"rsem-gff3-to-gtf KT992094.1.gff3 out.gtf" failed! Plase check if you provide correct parameters/options for the pipeline!
ADD REPLY
2
Entering edit mode

Error is clearly noted. Check your gff file :

Line 25 does not have 9 fields

ADD REPLY
0
Entering edit mode

gff3 file

It contains 24 lines only and does not have 25 line

ADD REPLY
0
Entering edit mode

Are you sure? Paste the output to cat -te KT992094.1.gff3 && echo $? here - don't take a screenshot, copy and paste the content and code-format it.

ADD REPLY
0
Entering edit mode

I tried :

cat -te KT992094.1.gff3 && echo $?

And get the following results:

##sequence-region KT992094.1 1 15223$
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=11250$
KT992094.1^IGenbank^Iregion^I1^I15223^I.^I+^I.^IID=KT992094.1:1..15223;Dbxref=taxon:11250;gb-acronym=HRSV;gbkey=Src;genome=genomic;mol_type=viral cRNA;note=recombinant D46/D53 strain;old-name=Human respiratory syncytial virus;strain=A2$
KT992094.1^IGenbank^Igene^I45^I576^I.^I+^I.^IID=gene-NS1;Name=NS1;gbkey=Gene;gene=NS1;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I99^I518^I.^I+^I0^IID=cds-ALS35583.1;Parent=gene-NS1;Dbxref=NCBI_GP:ALS35583.1;Name=ALS35583.1;gbkey=CDS;gene=NS1;product=nonstructural protein 1;protein_id=ALS35583.1$
KT992094.1^IGenbank^Igene^I596^I1098^I.^I+^I.^IID=gene-NS2;Name=NS2;gbkey=Gene;gene=NS2;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I628^I1002^I.^I+^I0^IID=cds-ALS35584.1;Parent=gene-NS2;Dbxref=NCBI_GP:ALS35584.1;Name=ALS35584.1;gbkey=CDS;gene=NS2;product=nonstructural protein 2;protein_id=ALS35584.1$
KT992094.1^IGenbank^Igene^I1126^I2328^I.^I+^I.^IID=gene-N;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I1141^I2316^I.^I+^I0^IID=cds-ALS35585.1;Parent=gene-N;Dbxref=NCBI_GP:ALS35585.1;Name=ALS35585.1;gbkey=CDS;gene=N;product=nucleoprotein;protein_id=ALS35585.1$
KT992094.1^IGenbank^Igene^I2330^I3243^I.^I+^I.^IID=gene-P;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I2347^I3072^I.^I+^I0^IID=cds-ALS35586.1;Parent=gene-P;Dbxref=NCBI_GP:ALS35586.1;Name=ALS35586.1;gbkey=CDS;gene=P;product=phosphoprotein;protein_id=ALS35586.1$
KT992094.1^IGenbank^Igene^I3253^I4210^I.^I+^I.^IID=gene-M;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I3262^I4032^I.^I+^I0^IID=cds-ALS35587.1;Parent=gene-M;Dbxref=NCBI_GP:ALS35587.1;Name=ALS35587.1;gbkey=CDS;gene=M;product=matrix protein;protein_id=ALS35587.1$
KT992094.1^IGenbank^Igene^I4220^I4629^I.^I+^I.^IID=gene-SH;Name=SH;gbkey=Gene;gene=SH;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I4304^I4498^I.^I+^I0^IID=cds-ALS35582.1;Parent=gene-SH;Dbxref=NCBI_GP:ALS35582.1;Name=ALS35582.1;gbkey=CDS;gene=SH;product=small hydrophobic protein;protein_id=ALS35582.1$
KT992094.1^IGenbank^Igene^I4674^I5596^I.^I+^I.^IID=gene-G;Name=G;gbkey=Gene;gene=G;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I4689^I5585^I.^I+^I0^IID=cds-ALS35588.1;Parent=gene-G;Dbxref=NCBI_GP:ALS35588.1;Name=ALS35588.1;gbkey=CDS;gene=G;product=attachment glycoprotein G;protein_id=ALS35588.1$
KT992094.1^IGenbank^Igene^I5649^I7551^I.^I+^I.^IID=gene-F;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I5662^I7386^I.^I+^I0^IID=cds-ALS35589.1;Parent=gene-F;Dbxref=NCBI_GP:ALS35589.1;Name=ALS35589.1;gbkey=CDS;gene=F;product=fusion protein;protein_id=ALS35589.1$
KT992094.1^IGenbank^Igene^I7598^I8558^I.^I+^I.^IID=gene-M2;Name=M2;gbkey=Gene;gene=M2;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I7607^I8191^I.^I+^I0^IID=cds-ALS35591.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35591.1;Name=ALS35591.1;gbkey=CDS;gene=M2;product=m2-1;protein_id=ALS35591.1$
KT992094.1^IGenbank^ICDS^I8160^I8432^I.^I+^I0^IID=cds-ALS35592.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35592.1;Name=ALS35592.1;gbkey=CDS;gene=M2;product=m2-2 protein;protein_id=ALS35592.1$
KT992094.1^IGenbank^Igene^I8491^I15068^I.^I+^I.^IID=gene-L;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding$
KT992094.1^IGenbank^ICDS^I8499^I14996^I.^I+^I0^IID=cds-ALS35590.1;Parent=gene-L;Dbxref=NCBI_GP:ALS35590.1;Name=ALS35590.1;gbkey=CDS;gene=L;product=L polymerase protein;protein_id=ALS35590.1$
$
0

This file does not contain 25th line?

ADD REPLY
1
Entering edit mode
10 months ago
Ram 43k

Your GFF3 file has 25th line - see the lone $ in the cat -te output? That's an empty line. Delete the empty line using

sed '/^$/d' KT992094.1.gff3 > KT992094.1.new.gff3

And then run your old command (with the new gff3 file)

rsem-prepare-reference --gff3 KT992094.1.new.gff3 KT992094.1.fasta out
ADD COMMENT
0
Entering edit mode

Thanks. The command ran successfully but it threw the following errors:

rsem-gff3-to-gtf KT992094.1.new.gff3 out.gtf
GTF file is successully generated.
There are 0 transcripts contained in the generated GTF file.

rsem-extract-reference-transcripts out 0 out.gtf None 0 KT992094.1.fasta
The reference contains no transcripts!
"rsem-extract-reference-transcripts out 0 out.gtf None 0 KT992094.1.fasta" failed! Plase check if you provide correct parameters/options for the pipeline!
ADD REPLY
0
Entering edit mode

The error is pretty self-descriptive. Where did you get this gff3 file? You need a better transcript feature file. Also, this is a new topic so you should open a new question.

I'll move my comment to an answer, please accept it to provide closure to this post.

ADD REPLY
0
Entering edit mode

I run the following command, and it successfully finds the transcripts:

 rsem-prepare-reference --gff3 KT992094.1.new.gff3 --gff3-genes-as-transcripts --bowtie KT992094.1.fasta rsv-rsem
ADD REPLY
0
Entering edit mode

That's wonderful - congratulations on finding the solution yourself, and thank you for sharing it here. Please accept my answer to provide closure to the post.

ADD REPLY

Login before adding your answer.

Traffic: 1521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6