Closed:STAR index with NCBI files
0
0
Entering edit mode
5.9 years ago
cgias ▴ 20

Hello,

So when I finally made all the other threads I posted here work, I was told to change my genome & annotation file.... as an intern that I am, here we go again!

I am having some problems on creating the index with my files from NCBI.

The command I am using is

STAR --runMode genomeGenerate --runThreadN 6 --genomeDir /home/carolgs/Cpapaya/STARindex2 --genomeFastaFiles GCA_000150535.1_Papaya1.0_genomic.fna --sjdbGTFfile ref_Papaya1.0_top_level.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 49

I've tried it after converting to gtf

STAR --runMode genomeGenerate --runThreadN 6 --genomeDir /home/carolgs/Cpapaya/STARindex2 --genomeFastaFiles GCA_000150535.1_Papaya1.0_genomic.fna --sjdbGTFfile ref_Papaya1.0_top_level.gtf --sjdbOverhang 49

But I'm still getting the same Log:

[...]
May 09 12:11:42 ... completed Suffix Array index
May 09 12:11:42 ..... processing annotations GTF

Fatal INPUT FILE error, no valid exon lines in the GTF file: ref_Papaya1.0_top_level.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.

May 09 12:11:44 ...... FATAL ERROR, exiting

Here is the first lines from my gff file:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build Papaya1.0
#!genome-build-accession NCBI_Assembly:GCF_000150535.2
#!annotation-date 4 August 2017
#!annotation-source NCBI Carica papaya Annotation Release 100
##sequence-region NW_019011177.1 1 833
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3649
NW_019011177.1    RefSeq    region    1    833    .    +    .    ID=id0;Dbxref=taxon:3649;Name=LG1;chromosome=LG1;cultivar=SunUp;gbkey=Src;genome=genomic;map=unlocalized;mol_type=genomic DNA;note=transgenic for the papaya ringspot virus (PRSV) coat protein%2C which confers resistance to PRSV infection
##sequence-region NW_019011178.1 1 1200
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3649
NW_019011178.1    RefSeq    region    1    1200    .    +    .    ID=id1;Dbxref=taxon:3649;Name=LG1;chromosome=LG1;cultivar=SunUp;gbkey=Src;genome=genomic;map=unlocalized;mol_type=genomic DNA;note=transgenic for the papaya ringspot virus (PRSV) coat protein%2C which confers resistance to PRSV infection
##sequence-region NW_019011179.1 1 1384
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3649
NW_019011179.1    RefSeq    region    1    1384    .    +    .    ID=id2;Dbxref=taxon:3649;Name=LG1;chromosome=LG1;cultivar=SunUp;gbkey=Src;genome=genomic;map=unlocalized;mol_type=genomic DNA;note=transgenic for the papaya ringspot virus (PRSV) coat protein%2C which confers resistance to PRSV infection
NW_019011179.1    Gnomon    gene    5    1378    .    -    .    ID=gene0;Dbxref=GeneID:110806311;Name=LOC110806311;gbkey=Gene;gene=LOC110806311;gene_biotype=protein_coding;partial=true;start_range=.,5
NW_019011179.1    Gnomon    mRNA    5    1378    .    -    .    ID=rna0;Parent=gene0;Dbxref=GeneID:110806311,Genbank:XM_022034873.1;Name=XM_022034873.1;gbkey=mRNA;gene=LOC110806311;model_evidence=Supporting evidence includes similarity to: 9 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 6 samples with support for all annotated introns;partial=true;product=IAA-amino acid hydrolase ILR1-like;start_range=.,5;transcript_id=XM_022034873.1
NW_019011179.1    Gnomon    exon    1314    1378    .    -    .    ID=id3;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XM_022034873.1;gbkey=mRNA;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;transcript_id=XM_022034873.1
NW_019011179.1    Gnomon    exon    741    1055    .    -    .    ID=id4;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XM_022034873.1;gbkey=mRNA;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;transcript_id=XM_022034873.1
NW_019011179.1    Gnomon    exon    449    577    .    -    .    ID=id5;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XM_022034873.1;gbkey=mRNA;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;transcript_id=XM_022034873.1
NW_019011179.1    Gnomon    exon    5    328    .    -    .    ID=id6;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XM_022034873.1;gbkey=mRNA;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;start_range=.,5;transcript_id=XM_022034873.1
NW_019011179.1    Gnomon    CDS    1314    1364    .    -    0    ID=cds0;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XP_021890565.1;Name=XP_021890565.1;gbkey=CDS;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;protein_id=XP_021890565.1
NW_019011179.1    Gnomon    CDS    741    1055    .    -    0    ID=cds0;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XP_021890565.1;Name=XP_021890565.1;gbkey=CDS;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;protein_id=XP_021890565.1
NW_019011179.1    Gnomon    CDS    449    577    .    -    0    ID=cds0;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XP_021890565.1;Name=XP_021890565.1;gbkey=CDS;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;protein_id=XP_021890565.1
NW_019011179.1    Gnomon    CDS    5    328    .    -    0    ID=cds0;Parent=rna0;Dbxref=GeneID:110806311,Genbank:XP_021890565.1;Name=XP_021890565.1;gbkey=CDS;gene=LOC110806311;partial=true;product=IAA-amino acid hydrolase ILR1-like;protein_id=XP_021890565.1;start_range=.,5
##sequence-region NW_019011180.1 1 1405
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3649
NW_019011180.1    RefSeq    region    1    1405    .    +    .    ID=id7;Dbxref=taxon:3649;Name=LG1;chromosome=LG1;cultivar=SunUp;gbkey=Src;genome=genomic;map=unlocalized;mol_type=genomic DNA;note=transgenic for the papaya ringspot virus (PRSV) coat protein%2C which confers resistance to PRSV infection
##sequence-region NW_019011181.1 1 1505
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3649
NW_019011181.1    RefSeq    region    1    1505    .    +    .    ID=id8;Dbxref=taxon:3649;Name=LG1;chromosome=LG1;cultivar=SunUp;gbkey=Src;genome=genomic;map=unlocalized;mol_type=genomic DNA;note=transgenic for the papaya ringspot virus (PRSV) coat protein%2C which confers resistance to PRSV infection
##sequence-region NW_019011182.1 1 1521

And the converted to gtf file:

NC_010323.1    RefSeq    exon    63    137    .    -    .    transcript_id "rna30121"; gene_id "gene20201"; gene_name "trnH-GUG";
NC_010323.1    RefSeq    CDS    592    1653    .    -    0    transcript_id "gene20202"; gene_id "gene20202"; gene_name "psbA";
NC_010323.1    RefSeq    exon    1924    1959    .    -    .    transcript_id "gene20203"; gene_id "gene20203"; gene_name "trnK-UUU";
NC_010323.1    RefSeq    exon    4534    4569    .    -    .    transcript_id "gene20203"; gene_id "gene20203"; gene_name "trnK-UUU";
NC_010323.1    RefSeq    exon    1924    1958    .    -    .    transcript_id "rna30122"; gene_id "gene20203"; gene_name "trnK-UUU";
NC_010323.1    RefSeq    exon    4533    4569    .    -    .    transcript_id "rna30122"; gene_id "gene20203"; gene_name "trnK-UUU";
NC_010323.1    RefSeq    CDS    2266    3786    .    -    0    transcript_id "gene20204"; gene_id "gene20204"; gene_name "matK";
NC_010323.1    RefSeq    exon    5529    5755    .    -    .    transcript_id "gene20205"; gene_id "gene20205"; gene_name "rps16";
NC_010323.1    RefSeq    exon    6639    6678    .    -    .    transcript_id "gene20205"; gene_id "gene20205"; gene_name "rps16";
NC_010323.1    RefSeq    CDS    5529    5755    .    -    2    transcript_id "gene20205"; gene_id "gene20205"; gene_name "rps16";
NC_010323.1    RefSeq    CDS    6639    6678    .    -    0    transcript_id "gene20205"; gene_id "gene20205"; gene_name "rps16";
NC_010323.1    RefSeq    exon    7344    7415    .    -    .    transcript_id "rna30123"; gene_id "gene20206"; gene_name "trnQ-UUG";
NC_010323.1    RefSeq    CDS    7781    7966    .    +    0    transcript_id "gene20207"; gene_id "gene20207"; gene_name "psbK";
NC_010323.1    RefSeq    CDS    8408    8518    .    +    0    transcript_id "gene20208"; gene_id "gene20208"; gene_name "psbI";
NC_010323.1    RefSeq    exon    8617    8704    .    -    .    transcript_id "rna30124"; gene_id "gene20209"; gene_name "trnS-GCU";
NC_010323.1    RefSeq    exon    9669    9691    .    +    .    transcript_id "gene20210"; gene_id "gene20210"; gene_name "trnG-UCC";
NC_010323.1    RefSeq    exon    10427    10475    .    +    .    transcript_id "gene20210"; gene_id "gene20210"; gene_name "trnG-UCC";
NC_010323.1    RefSeq    exon    9669    9691    .    +    .    transcript_id "rna30125"; gene_id "gene20210"; gene_name "trnG-UCC";
NC_010323.1    RefSeq    exon    10427    10475    .    +    .    transcript_id "rna30125"; gene_id "gene20210"; gene_name "trnG-UCC";
NC_010323.1    RefSeq    exon    10658    10729    .    +    .    transcript_id "rna30126"; gene_id "gene20211"; gene_name "trnR-UCU";
NC_010323.1    RefSeq    CDS    11031    12548    .    -    0    transcript_id "gene20212"; gene_id "gene20212"; gene_name "atpA";
NC_010323.1    RefSeq    exon    12614    13023    .    -    .    transcript_id "gene20213"; gene_id "gene20213"; gene_name "atpF";
NC_010323.1    RefSeq    exon    13790    13934    .    -    .    transcript_id "gene20213"; gene_id "gene20213"; gene_name "atpF";
NC_010323.1    RefSeq    CDS    12614    13023    .    -    2    transcript_id "gene20213"; gene_id "gene20213"; gene_name "atpF";
NC_010323.1    RefSeq    CDS    13790    13934    .    -    0    transcript_id "gene20213"; gene_id "gene20213"; gene_name "atpF";
NC_010323.1    RefSeq    CDS    14500    14745    .    -    0    transcript_id "gene20214"; gene_id "gene20214"; gene_name "atpH";
NC_010323.1    RefSeq    CDS    15945    16688    .    -    0    transcript_id "gene20215"; gene_id "gene20215"; gene_name "atpI";
NC_010323.1    RefSeq    CDS    16922    17632    .    -    0    transcript_id "gene20216"; gene_id "gene20216"; gene_name "rps2";

So clearly I do have exons lines in both files. I can't see where is the problem.

Thanks in advance,

star rnaseq index ncbi • 383 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2510 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6