9.4 years ago
zhengyunchaosky
Hi,
I am trying to use "ftp://ftp.ensemblgenomes.org/pub/release-30/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.30.gff3.gz" to build a SnpEff database for SNP annotation.
java -Xmx15G -jar snpEff.jar build -gff3 -v IRGSP
However, I got the warnings below.
00:00:00.000 SnpEff version SnpEff 4.1 (build 2015-01-07), by Pablo Cingolani
00:00:00.010 Command: 'build'
00:00:00.028 Building database for 'IRGSP'
00:00:00.029 Reading configuration file 'snpEff.config'. Genome: 'IRGSP'
00:00:01.262 done
Reading GFF3 data file : '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff'
Reading genes : ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ........................................................
Total: 35679 Genes added.
Reading transcripts : Total: 0 Transcripts added.
Reading exons : WARNING: Cannot find transcript 'transcript:EPlOSAT00000003714'. Created transcript 'transcript:EPlOSAT00000003714' and gene 'Gene_transcript:EPlOSAT00000003714' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 68 '1 agi exon 2631 2760 . - . Parent=transcript:EPlOSAT00000003714;Name=EPlOSAE00000004118;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EPlOSAE00000004118;rank=1;version=1'
WARNING: Cannot find transcript 'transcript:OS01T0100100-01'. Created transcript 'transcript:OS01T0100100-01' and gene 'Gene_transcript:OS01T0100100-01' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 72 '1 irgsp exon 2983 3268 . + . Parent=transcript:OS01T0100100-01;Name=OS01T0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=OS01T0100100-01.exon1;rank=1;version=1'
Anyway, the database was built successfully and I could run my annotation. The rest of the build output:
Total: 248345 sequences added, 0 sequences ignored.
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons:
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
Remove empty chromosomes:
Marking as 'coding' from CDS information:
Done: 42132 transcripts marked
#-----------------------------------------------
# Genome name : 'IRGSP'
# Genome version : 'IRGSP'
# Has protein coding info : true
# Genes : 133430
# Protein coding genes : 42132
#-----------------------------------------------
# Transcripts : 97751
# Avg. transcripts per gene : 0.73
#-----------------------------------------------
# Checked transcripts :
# AA sequences : 0 ( 0.00% )
# DNA sequences : 0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 42132
# Length errors : 1261 ( 2.99% )
# STOP codons in CDS errors : 0 ( 0.00% )
# START codon errors : 9404 ( 22.32% )
# STOP codon warnings : 1929 ( 4.58% )
# UTR sequences : 38051 ( 38.93% )
# Total Errors : 10269 ( 24.37% )
#-----------------------------------------------
# Cds : 163451
# Exons : 248345
# Exons with sequence : 248345
# Exons without sequence : 0
# Avg. exons per transcript : 2.54
# WARNING : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes : 61
# Chromosomes names [sizes] :
# '1' [43270923]
# '3' [36413819]
# '2' [35937250]
# '4' [35502694]
# '6' [31248787]
# '5' [29958434]
# '7' [29697621]
# '11' [29021106]
# '8' [28443022]
# '12' [27531856]
# '10' [23207287]
# '9' [23012720]
# 'AP008246' [206004]
# 'AP008247' [157458]
# 'AC160949' [128256]
# 'AC156495' [88500]
# 'AC155918' [32941]
# 'Syng_TIGR_028' [31094]
# 'Syng_TIGR_023' [24772]
# 'Syng_TIGR_045' [22545]
# 'Syng_TIGR_005' [21787]
# 'Syng_TIGR_014' [21421]
# 'Syng_TIGR_047' [20829]
# 'Syng_TIGR_026' [19971]
# 'Syng_TIGR_004' [19457]
# 'Syng_TIGR_021' [17477]
# 'Syng_TIGR_008' [16676]
# 'Syng_TIGR_012' [16417]
# 'Syng_TIGR_010' [15493]
# 'AC174930' [15426]
# 'Syng_TIGR_002' [14476]
# 'Syng_TIGR_037' [13061]
# 'Syng_TIGR_029' [12884]
# 'Syng_TIGR_016' [12792]
# 'Syng_TIGR_027' [11522]
# 'Syng_TIGR_046' [11447]
# 'Syng_TIGR_033' [11093]
# 'Syng_TIGR_011' [10901]
# 'Syng_TIGR_030' [10794]
# 'Syng_TIGR_020' [10699]
# 'Syng_TIGR_035' [10686]
# 'Syng_TIGR_015' [10595]
# 'Syng_TIGR_013' [10512]
# 'Syng_TIGR_036' [10434]
# 'Syng_TIGR_019' [10422]
# 'Syng_TIGR_034' [10311]
# 'Syng_TIGR_009' [10296]
# 'Syng_TIGR_041' [10210]
# 'Syng_TIGR_024' [10060]
# 'Syng_TIGR_022' [9889]
# 'Syng_TIGR_032' [9603]
# 'Syng_TIGR_031' [9548]
# 'Syng_TIGR_050' [8529]
# 'Syng_TIGR_038' [8197]
# 'Syng_TIGR_007' [7820]
# 'Syng_TIGR_048' [7140]
# 'Syng_TIGR_039' [6269]
# 'Syng_TIGR_049' [6261]
# 'Syng_TIGR_044' [6000]
# 'Syng_TIGR_042' [5510]
# 'Syng_TIGR_043' [4236]
#-----------------------------------------------
00:01:02.976 Caracterizing exons by splicing (stage 1) :
....................................................................................................
100000 ....................................................................................................
200000 ................................................
00:01:03.841 Caracterizing exons by splicing (stage 2) :
....................................................................................................
100000 ....................................................................................................
200000 ................................................00:01:04.100 done.
00:01:04.101 [Optional] Rare amino acid annotations
00:01:04.106 Warning: Cannot read optional protein sequence file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/protein.fa', nothing done.
00:01:04.108 Saving database
00:01:54.212 [Optional] Reading regulation elements: GFF
00:01:54.214 Warning: Cannot read optional regulation file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.gff', nothing done.
00:01:54.214 [Optional] Reading regulation elements: BED
00:01:54.215 Cannot find optional regulation dir '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.bed/', nothing done.
00:01:54.216 [Optional] Reading motifs: GFF
00:01:54.217 Warning: Cannot open PWMs file /datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/pwms.bin. Nothing done
00:01:54.217 Done
00:01:54.240 Logging
00:01:55.259 Checking for updates...
I wonder why these warnings came up. Thanks in advance!
Maybe there are some 'errors' in the GFF3 file you used to create the database. Can you check whether transcript EPlOSAT00000003714 has its own transcript line in the GFF3 file (apart from the exon line)?
Thanks for your response! Yes, it has the transcript line.
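For anyone who wants to run this check over the whole file rather than one ID, here is a rough Python sketch. It is not part of SnpEff: the function names (`parse_attributes`, `check_exon_parents`, `open_gff`) and the tiny `SAMPLE` records are made up for illustration, and it assumes a standard tab-separated GFF3 file with `ID=` and `Parent=` attributes. It lists exon parents that have no feature line of their own and counts the feature types of the parents that do exist, which may help show whether the transcript lines use a type the builder is not picking up.

```python
import gzip
import sys
from collections import Counter


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_exon_parents(lines):
    """Return (missing_parent_ids, parent_type_counts) for exon features."""
    id_to_type = {}    # feature ID -> feature type (column 3)
    exon_parents = []  # every Parent value seen on an exon line
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = ftype
        if ftype == "exon" and "Parent" in attrs:
            exon_parents.extend(attrs["Parent"].split(","))
    unique_parents = set(exon_parents)
    missing = sorted(p for p in unique_parents if p not in id_to_type)
    parent_types = Counter(
        id_to_type[p] for p in unique_parents if p in id_to_type
    )
    return missing, parent_types


def open_gff(path):
    """Open a plain or gzip-compressed GFF3 file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical records for illustration only (real files are tab-separated).
SAMPLE = [
    "##gff-version 3\n",
    "1\tagi\tgene\t2631\t2760\t.\t-\t.\tID=gene:G1\n",
    "1\tagi\ttranscript\t2631\t2760\t.\t-\t.\tID=transcript:T1;Parent=gene:G1\n",
    "1\tagi\texon\t2631\t2760\t.\t-\t.\tParent=transcript:T1;Name=E1\n",
    "1\tagi\texon\t2983\t3268\t.\t+\t.\tParent=transcript:T2;Name=E2\n",
]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open_gff(sys.argv[1]) as handle:
        missing, parent_types = check_exon_parents(handle)
    print("Exon parents without a feature line:", len(missing))
    print("Feature types of exon parents:", dict(parent_types))
```

Saved under any name (say `check_parents.py`, a made-up filename), it can be pointed at the `genes.gff` used for the build. If no parents are missing but the parent type is something other than what the builder expects for transcripts, that would be consistent with "Total: 0 Transcripts added" in the log, though this is only a guess at the cause.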