error while building database by snpEff
12 months ago
Mainul ▴ 10

I have 48 rice whole-genome sequences and would like to analyze the variants in heat-tolerance-related genes and the functional effects of non-synonymous SNPs. I have already aligned the reads to MSUv6.1 (all.con) from http://rice.plantbiology.msu.edu/ and have a filtered VCF file. The problem occurs when I annotate the VCF with snpEff. I tried the rice7 gene model database for Oryza sativa (zip file), but it does not match my VCF file.

I used the following command:

java -jar snpEff.jar -v rice7 /usr/bin/filtered_snps_final.vcf > /usr/bin/filtered_snps_final.ann.vcf


and got this result:

WARNING: Chromosome 'chr12|13112' not found. File '/home/songbk/Mainul_bin/Bioinfo/022MA2_filtered_snpsfinal.vcf', line 3645356

ERRORS: Some errors were detected
Error type  Number of errors
ERROR_CHROMOSOME_NOT_FOUND  3651011
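
The name in the warning (`chr12|13112`) comes from the CHROM column of the VCF, so a first check is to list every chromosome name the VCF actually uses and compare that list with the names the chosen database expects. A minimal Python sketch, assuming a plain-text, tab-separated VCF; the function name `count_vcf_chromosomes` and the sample records are made up for illustration:

```python
from collections import Counter


def count_vcf_chromosomes(lines):
    """Count variant records per CHROM value, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # CHROM is the first tab-separated column of a VCF record.
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Illustrative records only; real input would be the lines of the VCF file,
# e.g. count_vcf_chromosomes(open("filtered_snps_final.vcf")).
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr12|13112\t100\t.\tA\tG\n",
    "chr12|13112\t200\t.\tC\tT\n",
    "chr1|13101\t5\t.\tA\tC\n",
]
for name, n in sorted(count_vcf_chromosomes(sample).items()):
    print(name, n, sep="\t")
```

If every name carries a suffix such as `|13112` that the database does not use, all records will be reported as ERROR_CHROMOSOME_NOT_FOUND, which would be consistent with the count above.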


After that, I also tried to build my own snpEff database from the GFF3 and FASTA files at http://rice.plantbiology.msu.edu/, using java -jar snpEff.jar build -gff3 -v MSU6v1. The GFF file was all.gff3 (renamed to genes.gff) and the FASTA file was all.con (renamed to sequences.fa); both were placed in the MSU6v1 directory. The build failed with FATAL ERROR: Most Exons do not have sequences!

FATAL ERROR: Most Exons do not have sequences!
There might be differences in the chromosome names used in the genes file '/home/songbk/Mainul_bin/Bioinfo/snpEff/./data/MSU6v1/genes.gff'
and the chromosme names used in the 'reference sequence' file.
Please check that chromosome names in both files match.
Chromosome names missing in 'reference sequence' file:  '1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9', 'Sy', 'Un', , , , , , , , , , ,
Chromosome names missing in 'genes' file             :  '10|13110''11|13111''12|13112''1|13101''2|13102''3|13103''4|13104''5|13105''6|13106''7|13107''8|13108''9|13109')
Fatal Error # see screenshot 2
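
The two lists in this message differ only by a `|131xx` suffix: the reference sequence names carry it and the names in the genes file do not. One possible way to make them agree is to rewrite the FASTA headers before building. The Python sketch below is a hedged illustration, not a tested fix for this dataset. It assumes the headers in sequences.fa look something like `>Chr10|13110 ...` (inferred from the error message, not verified), and the function name `strip_header_suffix` is hypothetical:

```python
def strip_header_suffix(lines):
    """Yield FASTA lines with each header cut at the first '|' or whitespace.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the part of the first token before any '|'.
            name = fields[0].split("|", 1)[0] if fields else ""
            yield ">" + name + "\n"
        else:
            yield line


# Illustrative input; the header text is an assumption about all.con.
example = [">Chr10|13110 some description\n", "ACGTACGT\n"]
print("".join(strip_header_suffix(example)), end="")
```

After writing the result to a new sequences.fa, the remaining names still have to be compared against the first column of genes.gff (including case and any `chr` prefix) before rerunning the build. The VCF must use the same names as well.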

WARNING: Cannot find first exonic position after 27061823 for transcript '13105.m05011'
WARNING: Cannot find first exonic position after 20836530 for transcript '13102.m03765'
WARNING: Cannot find first exonic position after 28133402 for transcript '13103.m13008'
WARNING: Cannot find first exonic position after 21074629 for transcript '13102.m03809'
WARNING: Cannot find last exonic position before 25987811 for transcript '13105.m04749'
WARNING: Cannot find first exonic position after 2434416 for transcript '13106.m00521'
no sequence found #see screenshot 3


snp gene assembly software error • 501 views

If MSUv6.1 is not the same build as rice7, then you are not annotating against the right genome build.

1. Please see How to add images to a Biostars post to add your images properly. You'll need to use a password-free image hosting service such as imgbb, not a file sharing/cloud storage service such as Google Photos, Google Drive or Dropbox.
2. Please use the formatting bar (especially the code option) to present your post better. You can wrap text in backticks to format it as inline code, or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

It was my first time uploading photos and code together; I will make sure to post properly in future. Thanks for your help. Can you suggest which reference genome build I should use for MSUv6.1? I tried java -jar snpEff.jar build -gff3 -v MSU6v1 to build my own database, but no database was produced.


Can you suggest which reference genome build I should use for MSUv6.1?

Which genome build/source did you use to do the original alignments? As you have discovered, you can't mix and match genome builds.


To identify heat shock genes, I aligned the weedy rice WGS reads to the MSUv6.1 reference genome (that was the source for the original alignments) and then created a filtered SNP VCF file. There were no errors up to SNP filtering. After variant calling I moved on to SNP annotation with snpEff, but I found no suitable database for MSUv6.1. So far I have tried annotating with snpEff version 4.2 databases such as rice_rap201304, rice_rap201503, rice5, rice6.1 and rice7. Each one gives the error shown in the first part of the output above.
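
If none of the prebuilt databases matches, one option is to annotate against a database built from the same MSU files used for alignment, with the chromosome names in the VCF made to agree with that database. The Python sketch below strips a `|131xx`-style suffix from the CHROM column and from `##contig` header lines. It is only an illustration under the assumption that the suffix is the sole difference; the function name `strip_vcf_chrom_suffix` and the sample lines are hypothetical.

```python
import re

# Matches '##contig=<ID=name|suffix' and captures the part before '|'.
CONTIG_ID = re.compile(r"(##contig=<ID=)([^,>|]+)\|[^,>]*")


def strip_vcf_chrom_suffix(lines):
    """Yield VCF lines with any '|...' suffix removed from chromosome names."""
    for line in lines:
        if line.startswith("##contig="):
            yield CONTIG_ID.sub(r"\1\2", line)
        elif line.startswith("#") or not line.strip():
            # Other header lines and blank lines pass through unchanged.
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            yield chrom.split("|", 1)[0] + sep + rest


# Illustrative input only.
example = [
    "##contig=<ID=chr12|13112,length=100>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr12|13112\t100\t.\tA\tG\n",
]
print("".join(strip_vcf_chrom_suffix(example)), end="")
```

Whether the shortened names then match the database exactly (case, `chr` prefix) still needs to be checked before running the annotation again.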