Question: Easy way to merge several GFF outputs from MAKER
3.9 years ago by
France / Toulouse / GeT-Plage
Rox1.2k wrote:

Hi everyone !

I'm still struggling with MAKER.

What I want to do is follow, step by step, the MAKER tutorial for SNAP training (the one described here: ).

My input file is a genome assembly I made, with several contigs.

In the MAKER tutorial, they ask you to convert the generated GFF into a ZFF, along with some other tasks. My problem is that the tutorial works on only one locus, in one file. But in my results I have thousands of directories, each one containing several GFF files.

For example :

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ ls
00  0B  16  22  2D  38  43  4F  5A  66  71  7C  87  92  9D  A9  B4  BF  CA  D5  E0  EB  F6
(...) DF  EA  F5

Each directory can contain several subdirectories, and the number varies from one directory to another:

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ cd 0B/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ ls 
28  67  7C  92  D0  F8
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ cd 28
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ ls
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ cd tig00000634/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28/tig00000634$ ls

The ones I'm interested in are the plain tig00(number).gff files. But I don't want to train SNAP on each contig like this; I want to train SNAP on the whole assembly, and I hope I don't have to do this for each .gff file, even if a script does it for me, because the following SNAP training steps require launching MAKER again on the whole genome with the HMM model produced by SNAP.

What I want is an easy way to convert all these GFF files into a single output GFF that corresponds to the whole MAKER output. I can't find anything like this in the MAKER documentation, but I'm sure MAKER users know a way to do what I want.

Do you have any advice? Thanks for your help!




If your actual question is how to convert multiple GTF files into a single GTF file, you can try cuffmerge.

1) Create an 'assembly_GTF_list.txt' file listing all GTF files under your master directory. The command below works recursively, listing every .gtf file in all subdirectories.

find ~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/ -type f -name "*.gtf" > assembly_GTF_list.txt

2) CuffMerge

cuffmerge [options]* assembly_GTF_list.txt
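
The listing step above can be adapted to the .gff extension mentioned in the question. This is only a minimal sketch, not a tested MAKER workflow: the function name list_gff and the output file name assembly_GFF_list.txt are made up for illustration, and since cuffmerge expects GTF input, the resulting list may need converting before it is usable there.

```shell
# list_gff DIR: print every .gff file found under DIR (recursively), sorted.
# list_gff is a hypothetical helper name, not part of MAKER or Cufflinks.
list_gff() {
    datastore_dir="$1"
    find "$datastore_dir" -type f -name "*.gff" | sort
}

# Example usage (paths are placeholders):
#   list_gff /path/to/your_datastore > assembly_GFF_list.txt
```

Sorting the list keeps the order stable between runs, which makes it easier to compare lists after re-running MAKER.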

Sorry if I misunderstood your question.

modified 3.9 years ago • written 3.9 years ago by EagleEye6.6k

My question was indeed something like this, but specific to MAKER, because its output directories are awkward to list. I think I can find this information in a MAKER output file, and I think Philipp answered my question below! But thanks for helping; it will probably be useful for me.

written 3.9 years ago by Rox1.2k

Glad that you got your answer. Good luck.

written 3.9 years ago by EagleEye6.6k
3.9 years ago by
Philipp Bayer6.7k wrote:

You can merge all gff3 files from the MAKER output using the gff3_merge script that comes with MAKER.

There should be a file ending in index.log in the top-level folder of your output data; let's call it example_index.log:

gff3_merge -d example_index.log

This will merge all of the output into one gigantic gff3 file (repeatmasker results, alignments, various ab initio predictors, etc.). It also appends the FASTA sequences to the end of the gff3 file (use -n to stop that behaviour). If you want only the MAKER gene models, use the -g flag, or use grep to search for any other data source.
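
As a rough illustration of the "search for other data sources" idea, here is a sketch that works on the source column (column 2) of the merged file rather than grepping whole lines. The helper names gff_sources and gff_by_source are invented for this example, and the source names you will see (other than maker) depend on your own run.

```shell
# gff_sources FILE: count features per source (column 2) in a GFF3 file,
# skipping comment lines and stopping at the ##FASTA marker if present.
gff_sources() {
    awk -F'\t' '/^##FASTA/ { exit } !/^#/ && NF >= 9 { print $2 }' "$1" | sort | uniq -c
}

# gff_by_source FILE SOURCE: print only the feature lines whose source
# column equals SOURCE exactly (for example "maker").
gff_by_source() {
    awk -F'\t' -v src="$2" '/^##FASTA/ { exit } !/^#/ && $2 == src' "$1"
}

# Example usage (merged.gff is a placeholder file name):
#   gff_sources merged.gff
#   gff_by_source merged.gff maker > maker_only.gff
```

Matching on the column avoids false hits when the source name also appears inside an attribute or a sequence ID, which a plain grep would pick up.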

Looks like you can use the maker2zff script in the same way, then you can skip gff3_merge:

maker2zff -d example_index.log

(You should probably check the filtering options of that script to get better gene models, but the defaults look pretty stringent already)
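
Both commands above read the datastore index log, so it may be worth checking that file first. The sketch below assumes the log is tab-separated with three columns (sequence name, output directory, and a status such as STARTED or FINISHED); that layout is an assumption here, so please check your own file. index_status_summary is a hypothetical helper name.

```shell
# index_status_summary FILE: for each sequence name, keep the last status
# seen in the log, then print "STATUS<TAB>count" for every status.
# Assumes a 3-column tab-separated log: name, directory, status.
index_status_summary() {
    awk -F'\t' '
        NF >= 3 { last[$1] = $3 }            # later lines overwrite earlier ones
        END {
            for (name in last) n[last[name]]++
            for (status in n) print status "\t" n[status]
        }
    ' "$1" | sort
}

# Example usage:
#   index_status_summary example_index.log
```

If some contigs never reach a finished state, the merged output will simply lack them, so a quick count like this can explain a smaller-than-expected result.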

modified 3.9 years ago • written 3.9 years ago by Philipp Bayer6.7k

Thanks a lot! That was exactly what I was looking for! It's really nice that I can use this directly for the zff conversion; I'm going to try it!

written 3.9 years ago by Rox1.2k

Hi, MAKER didn't generate the protein and transcript sequence files, even after running the fasta_merge program. Do you have any advice? Thanks for your help.

modified 11 months ago • written 11 months ago by ashaneev0720