Question

Easy way to merge several GFF outputs from MAKER

7.7 years ago

Rox ★ 1.4k

Hi everyone !

I'm still struggling with MAKER.

I want to follow the MAKER tutorial for SNAP training step by step (the one described here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial#Training_ab_initio_Gene_Predictors ).

My input file is a genome assembly I made, with several contigs.

In the tutorial, they convert the generated GFF into ZFF and then run a few other steps. My problem is that the tutorial works on only one locus, i.e. one file, whereas my results contain thousands of directories, each one containing several GFF files.

For example :

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ ls
00  0B  16  22  2D  38  43  4F  5A  66  71  7C  87  92  9D  A9  B4  BF  CA  D5  E0  EB  F6
(...) DF  EA  F5

Each directory can contain several subdirectories, and their number varies from one directory to another:

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ cd 0B/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ ls 
28  67  7C  92  D0  F8
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ cd 28
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ ls
tig00000634
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ cd tig00000634/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28/tig00000634$ ls
run.log
theVoid.tig00000634
tig00000634.gff
tig00000634.maker.augustus_masked.proteins.fasta
tig00000634.maker.augustus_masked.transcripts.fasta
tig00000634.maker.non_overlapping_ab_initio.proteins.fasta
tig00000634.maker.non_overlapping_ab_initio.transcripts.fasta

The files I'm interested in are the plain tig00(number).gff files, but I don't want to train SNAP contig by contig like this. I want to train SNAP on the whole assembly, and I hope I don't have to process each .gff file separately, even if a script does it for me, because the following steps of SNAP training require launching MAKER again on the whole genome with the HMM model produced by SNAP.

What I want is an easy way to convert all these GFF files into a single output GFF that corresponds to the whole MAKER output. I can't find anything like this in the MAKER documentation, but I'm sure MAKER users know a way to do it.

Do you have any advice? Thanks for your help!

Cheers,

Roxane

annotation script genome • 6.3k views

ADD COMMENT • link updated 7.7 years ago by Philipp Bayer 8.3k • written 7.7 years ago by Rox ★ 1.4k

If your actual question is how to convert multiple GTF files into a single GTF file, you can try cuffmerge.

1) Create an 'assembly_GTF_list.txt' file listing all GTF files under your master directory. The find command works recursively and lists all .gtf files in all subdirectories.

find ~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/ -type f -name "*.gtf" > assembly_GTF_list.txt

2) CuffMerge

cuffmerge [options]* assembly_GTF_list.txt

Sorry if I misunderstood your question.

ADD REPLY • link 7.7 years ago by EagleEye 7.5k

My question was indeed something like this, but it was specific to MAKER, because its output directories are awkward to list. I think I can find this information in a MAKER output file, and I think Philipp answered my question below! Thanks for helping anyway, it will probably be useful for me.

ADD REPLY • link 7.7 years ago by Rox ★ 1.4k

Glad that you got your answer. Good luck.

ADD REPLY • link 7.7 years ago by EagleEye 7.5k

score 5 · Accepted Answer · 2016-08-16

7.7 years ago

Philipp Bayer 8.3k

You can merge all GFF3 files from the MAKER output using the gff3_merge script that comes with MAKER. There should be a file ending in index.log in the uppermost folder of your output data; let's call it example_index.log:

gff3_merge -d example_index.log
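
If you are not sure where that log is, here is a minimal sketch for locating it. It assumes the file sits directly inside the .maker.output directory and that its name ends in index.log, as described above; the function name find_index_log is made up for this example, and the directory name is the one from the question.

```shell
# Sketch: look for the index log directly inside a MAKER output directory.
find_index_log() {
    # $1 = path to the *.maker.output directory
    # -maxdepth 1 keeps find out of the datastore subdirectories.
    find "$1" -maxdepth 1 -type f -name '*index.log'
}

# Directory name taken from the question; adjust it to your own run.
maker_out="cleaned_suzukii_90_size.maker.output"
if [ -d "$maker_out" ]; then
    find_index_log "$maker_out"
else
    echo "directory $maker_out not found; pass your own path to find_index_log" >&2
fi
```

The path it prints is what you would pass to the -d option.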

This merges all of the output into one gigantic GFF3 file (RepeatMasker results, alignments, the various ab initio predictors, etc.). It also appends the FASTA sequences to the end of the GFF3 file (use -n to stop that behaviour). If you want only the MAKER gene models, use the -g flag; for any other data source, use grep on the merged file.

It looks like you can use the maker2zff script in the same way, in which case you can skip gff3_merge:
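
As a rough sketch, the variants could look like this. The output file names and the filter_source helper are invented for the example, example_index.log is the placeholder name used in this answer, and augustus_masked is only an assumed example of a source label. The helper uses awk on the source column rather than a plain grep, so that matches elsewhere on the line are not picked up.

```shell
index_log="example_index.log"   # placeholder name, replace with your own log

# Keep only the rows of a GFF3 file whose source column (column 2) equals $1.
filter_source() {
    # $1 = source label, $2 = GFF3 file
    awk -F'\t' -v src="$1" '$2 == src' "$2"
}

if command -v gff3_merge >/dev/null 2>&1 && [ -f "$index_log" ]; then
    # All evidence and gene models, without the trailing FASTA section (-n).
    gff3_merge -n -d "$index_log" -o all_evidence.gff
    # MAKER gene models only (-g), again without FASTA.
    gff3_merge -g -n -d "$index_log" -o maker_genes.gff
    # One other data source; the label is an assumed example.
    filter_source augustus_masked all_evidence.gff > augustus_only.gff
else
    echo "gff3_merge or $index_log not available; skipping merge" >&2
fi
```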

maker2zff -d example_index.log

(You should probably check the filtering options of that script to get better gene models, but the defaults look pretty stringent already)
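
For the whole-assembly SNAP training asked about in the question, here is a hedged sketch that chains maker2zff with the fathom, forge and hmm-assembler.pl steps from the MAKER tutorial. The function name train_snap and the model label are made up for this example, and the 1000 bp flank is the value used in the tutorial, not a recommendation.

```shell
# Sketch of SNAP training on the whole assembly, following the tutorial steps.
# $1 = MAKER index log, $2 = label for the model (hypothetical, e.g. a species tag)
train_snap() {
    for tool in maker2zff fathom forge hmm-assembler.pl; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    # Writes genome.ann and genome.dna into the current directory.
    maker2zff -d "$1" || return 1
    # Sort the gene models into categories, with 1000 bp of flanking sequence.
    fathom -categorize 1000 genome.ann genome.dna || return 1
    # Export the unique-category genes, converted to the plus strand.
    fathom -export 1000 -plus uni.ann uni.dna || return 1
    # Estimate parameters, then assemble them into a single HMM file.
    forge export.ann export.dna || return 1
    hmm-assembler.pl "$2" . > "$2.hmm"
}
```

You would call it as train_snap example_index.log my_species, ideally from an empty working directory because forge writes many small files. The resulting my_species.hmm is the file to give to the snaphmm option in maker_opts.ctl before running MAKER again.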

ADD COMMENT • link 7.7 years ago by Philipp Bayer 8.3k

Thanks a lot! That was exactly what I was looking for! It's really nice that I can use this directly for the ZFF conversion. I'm going to try it!

ADD REPLY • link 7.7 years ago by Rox ★ 1.4k

Hi, MAKER didn't generate the protein and transcript sequence files, even after running the fasta_merge program. Do you have any advice? Thanks for your help.

ADD REPLY • link 4.7 years ago by ashaneev07 ▴ 20