Question: Easy way to merge several GFF outputs from MAKER
Roxane Boyer (France / Toulouse / GeT-Plage) wrote, 2.7 years ago:

Hi everyone!

I'm still struggling with MAKER.

What I want to do is follow, step by step, the MAKER tutorial for SNAP training (the one described here: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial#Training_ab_initio_Gene_Predictors).

My input file is a genome assembly I made, with several contigs.

In the MAKER tutorial, they ask you to convert the generated GFF into ZFF, plus some other tasks. My problem is that the tutorial works on only one locus, i.e. one file, whereas my results contain something like thousands of directories, each one containing several GFF files.

For example:

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ ls
00  0B  16  22  2D  38  43  4F  5A  66  71  7C  87  92  9D  A9  B4  BF  CA  D5  E0  EB  F6
(...) DF  EA  F5

Each of these directories can contain several subdirectories, and the number varies from one to another:

Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore$ cd 0B/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ ls 
28  67  7C  92  D0  F8
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B$ cd 28
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ ls
tig00000634
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28$ cd tig00000634/
Loutre:~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/0B/28/tig00000634$ ls
run.log
theVoid.tig00000634
tig00000634.gff
tig00000634.maker.augustus_masked.proteins.fasta
tig00000634.maker.augustus_masked.transcripts.fasta
tig00000634.maker.non_overlapping_ab_initio.proteins.fasta
tig00000634.maker.non_overlapping_ab_initio.transcripts.fasta

The ones I'm interested in are the plain tig00(number).gff files. But I don't want to train SNAP separately for each contig like this; I want to train SNAP on the whole assembly, and I hope I don't have to repeat the conversion for every .gff file, even if a script does it for me. The following steps of SNAP training require launching MAKER again with the HMM produced by SNAP on the whole genome.
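The per-contig files described above sit at a fixed depth in the datastore, so they can be listed with a depth-limited find. This is a minimal sketch assuming the layout shown in the listings (datastore/XX/YY/contig/contig.gff, i.e. four levels below the datastore directory); list_contig_gffs is a hypothetical helper name, and -mindepth/-maxdepth are GNU/BSD find options rather than POSIX ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the per-contig GFF files in a MAKER datastore.
# Assumes the layout seen above: <datastore>/<XX>/<YY>/<contig>/<contig>.gff,
# i.e. exactly four levels below the datastore directory.
list_contig_gffs() {
    local datastore="$1"
    # Restricting to depth 4 keeps only files at the contig level and skips
    # anything deeper, such as files inside theVoid.* directories.
    find "$datastore" -mindepth 4 -maxdepth 4 -type f -name '*.gff' | sort
}
```

For example, `list_contig_gffs ~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore | wc -l` counts the contig GFFs. Simply concatenating them is not a clean merge, though: each file carries its own ##gff-version header and may end with a ##FASTA section, which is why a dedicated merge tool is preferable.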

What I want is an easy way to convert all these GFF files into a single output GFF that corresponds to the whole MAKER output. I can't find anything like this in the MAKER documentation, but I'm sure MAKER users know a way to do it.

Do you have any advice? Thanks for your help!

Cheers,

Roxane

Tags: script, annotation, genome

If your actual question is how to merge multiple GTF files into a single GTF file, you can try cuffmerge:

1) Create an 'assembly_GTF_list.txt' file listing all annotation files under your master directory. find works recursively, so it lists the matching files from all subdirectories (MAKER writes its per-contig files with a .gff extension, so match *.gff rather than *.gtf):

find ~/Documents/maker/cleaned_suzukii_90_size.maker.output/cleaned_suzukii_90_size_datastore/ -type f -name "*.gff" > assembly_GTF_list.txt

2) Run cuffmerge:

cuffmerge [options]* assembly_GTF_list.txt

Sorry if I misunderstood your question.

(Reply by EagleEye, 2.7 years ago)

My question was indeed something like this, but specific to MAKER, because its output directories are awkward to list. I thought I could find this information in a MAKER output file, and I think Philipp answered my question below! But thanks for helping, it will probably be useful to me.

(Reply by Roxane Boyer, 2.7 years ago)

Glad that you got your answer. Good luck.

(Reply by EagleEye, 2.7 years ago)
Philipp Bayer (Australia/Perth/UWA) wrote, 2.7 years ago:

You can merge all the GFF3 files from the MAKER output using the gff3_merge script that comes with MAKER.

There should be a file ending in index.log in the top-level folder of your output data (the .maker.output directory); let's call it example_index.log:

gff3_merge -d example_index.log

This merges all of the output into one gigantic GFF3 file (RepeatMasker results, alignments, the various ab initio predictors, etc.). It also appends the FASTA sequences to the end of the GFF3 file (use -n to prevent that). If you only want the MAKER gene models, use the -g flag; for any other data source, grep for it in the merged file.
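If you go the grep route for other data sources, matching on the second GFF3 column (the source) is more precise than a plain grep over whole lines. A minimal sketch, assuming the merged file is tab-separated GFF3 and that any sequence footer starts at a ##FASTA line; gff_by_source is a hypothetical helper name, and source values such as maker or snap_masked are only examples of what may appear in your file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GFF3 version header plus every feature whose
# source (column 2) equals the given value, e.g. "maker" or "snap_masked".
# Usage: gff_by_source <source> < merged.gff
gff_by_source() {
    awk -v FS='\t' -v src="$1" '
        /^##FASTA/       { exit }          # stop before the sequence footer
        /^##gff-version/ { print; next }   # keep the version header
        /^#/             { next }          # drop other comment/directive lines
        $2 == src        { print }         # keep features from the chosen source
    '
}
```

For example, `gff_by_source snap_masked < merged.gff > snap_only.gff`, where merged.gff stands for whatever file gff3_merge wrote for your run. Running gff3_merge with -n avoids the FASTA footer in the first place; the ##FASTA check is only a safeguard.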

It looks like you can use the maker2zff script in the same way, in which case you can skip gff3_merge:

maker2zff -d example_index.log

(You should probably check that script's filtering options to get better gene models, but the defaults already look pretty stringent.)
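Both gff3_merge -d and maker2zff -d read the same datastore index log, so it can be worth checking that every contig actually finished before merging or training. A minimal sketch, assuming the log is tab-separated with the contig ID, its datastore directory and a status (for example STARTED, later followed by FINISHED or FAILED) on each line; unfinished_contigs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print contig IDs from a MAKER datastore index log
# that never reached FINISHED (e.g. they failed or are still running).
# Assumes tab-separated lines: <contig_id> <datastore_dir> <status>.
unfinished_contigs() {
    awk -v FS='\t' '
        { seen[$1] = 1 }                    # every contig mentioned in the log
        $3 == "FINISHED" { done[$1] = 1 }   # contigs that completed
        END { for (c in seen) if (!(c in done)) print c }
    ' "$1" | sort
}
```

Running `unfinished_contigs example_index.log` prints nothing if every contig in the log has a FINISHED entry; any IDs it prints are worth rerunning before training SNAP.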


Thanks a lot! That was exactly what I was looking for! It's nice that I can use this directly for the ZFF conversion; I'm going to try it!

(Reply by Roxane Boyer, 2.7 years ago)
Powered by Biostar version 2.3.0