Question: Using RNA-seq reads from closely related species in BRAKER2 genome annotation pipeline
ayala.usma (University of Los Andes, Colombia) wrote, 15 months ago:

Hi everyone!

I am currently starting to use the BRAKER2 pipeline for gene prediction in the genomes of two Phytophthora species. For those who don't know about the pipeline, BRAKER2 uses RNA-seq alignments as input to train GeneMark and AUGUSTUS ab initio gene predictors.

Since I only have RNA-seq data for one of the genomes, I was wondering if it would be a good idea to use RNA-seq reads from a closely related species for my case. Also, do you have any suggestions I could use when running this pipeline? I don't have much experience in annotation, so any idea would be appreciated.

Thank you very much in advance!

modified 15 months ago by harish • written 15 months ago by ayala.usma
harish wrote, 15 months ago:

It depends. You will need to answer the following questions:

Are the organisms in the same family or genus? What is the alignment rate? How many false positives can you tolerate?

If the species are in the same genus or family, it is generally fine to use those datasets for training in BRAKER. It derives intron hints from the alignments and later creates an AUGUSTUS profile. But make sure you get a high unique mapping rate, since uniquely mapped reads matter most when running BRAKER.

My pipeline is something like this:

  1. Repeat-mask the genome.
  2. Take multiple RNA-seq BAM files and merge them.
  3. Also include proteins from closely related species.
  4. Generate BRAKER-tuned AUGUSTUS and GeneMark profiles.
  5. Run gene prediction using the above GeneMark and AUGUSTUS profiles on the repeat-masked genome.
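The RNA-seq part of the steps above could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, assumes the genome has already been soft-masked, and uses hypothetical file and species names. The `braker.pl` options shown (`--genome`, `--bam`, `--species`, `--softmasking`, `--cores`) are the ones I know from the BRAKER2 documentation, so check them against your installed version. The protein step is left out because the options for combining proteins with RNA-seq differ between BRAKER releases.

```python
def build_pipeline_commands(masked_genome, bam_files, species, cores=8):
    """Return the samtools and BRAKER command lines as lists of arguments.

    Nothing is executed here; each list can be passed to subprocess.run().
    All file names are placeholders chosen for illustration.
    """
    merged_bam = "merged_rnaseq.bam"  # hypothetical output name

    # Step 2: merge the per-library alignments into a single BAM file.
    merge_cmd = ["samtools", "merge", merged_bam] + list(bam_files)

    # Steps 4-5: BRAKER trains GeneMark and AUGUSTUS, then predicts genes.
    braker_cmd = [
        "braker.pl",
        f"--genome={masked_genome}",
        f"--bam={merged_bam}",
        f"--species={species}",
        "--softmasking",  # assumes repeats are soft-masked (lowercase)
        f"--cores={cores}",
    ]
    return [merge_cmd, braker_cmd]


commands = build_pipeline_commands(
    "genome.softmasked.fa", ["lib1.bam", "lib2.bam"], "my_species", cores=4
)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if they are later handed to `subprocess.run(cmd, check=True)`.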

Hey! Thank you very much for your answer.

The organisms belong to the same genus and are believed to be sister species, so I would say your suggestion is just perfect for my case. :D

written 14 months ago by ayala.usma

Do you usually use any of the khmer recipes or AfterQC on the RNA-seq reads?

written 14 months ago by Ric280

Yes, I tend to do minimal QC on the reads, like removing adapters and trimming low-quality bases. Other than that, I use all the reads that pass QC.

I haven't followed the khmer recipes, partly because I was able to set up other alternatives faster and mostly because I've been working on PacBio data for the past year. But thanks for mentioning khmer, I'll explore it :)

written 14 months ago by harish

Which alternatives did you try out?

written 14 months ago by Ric280

For QC? I used FastQC + Trimmomatic. I found it sufficiently fast and good enough, given that I only keep reads with Q30 or higher.

To be honest, I'm just using the RNA-seq data to derive gene-structure hints, as I had a very good assembly (N50 > 7 Mb, #Contigs/Scaffolds - 1300/900) with the core gene set at 92% and above.
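As an illustration of what a "Q30 or more" cutoff can mean, here is a small Python sketch that keeps reads whose mean Phred quality is at least 30. It assumes Phred+33 encoded qualities and a mean-quality criterion, which is only one possible reading of the cutoff. It is not how Trimmomatic is implemented, and the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_high_quality(records, min_mean_q=30):
    """Keep (header, sequence, quality) tuples with mean quality >= cutoff."""
    return [
        rec for rec in records
        if rec[2] and mean_phred(rec[2]) >= min_mean_q
    ]


reads = [
    ("@read1", "ACGT", "IIII"),  # 'I' encodes Q40
    ("@read2", "ACGT", "####"),  # '#' encodes Q2
    ("@read3", "ACGT", "????"),  # '?' encodes Q30
]
kept = keep_high_quality(reads)
```

With the three toy reads above, the Q40 and Q30 reads are kept and the Q2 read is dropped.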

written 14 months ago by harish