Question: Is there a way to produce/convert nhmmer output to a bed file?
0
gravatar for jamie.pike
5 weeks ago by
jamie.pike60
jamie.pike60 wrote:

I have recently used nhmmer to search for a sequence in a FASTA file, I'd like to compare the output of the nhmmer search to a bed file I have and think it would be easy to do if the nhmmer output is also in bed format.

Is there a tool to convert the nhmmer output? I have had a look in the nhmmer manual and can't find any specific flags for generating either a bed file output or a table output indicating locations of the hits.

Example of nhmmer output:

Annotation for each hit  (and alignments):
>> VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence  
    score  bias    Evalue   hmmfrom    hmm to     alifrom    ali to      envfrom    env to       sq len      acc
   ------ ----- ---------   -------   -------    --------- ---------    --------- ---------    ---------    ----
 !  134.7   1.6   3.9e-39         1       206 []   3509332   3509126 ..   3509332   3509126 ..   3744637    0.99

  Alignment:
  score: 134.7 bits
                                                                                              All_mimps_comb_trimmed.aln       1 AcagTgggatGcaAtaAGtttgaatacgtgtgtaagTagg 40     
                                                                                                                                 AcagTgggatGcaA+aAGt+t++++a+gtgt taagT   
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509332 ACAGTGGGATGCAAAAAGTATTCGCAGGTGTATAAGTC-- 3509295
                                                                                                                                 79***********************************8.. PP

                                                                                              All_mimps_comb_trimmed.aln      41 ttcccaagctagcctagtta.tggggtaccttagcttagg 79     
                                                                                                                                  +ccc+agct  +ctagtt  t+g  ta+ctt+ cttagg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509294 -GCCCTAGCTCCTCTAGTTCgTAGCTTAGCTTGACTTAGG 3509256
                                                                                                                                 .*************************************** PP

                                                                                              All_mimps_comb_trimmed.aln      80 gctttgctacttagatAtcacactgaaattagtataatat 119    
                                                                                                                                 + +ttg+tact +g+tAtc c ctga+ t agt+taatat
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509255 TACTTGATACTAGGTTATCCCCCTGATCTGAGTGTAATAT 3509216
                                                                                                                                 **************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     120 taaggtaTcgagtacctaaaaaa.aactcctagcTagcta 158    
                                                                                                                                  aaggta +gag acct+a aa+  ac+cct++cTagct 
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509215 -AAGGTACTGAGGACCTGACAAGcTACCCCTGACTAGCTT 3509177
                                                                                                                                 .8************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     159 gcgagtgtc...ggtacttcaacacgtattcatacttatt 195    
                                                                                                                                  cgag++tc     tac+t  acac+t++++atactt+tt
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509176 CCGAGCATCagaCCTACCTGCACACCTGCGAATACTTTTT 3509137
                                                                                                                                 ************99************************** PP

                                                                                              All_mimps_comb_trimmed.aln     196 gCatcccaCtg 206    
                                                                                                                                 gCatcccaCtg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509136 GCATCCCACTG 3509126
                                                                                                                                 *********96 PP

Thanks, Jamie

bed file nhmmer • 121 views
ADD COMMENTlink written 5 weeks ago by jamie.pike60
4
gravatar for Fatima
5 weeks ago by
Fatima830
United states
Fatima830 wrote:

I think if you use the commandline version, and add --noali option, that would be easier. Then you can use awk to extract the columns that you need:

awk '($1!~"#"){print $1,$2,$3}'

Just change 1,2,3 to columns that you're interested in.

I assume you need to add strand information too.

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by Fatima830

@Fatima, great - that should work. Thank you :)

ADD REPLYlink written 5 weeks ago by jamie.pike60

@Fatima, Strand information would be useful also but I can't work out where it is?

ADD REPLYlink written 5 weeks ago by jamie.pike60
1

I assume since you're aligning the whole genome, it would be + for all the entries.

awk '($1!~"#"){print "VMNF01000014.1\t"$7"\t"$8"\t"$1"\t.\t+"}'

. for score column

I usually get the genes in the genome, and then use hmmer. I haven't used nhmmer.

https://bedtools.readthedocs.io/en/latest/content/general-usage.html#:~:text=BED%20format,Galaxy%20browser%20and%20for%20bedtools.

ADD REPLYlink written 5 weeks ago by Fatima830
1

@Fatima, thank you. I'm trying to find a transposable element on specific scaffolds so there is variation in strandedness. Perhaps I will try annotating first using hmmer. Thank you for your help.

Ps I have found also that --tblout will provide location information and strandedness.

ADD REPLYlink written 5 weeks ago by jamie.pike60
1

Aha! Have you tried ISFinder (for finding transposable elements)?

ADD REPLYlink written 5 weeks ago by Fatima830

No, I haven't heard of it! I'll take a look at it. Thank you :)

ADD REPLYlink written 5 weeks ago by jamie.pike60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1481 users visited in the last hour