Question

Is there a way to produce/convert nhmmer output to a bed file?

0

Entering edit mode

3.5 years ago

jamie.pike ▴ 80

I have recently used nhmmer to search for a sequence in a FASTA file, I'd like to compare the output of the nhmmer search to a bed file I have and think it would be easy to do if the nhmmer output is also in bed format.

Is there a tool to convert the nhmmer output? I have had a look in the nhmmer manual and can't find any specific flags for generating either a bed file output or a table output indicating locations of the hits.

Example of nhmmer output:

Annotation for each hit  (and alignments):
>> VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence  
    score  bias    Evalue   hmmfrom    hmm to     alifrom    ali to      envfrom    env to       sq len      acc
   ------ ----- ---------   -------   -------    --------- ---------    --------- ---------    ---------    ----
 !  134.7   1.6   3.9e-39         1       206 []   3509332   3509126 ..   3509332   3509126 ..   3744637    0.99

  Alignment:
  score: 134.7 bits
                                                                                              All_mimps_comb_trimmed.aln       1 AcagTgggatGcaAtaAGtttgaatacgtgtgtaagTagg 40     
                                                                                                                                 AcagTgggatGcaA+aAGt+t++++a+gtgt taagT   
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509332 ACAGTGGGATGCAAAAAGTATTCGCAGGTGTATAAGTC-- 3509295
                                                                                                                                 79***********************************8.. PP

                                                                                              All_mimps_comb_trimmed.aln      41 ttcccaagctagcctagtta.tggggtaccttagcttagg 79     
                                                                                                                                  +ccc+agct  +ctagtt  t+g  ta+ctt+ cttagg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509294 -GCCCTAGCTCCTCTAGTTCgTAGCTTAGCTTGACTTAGG 3509256
                                                                                                                                 .*************************************** PP

                                                                                              All_mimps_comb_trimmed.aln      80 gctttgctacttagatAtcacactgaaattagtataatat 119    
                                                                                                                                 + +ttg+tact +g+tAtc c ctga+ t agt+taatat
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509255 TACTTGATACTAGGTTATCCCCCTGATCTGAGTGTAATAT 3509216
                                                                                                                                 **************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     120 taaggtaTcgagtacctaaaaaa.aactcctagcTagcta 158    
                                                                                                                                  aaggta +gag acct+a aa+  ac+cct++cTagct 
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509215 -AAGGTACTGAGGACCTGACAAGcTACCCCTGACTAGCTT 3509177
                                                                                                                                 .8************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     159 gcgagtgtc...ggtacttcaacacgtattcatacttatt 195    
                                                                                                                                  cgag++tc     tac+t  acac+t++++atactt+tt
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509176 CCGAGCATCagaCCTACCTGCACACCTGCGAATACTTTTT 3509137
                                                                                                                                 ************99************************** PP

                                                                                              All_mimps_comb_trimmed.aln     196 gCatcccaCtg 206    
                                                                                                                                 gCatcccaCtg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509136 GCATCCCACTG 3509126
                                                                                                                                 *********96 PP

Thanks,
Jamie

bed nhmmer • 1.5k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 3.5 years ago by jamie.pike ▴ 80

score 4 · Accepted Answer · 2020-10-21

4

Entering edit mode

3.5 years ago

Fatima ▴ 1000

I think if you use the commandline version, and add --noali option, that would be easier. Then you can use awk to extract the columns that you need:

awk '($1!~"#"){print $1,$2,$3}'

Just change 1,2,3 to columns that you're interested in.

I assume you need to add strand information too.

ADD COMMENT • link 3.5 years ago by Fatima ▴ 1000

0

Entering edit mode

@Fatima, great - that should work. Thank you :)

ADD REPLY • link 3.5 years ago by jamie.pike ▴ 80

0

Entering edit mode

@Fatima, Strand information would be useful also but I can't work out where it is?

ADD REPLY • link 3.5 years ago by jamie.pike ▴ 80

1

Entering edit mode

I assume since you're aligning the whole genome, it would be + for all the entries.

awk '($1!~"#"){print "VMNF01000014.1\t"$7"\t"$8"\t"$1"\t.\t+"}'

. for score column

I usually get the genes in the genome, and then use hmmer. I haven't used nhmmer.

https://bedtools.readthedocs.io/en/latest/content/general-usage.html#:~:text=BED%20format,Galaxy%20browser%20and%20for%20bedtools.

ADD REPLY • link 3.5 years ago by Fatima ▴ 1000

1

Entering edit mode

@Fatima, thank you. I'm trying to find a transposable element on specific scaffolds so there is variation in strandedness. Perhaps I will try annotating first using hmmer. Thank you for your help.

Ps I have found also that --tblout will provide location information and strandedness.

ADD REPLY • link 3.5 years ago by jamie.pike ▴ 80

1

Entering edit mode

Aha! Have you tried ISFinder (for finding transposable elements)?

ADD REPLY • link 3.5 years ago by Fatima ▴ 1000

0

Entering edit mode

No, I haven't heard of it! I'll take a look at it. Thank you :)

ADD REPLY • link 3.5 years ago by jamie.pike ▴ 80