Is there a way to produce/convert nhmmer output to a bed file?
1
0
Entering edit mode
3.5 years ago
jamie.pike ▴ 80

I have recently used nhmmer to search for a sequence in a FASTA file, I'd like to compare the output of the nhmmer search to a bed file I have and think it would be easy to do if the nhmmer output is also in bed format.

Is there a tool to convert the nhmmer output? I have had a look in the nhmmer manual and can't find any specific flags for generating either a bed file output or a table output indicating locations of the hits.

Example of nhmmer output:

Annotation for each hit  (and alignments):
>> VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence  
    score  bias    Evalue   hmmfrom    hmm to     alifrom    ali to      envfrom    env to       sq len      acc
   ------ ----- ---------   -------   -------    --------- ---------    --------- ---------    ---------    ----
 !  134.7   1.6   3.9e-39         1       206 []   3509332   3509126 ..   3509332   3509126 ..   3744637    0.99

  Alignment:
  score: 134.7 bits
                                                                                              All_mimps_comb_trimmed.aln       1 AcagTgggatGcaAtaAGtttgaatacgtgtgtaagTagg 40     
                                                                                                                                 AcagTgggatGcaA+aAGt+t++++a+gtgt taagT   
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509332 ACAGTGGGATGCAAAAAGTATTCGCAGGTGTATAAGTC-- 3509295
                                                                                                                                 79***********************************8.. PP

                                                                                              All_mimps_comb_trimmed.aln      41 ttcccaagctagcctagtta.tggggtaccttagcttagg 79     
                                                                                                                                  +ccc+agct  +ctagtt  t+g  ta+ctt+ cttagg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509294 -GCCCTAGCTCCTCTAGTTCgTAGCTTAGCTTGACTTAGG 3509256
                                                                                                                                 .*************************************** PP

                                                                                              All_mimps_comb_trimmed.aln      80 gctttgctacttagatAtcacactgaaattagtataatat 119    
                                                                                                                                 + +ttg+tact +g+tAtc c ctga+ t agt+taatat
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509255 TACTTGATACTAGGTTATCCCCCTGATCTGAGTGTAATAT 3509216
                                                                                                                                 **************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     120 taaggtaTcgagtacctaaaaaa.aactcctagcTagcta 158    
                                                                                                                                  aaggta +gag acct+a aa+  ac+cct++cTagct 
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509215 -AAGGTACTGAGGACCTGACAAGcTACCCCTGACTAGCTT 3509177
                                                                                                                                 .8************************************** PP

                                                                                              All_mimps_comb_trimmed.aln     159 gcgagtgtc...ggtacttcaacacgtattcatacttatt 195    
                                                                                                                                  cgag++tc     tac+t  acac+t++++atactt+tt
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509176 CCGAGCATCagaCCTACCTGCACACCTGCGAATACTTTTT 3509137
                                                                                                                                 ************99************************** PP

                                                                                              All_mimps_comb_trimmed.aln     196 gCatcccaCtg 206    
                                                                                                                                 gCatcccaCtg
  VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509136 GCATCCCACTG 3509126
                                                                                                                                 *********96 PP

Thanks,
Jamie

bed nhmmer • 1.5k views
ADD COMMENT
4
Entering edit mode
3.5 years ago
Fatima ▴ 1000

I think if you use the commandline version, and add --noali option, that would be easier. Then you can use awk to extract the columns that you need:

awk '($1!~"#"){print $1,$2,$3}'

Just change 1,2,3 to columns that you're interested in.

I assume you need to add strand information too.

ADD COMMENT
0
Entering edit mode

@Fatima, great - that should work. Thank you :)

ADD REPLY
0
Entering edit mode

@Fatima, Strand information would be useful also but I can't work out where it is?

ADD REPLY
1
Entering edit mode

I assume since you're aligning the whole genome, it would be + for all the entries.

awk '($1!~"#"){print "VMNF01000014.1\t"$7"\t"$8"\t"$1"\t.\t+"}'

. for score column

I usually get the genes in the genome, and then use hmmer. I haven't used nhmmer.

https://bedtools.readthedocs.io/en/latest/content/general-usage.html#:~:text=BED%20format,Galaxy%20browser%20and%20for%20bedtools.

ADD REPLY
1
Entering edit mode

@Fatima, thank you. I'm trying to find a transposable element on specific scaffolds so there is variation in strandedness. Perhaps I will try annotating first using hmmer. Thank you for your help.

Ps I have found also that --tblout will provide location information and strandedness.

ADD REPLY
1
Entering edit mode

Aha! Have you tried ISFinder (for finding transposable elements)?

ADD REPLY
0
Entering edit mode

No, I haven't heard of it! I'll take a look at it. Thank you :)

ADD REPLY

Login before adding your answer.

Traffic: 2249 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6