Question: Help with nhmmscan on PacBio WGS Data with Dfam
2.7 years ago, roxane.dunbar wrote:

Hello,

I have a few questions about nhmmscan. I am very new to HMMs and to tools such as hmmscan.

I am trying to replicate the identification of mobile element insertions (MEIs) in the PacBio-sequenced NA12878 genome described by Pendleton et al. (2015). In the second-to-last paragraph of their supplementary material, they state that they ran nhmmer against Dfam using the script 'dfamscan.pl' with default parameters.

I am only interested in identifying the L1HS insertions in this genome, and I know from the paper that there are 118 of them.

I tried running the script with default parameters, but after a week (with no standard output to indicate that it was actually doing anything) I killed it and decided to run it with only the L1HS 5' HMM instead. That run has now been going for over 24 hours.

My first question is: am I running it correctly? My command is:

perl /media/RAID/rdunbar/hmmer/dfamscan.pl -fastafile /media/RAID/rdunbar/hmmer/corrected_reads_gt4kb.fasta -hmmfile /media/RAID/rdunbar/hmmer/L1HS_L1/DF0000225.hmm -dfam_outfile /media/RAID/rdunbar/hmmer/Results/PacBio_Dfam_hits_DF0000225.out

The HMM file was obtained from here: http://dfam.org/entry/DF0000225

The FASTA file contains the corrected reads (>4 kb) from the PacBio NA12878 run and is 60 GB: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NA12878_PacBio_MtSinai/corrected_reads_gt4kb.fasta

My second question: with a 60 GB FASTA file, and searching with only a single element HMM, roughly how long should this take?
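One common way to cut wall-clock time on a file this size (not something tried in this thread) is to split the FASTA into chunks and run one search per chunk in parallel. A minimal Python sketch, assuming a plain, uncompressed FASTA; the function name `split_fasta` and the `<prefix>.<i>.fasta` naming scheme are hypothetical:

```python
def split_fasta(path, n_chunks, out_prefix):
    """Distribute FASTA records round-robin across n_chunks output files.

    Streams the input line by line, so memory use stays small even for a
    very large file. Any lines before the first '>' header are skipped.
    Returns the list of chunk paths written.
    """
    out_paths = [f"{out_prefix}.{i}.fasta" for i in range(n_chunks)]
    outs = [open(p, "w") for p in out_paths]
    try:
        record_index = -1  # no record seen yet
        with open(path) as fh:
            for line in fh:
                if line.startswith(">"):
                    record_index += 1  # each header line starts a new record
                if record_index >= 0:
                    # Header and sequence lines go to the same chunk.
                    outs[record_index % n_chunks].write(line)
    finally:
        for fh in outs:
            fh.close()
    return out_paths
```

Each chunk could then be searched by its own dfamscan.pl (or nhmmscan) process, using the same command as above with `-fastafile` pointing at the chunk, and the per-chunk result files combined afterwards. Round-robin assignment keeps the chunks roughly balanced when read lengths vary.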

Running top shows:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6231 rdunbar 20 0 960284 93000 3188 S 94.4 0.2 1348:37 nhmmscan

Also, top shows nhmmscan almost constantly in the S (sleeping) state, even though %CPU is around 94%. In the terminal, this is all that has been displayed since yesterday:

rdunbar@plymouthcruncher:/media/RAID/rdunbar/hmmer/hmmer-3.1b2/src$ ./nhmmscan /media/RAID/rdunbar/hmmer/L1HS_L1/DF0000225.hmm /media/RAID/rdunbar/hmmer/corrected_reads_gt4kb.fasta > /media/RAID/rdunbar/hmmer/Results/PacBio_nhmmer_hits_DF0000225.out
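Because stdout is redirected to a file, nothing appears in the terminal while nhmmscan runs. One way to check that such a run is advancing is to compare how many per-query reports have been written to the output file with the number of reads in the input (in nhmmscan, each input sequence is a query). A minimal Python sketch, assuming HMMER's default text output, in which each query's report ends with a line containing only `//`; the function names are hypothetical, and since output may be buffered, the count can lag behind the true progress:

```python
def count_fasta_records(fasta_path):
    """Count records (header lines starting with '>') in a FASTA file."""
    with open(fasta_path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def count_finished_queries(hmmer_out_path):
    """Count completed query reports in HMMER's default text output.

    Assumes each per-query report is terminated by a line that is just '//'.
    """
    with open(hmmer_out_path) as fh:
        return sum(1 for line in fh if line.strip() == "//")


def progress(fasta_path, hmmer_out_path):
    """Return the fraction of input sequences whose report has been written."""
    total = count_fasta_records(fasta_path)
    done = count_finished_queries(hmmer_out_path)
    return done / total if total else 0.0
```

For the run above, this would be called with the corrected_reads_gt4kb.fasta path and the PacBio_nhmmer_hits_DF0000225.out path. Counting headers in a 60 GB file takes a while, so it is worth computing the total once and reusing it.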

Any help would be most appreciated.

Kindest regards, Roxane

Tags: pacbio, hmm, dfam, nhmmscan, wgs
2.7 years ago, roxane.dunbar wrote:

It has now run successfully. I was just impatient!


2.7 years ago, Kevin Blighe replied:

Congratulations!