Question: Getting Number Of Reads In A Contig From Velvet Assembly
4
gravatar for Abhi
7.9 years ago by
Abhi50
Abhi50 wrote:

How to get the contig details from the Velvet assembly? I have the AMOS output from the velvet i.e. velvet_asm.afg file.

Thanks -Abhi

contigs velvet • 4.3k views
ADD COMMENTlink modified 7.1 years ago by Prakki Rama2.2k • written 7.9 years ago by Abhi50
2
gravatar for Michael Schubert
7.9 years ago by
Cambridge, UK
Michael Schubert6.9k wrote:

It's basically these 4 steps:

  • install AMOS if you haven't already
  • create a bank from your afg file
  • select a contig and create a new bank from it
  • extract the reads from the new bank

You can find the details on this blog post if you want to get it done quickly but AMOS also has a nice wiki.

Alternatively, I think that hawkeye does tell you the number of reads in a contig as well.

ADD COMMENTlink written 7.9 years ago by Michael Schubert6.9k

Thanks for your quick response. I do have AMOS installed and I can see the bnk in hawkeye and see the number of reads in each contig. But wanted to get that information programmatically. I followed the steps you had mentioned in your blog, but hit the same problem i.e. no {FRG in my .afg file.

I am trying to convert to ACE format and get the details, but 'amos2ace' takes too long. Wondering if there are other quick options.

ADD REPLYlink written 7.9 years ago by Abhi50

Can this be done with ABySS as well?

ADD REPLYlink written 7.9 years ago by Lythimus200

As long as ABySS is able to create a mapping file that AMOS can read (afg, ACE, etc.) it should.

ADD REPLYlink written 7.9 years ago by Michael Schubert6.9k
1
gravatar for Jeremy Leipzig
7.9 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

from http://www.cbcb.umd.edu/research/contig_representation.shtml

{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}

we can parse this in perl easily enough without getting into all that bank stuff

open( ASM, "<velvet_asm.afg" );
while (<ASM>) {
    if (/\{RED/) { $reads++ }
    elsif (/{SCF/) {
      #i need to ignore TLEs in scaffolds b/c they are contigs
      $inScf=1;
    }
    elsif (/\{CTG/) { $inScf=0;$getCtgID = 1; }
    elsif (/iid:(\d+)/){if($getCtgID){$ctgID=$1;$getCtgID=0; }}
    elsif (/\{TLE/) { if(!$inScf){$inTile = 1; $tiles++;}}
    elsif (/src/) {
        if ($inTile) { 
          $src++;
          if($readSrc{$_}){
            warn "seen $_ before"
          }
          $readSrc{$_} = $ctgID;
          $inTile = 0; 
        }
    }
}
foreach my $read(keys %readSrc){
  print "read $read is in contig ".$readSrc{$read}."\n";
}
print "looks like I used ".scalar(keys %readSrc)." out of $reads total reads\n";

it should be pretty obvious how you can flip that hash to something more contig-centric

ADD COMMENTlink modified 7.9 years ago • written 7.9 years ago by Jeremy Leipzig18k

Thanks Jeremy.

It gives a different result than AMOS bank. For my example: your code counted 245572 reads in contigs, whereas hawkeye shows 252454 reads in contigs.

ADD REPLYlink written 7.9 years ago by Abhi50

hmmm well a read can have kmers in two or more contigs. i wonder if hawkeye is double counting those? well if you find the culprit lemme know.

ADD REPLYlink written 7.9 years ago by Jeremy Leipzig18k

what do you see when you type: bank2contig -e -S mybankname.bnk | cut -f3 | sort | uniq | wc -l

ADD REPLYlink written 7.9 years ago by Jeremy Leipzig18k

It shows 245568

ADD REPLYlink written 7.9 years ago by Abhi50
1
gravatar for Abhi
7.9 years ago by
Abhi50
Abhi50 wrote:

Used PERL AmosLib API to parse velvet_asm.afg file to get the count of reads in each contig.

use AMOS::AmosLib

my $fileName = "velvet_asm.afg";

open(fileHandle, $fileName);

my $totalReads = 0;

my $totalContigs = 0;

while (my $record = getRecord(\*fileHandle)){

        my ($rec, $fields, $recs) = parseRecord($record);

        if($rec eq "CTG"){

                $totalContigs++;

                my $nReads = 0;

                for(my $r = 0; $r <= $#$recs; $r++){

                        my ($srec, $sfields, $srecs) = parseRecord($$recs[$r]);

                        if($srec eq "TLE"){

                                $nReads++;

                                $totalReads++;

                        }

                }

                print "Number of Reads in contig#", $$fields{iid}, " are ", $nReads, "\n";

        }

}

print "Total # of Contigs :", $totalContigs, "\n";

print "Total # of reads in contigs:", $totalReads, "\n";

close(fileHandle);
ADD COMMENTlink modified 7.9 years ago by Casey Bergman18k • written 7.9 years ago by Abhi50

looks good. it would be interesting if you counted distinct read srcs since i honestly don't know how reads get credited when kmers are shared between different contigs

ADD REPLYlink written 7.9 years ago by Jeremy Leipzig18k
0
gravatar for flacchy
5.8 years ago by
flacchy40
flacchy40 wrote:

Hi,

I am trying to figure out how many reads made it into each contig after I run velvet. I used the flag -read_trkg yes and -amos_file yes. I found some advise online on how to process them but I am a bit stuck. So from the amos file I have tried this path:

$ /home/admin/amos-3.1.0/src/Bank/bank-transact -m velvet_asm.afg -b velvet_asm.bnk -c

and then $/home/admin/amos-3.1.0/src/Validation/analyze-read-depth velvet_asm.bnk -i -d -r > output

The problem is that the file output is something like this : 1 1646 1340 1340 108.275 108.275 2 1 97 97 1.03093 1.03093 3 2085 1589 1589 113.904 113.904 4 175 75 75 219.547 219.547 5 187 162 162 105.944 105.944 6 124 76 76 157.171 157.171 7 1023 768 768 112.914 112.914 8 9 99 99 9.09091 9.09091 9 151 168 168 72.8214 72.8214 10 2224 1602 1602 123.306 123.306 11 6582 5312 5312 107.116 107.116 12 2850 2487 2487 98.653 98.653 13 3 105 105 2.62857 2.62857 14 3 103 103 2.91262 2.91262 15 4350 3350 3350 115.28 115.28 16 1788 1268 1268 128.341 128.341 17 1746 1325 1325 116.024 116.024

could anyone tell me what the filed are or if there is a different way to get the information I need??

Thank you,

F.

ADD COMMENTlink written 5.8 years ago by flacchy40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2495 users visited in the last hour