KRAKEN2 Sequence
2.1 years ago
chimerajit • 0

I ran Kraken2 with the viral database on a transcript assembly produced by rnaSPAdes. The results look OK, and I found some of the viruses I was targeting.
Now I want the sequences that Kraken2 assigned to those taxonomic groups.

My Kraken2 output looks like this:

C       NODE_24_length_27434_cov_34671.005410_g0_i23    1147722 27434   0:14927 196894:5 1147722:1 0:12164 28883:2 2545435:2 0:10 754059:4 0:285

I understand the columns as follows:

  1. Classified (C) or unclassified (U)
  2. The FASTA header of the sequence
  3. The taxonomic ID assigned to the sequence
  4. The length of the sequence
  5. The k-mer matches, as taxid:count pairs
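The column layout above can be checked with a short parser. This is only a sketch: `parse_kraken_line` is a hypothetical helper name, and it assumes single (not paired-end) sequences and an output written without `--use-names`. It tallies the taxid:count pairs of the fifth column for the example line from this post.

```python
from collections import Counter

def parse_kraken_line(line):
    """Split one Kraken2 output line into its five fields (hypothetical helper)."""
    # The first four fields contain no whitespace; the rest is the k-mer list.
    status, seq_id, taxid, length, kmer_field = line.rstrip("\n").split(None, 4)
    kmer_hits = Counter()
    for token in kmer_field.split():
        if token == "|:|":  # separator between mates; not expected for contigs
            continue
        hit_taxid, count = token.rsplit(":", 1)
        kmer_hits[hit_taxid] += int(count)  # keys stay strings (e.g. "0", "A")
    return {
        "classified": status == "C",
        "seq_id": seq_id,
        "taxid": taxid,
        "length": int(length),
        "kmer_hits": kmer_hits,
    }

# Example line copied from the post.
line = ("C NODE_24_length_27434_cov_34671.005410_g0_i23 1147722 27434 "
        "0:14927 196894:5 1147722:1 0:12164 28883:2 2545435:2 0:10 754059:4 0:285")
rec = parse_kraken_line(line)
total = sum(rec["kmer_hits"].values())
unassigned = rec["kmer_hits"]["0"]  # taxid 0 means the k-mer hit nothing
print(f"{total - unassigned} of {total} k-mers hit the database")
```

For this contig only 14 of 27400 k-mers hit anything in the database, which may be why a BLAST search of the whole contig shows little similarity to the reported virus.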

I then used the taxon ID to find the matching FASTA headers and extracted those sequences from the assembly. However, when I run a nucleotide BLAST with these sequences, it does not show any similar hits.
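A minimal sketch of that extraction in plain Python, for anyone following along. The function names and the tiny in-memory inputs are made up for illustration; it assumes Kraken2 was run without `--use-names` and matches the taxon ID exactly, so sequences assigned to descendant taxa are not picked up.

```python
def ids_for_taxid(kraken_lines, taxid):
    """Return the IDs of classified sequences assigned exactly to `taxid`."""
    wanted = set()
    for line in kraken_lines:
        fields = line.split(None, 4)
        if len(fields) >= 3 and fields[0] == "C" and fields[2] == str(taxid):
            wanted.add(fields[1])
    return wanted

def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of FASTA records whose ID (first word of the header) is wanted."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line

# Made-up example data; in practice pass open file handles instead of lists.
kraken_lines = [
    "C\tcontig_1\t1147722\t60\t0:20 1147722:6",
    "U\tcontig_2\t0\t50\t0:16",
    "C\tcontig_3\t10239\t55\t10239:21",
]
fasta_lines = [
    ">contig_1 len=60\n", "ACGT\n", "GGCC\n",
    ">contig_2\n", "TTTT\n",
    ">contig_3\n", "AAAA\n",
]
selected = ids_for_taxid(kraken_lines, 1147722)
extracted = list(filter_fasta(fasta_lines, selected))
print("".join(extracted), end="")
```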

I also understand that Kraken2 does not use the full sequence to identify the reported organism; it uses k-mers. So how do I get the sequences that were actually identified?

BLAST KRAKEN2 MEtagenome • 555 views
2.1 years ago
chimerajit • 0

Well, I found one way to do it:

  1. First, pull the selected taxon IDs and their FASTA headers out of the Kraken2 output file.

  2. Use https://github.com/santiagosnchez/faSomeRecords/blob/master/faSomeRecords.pl to extract those sequences from your Kraken2 input file (in my case, a FASTA assembly).

  3. Use the specific taxon ID to locate the reference genome in NCBI or a similar database.

  4. Run a nucleotide BLAST between the RefSeq reference and the taxon-specific sequences. Use outfmt 6 so that you get the start and end coordinates of each hit.

  5. Make a BED file from the BLAST output.

  6. Run bedtools getfasta with the BED file and your Kraken2-extracted sequence file. You will get the exact stretch of sequence that matched.
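Step 5 is the easiest place to go off by one, so here is a rough sketch of the conversion. It assumes the extracted contigs were the BLAST query, the reference genome was the subject, and the output uses the default 12 columns of outfmt 6; `blast6_to_bed` and the sample hit are made up for illustration.

```python
def blast6_to_bed(blast_lines):
    """Convert BLAST outfmt 6 hits (default columns) to BED lines on the query."""
    bed = []
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        qseqid, sseqid = cols[0], cols[1]
        qstart, qend = int(cols[6]), int(cols[7])
        sstart, send = int(cols[8]), int(cols[9])
        # BLAST coordinates are 1-based and inclusive;
        # BED is 0-based with an exclusive end.
        start = min(qstart, qend) - 1
        end = max(qstart, qend)
        # In blastn output a minus-strand hit has sstart greater than send.
        strand = "-" if sstart > send else "+"
        bed.append(f"{qseqid}\t{start}\t{end}\t{sseqid}\t0\t{strand}")
    return bed

# Made-up hit: bases 101-300 of contig_1 match the reference on the minus strand.
blast_lines = [
    "contig_1\tref_genome\t98.5\t200\t3\t0\t101\t300\t5000\t4801\t1e-90\t350\n",
]
bed_lines = blast6_to_bed(blast_lines)
print("\n".join(bed_lines))
```

The resulting file can then be passed to bedtools getfasta through its `-bed` option, with the extracted contigs as `-fi`.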

Let me know if there is an easier way.
