Question: Is there a way to separate plasmid contigs from Ion Torrent WGS data?
Optimist (India) wrote, 13 months ago:

I have FASTQ and FASTA files for a Klebsiella pneumoniae genome and would like to separate the plasmid contigs from the WGS assembly FASTA file, which has 221 contigs.

Having the plasmid contigs separated would help me characterize the plasmids and the resistance genes they carry using a circular representation, and would let me carry out further specialized downstream analyses.

Can someone suggest a tool or pipeline for this?

Thanks & Regards

Optimist

5heikki (Finland) wrote, 13 months ago:

There may be plasmid-specific binning programs, and even metagenome binning programs such as MaxBin may be able to do what you want. However, since complete reference genomes exist for Klebsiella pneumoniae, perhaps the easiest approach is to download one, e.g. this one (strain HS11286). It consists of seven sequences, one chromosome and six plasmids:

>NC_016845.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 chromosome, complete genome
>NC_016838.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS1, complete sequence
>NC_016846.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS2, complete sequence
>NC_016839.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS3, complete sequence
>NC_016840.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS4, complete sequence
>NC_016847.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS5, complete sequence
>NC_016841.1 Klebsiella pneumoniae subsp. pneumoniae HS11286 plasmid pKPHS6, complete sequence

Extract the chromosome sequence from the FASTA file into a separate FASTA file, then BLAST your contigs against that new file. Contigs that produce long, high-identity alignments to the chromosome most likely represent chromosomal (non-plasmid) DNA.
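A minimal sketch of the extraction step, assuming a standard multi-record FASTA whose headers begin with the accession (as in the HS11286 listing above). The helper name extract_record and the toy file names are hypothetical; only the accession NC_016845.1 comes from the listing.

```shell
#!/usr/bin/env bash
# extract_record (hypothetical helper): print the single FASTA record whose
# header's first word is ">ACCESSION"; all other records are skipped.
extract_record() {
  local acc="$1" fasta="$2"
  # On each header line, decide whether to keep the record that follows;
  # the bare pattern "keep" then prints every line while the flag is set.
  awk -v id=">$acc" '/^>/ { keep = ($1 == id) } keep' "$fasta"
}

# Toy two-record reference standing in for the downloaded file.
printf '>NC_016845.1 chromosome\nACGT\nTTGA\n>NC_016838.1 plasmid pKPHS1\nGGCC\n' > toyRef.fa

extract_record NC_016845.1 toyRef.fa > chromosome.fa
cat chromosome.fa
```

With samtools installed, `samtools faidx theRefFile.fa NC_016845.1 > chromosome.fa` should give the same record (it builds the .fai index on first use).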


Thank you for your answer.

I need to separate the chromosomal DNA contigs and the plasmid DNA contigs in my FASTA file, just like in the example reference file HS11286 you showed.

Can you please elaborate further on the solution you've given?

I have downloaded Klebsiella pneumoniae reference genome file from NCBI.

Thanks & regards

(reply by Optimist, 13 months ago)

Well, you don't really even need to extract any sequences from the file I linked. Just BLAST your contigs against the whole file:

blastn -query yourContigs.fa -subject theRefFile.fa -outfmt 6 > blastResult.txt

Since the output file won't be very big, you can even open it in Excel (fields are tab-separated). Studying just the first four columns (query, subject, percent identity and alignment length) should take you far. However, if you want to be efficient and have a bash shell at hand, the following keeps only the best hit for each of your contigs (ranked by bit score, column 12, then by e-value, column 11):

export LANG=C LC_ALL=C; sort -t $'\t' -k1,1 -k12,12gr -k11,11g blastResult.txt | sort -t $'\t' -uk1,1 > blastResultBestHits
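The second sort -u in that command relies on sort keeping the first line it sees for each contig, which GNU sort does. A sketch of the same best-hit step that makes this explicit with awk, assuming standard 12-column -outfmt 6 output (column 11 = e-value, column 12 = bit score); the toy rows and file names are invented for illustration.

```shell
#!/usr/bin/env bash
export LC_ALL=C
tab=$(printf '\t')
# Toy -outfmt 6 rows (12 tab-separated columns); the values are made up.
printf 'ctg1\tNC_016845.1\t99.9\t5000\t5\t0\t1\t5000\t1\t5000\t0.0\t9000\n' > toyBlast.txt
printf 'ctg1\tNC_016838.1\t90.0\t800\t80\t0\t1\t800\t1\t800\t1e-100\t700\n' >> toyBlast.txt
printf 'ctg2\tNC_016838.1\t99.5\t3000\t15\t0\t1\t3000\t1\t3000\t0.0\t5400\n' >> toyBlast.txt

# Group by contig, highest bit score first (ties broken by lowest e-value),
# then keep only the first line seen for each contig ID.
sort -t "$tab" -k1,1 -k12,12gr -k11,11g toyBlast.txt \
  | awk -F'\t' '!seen[$1]++' > toyBestHits
cat toyBestHits
```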

Then to show only the contigs where the best hit was against the reference genome chromosome:

awk 'BEGIN{FS="\t"}{if($2~/NC_016845/){print $1}}' blastResultBestHits

And the contigs whose best hit was against something other than the reference chromosome, i.e. one of the plasmids (contigs with no hit at all appear in neither list):

awk 'BEGIN{FS="\t"}{if($2!~/NC_016845/){print $1}}' blastResultBestHits
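Contigs that had no BLAST hit at all are missing from blastResultBestHits, so they fall into neither list above. A sketch that writes both lists to files and also collects the no-hit contigs, assuming the first word of each assembly header matches the query ID that BLAST reports; the toy inputs and output file names are hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C
# Toy stand-ins for yourContigs.fa and blastResultBestHits (only 2 columns here).
printf '>ctg1\nACGT\n>ctg2\nGGCC\n>ctg3\nTTAA\n' > toyContigs.fa
printf 'ctg1\tNC_016845.1\nctg2\tNC_016838.1\n' > toyBestHits

# Best hit on the reference chromosome vs. anywhere else (i.e. a plasmid).
awk -F'\t' '$2 ~ /NC_016845/ {print $1}' toyBestHits | sort > chromosomal.ids
awk -F'\t' '$2 !~ /NC_016845/ {print $1}' toyBestHits | sort > plasmid.ids

# All contig IDs (first header word without ">"), then those absent from the hits.
awk '/^>/ {print substr($1, 2)}' toyContigs.fa | sort > all.ids
cut -f1 toyBestHits | sort | comm -23 all.ids - > nohit.ids
```

The no-hit contigs deserve a separate look: they may be plasmids that HS11286 lacks, or strain-specific chromosomal regions.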

Then search for "how to extract sequences from fasta based on header" to find out how to pull those contigs out of your assembly FASTA file.
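One common answer to that search, sketched with awk: read the ID list first, then print only the FASTA records whose header's first word is in that list. It assumes one ID per line without the leading ">" (as the awk commands above print them); the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy inputs: an assembly (one record spans two sequence lines) and IDs to keep.
printf '>ctg1 len=4\nACGT\n>ctg2 len=8\nGGCC\nAATT\n>ctg3 len=4\nTTAA\n' > toyContigs.fa
printf 'ctg2\nctg3\n' > plasmid.ids

# First file (NR == FNR): remember the wanted IDs. Second file: on each header,
# decide whether to keep that record; print lines while "keep" is set.
awk 'NR == FNR { want[$1] = 1; next }
     /^>/      { keep = (substr($1, 2) in want) }
     keep' plasmid.ids toyContigs.fa > plasmidContigs.fa
cat plasmidContigs.fa
```

Dedicated tools do the same, e.g. `seqtk subseq yourContigs.fa plasmid.ids`.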

(reply by 5heikki, 13 months ago)