Normalizing BLAST results
6.7 years ago
db • 0

I have run BLAST on assembled contigs (the output of MEGAHIT) and obtained results for 5 different samples, but I am not sure how to normalize the output so that the five samples can be compared. Should I normalize by the number of reads in the original raw sequences?

To provide more information:

  • I am looking to compare counts of antibiotic resistance genes between samples
  • The raw reads are from HiSeq (2x125bp PE) sequencing of DNA from environmental samples
  • I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query


EDIT: I can't add more replies anymore so I will edit the question itself to reply.

blast • 2.4k views

What do you mean by "normalize" BLAST results? Are you looking to get a non-redundant result set?


I am looking to compare gene counts between samples.

EDIT: I can't add more replies anymore so I will edit the question itself to reply.


Is it NGS sequencing?


Yes, the raw reads are from HiSeq (2x125bp PE)


Where does BLAST fit in? Did you use that to align the data?


I agree you should use bwa :)


I first used MEGAHIT to assemble the raw reads and then used BLAST with assembled contigs as the query.


If you are interested in counts, it may be best to align the data using an NGS aligner and then use featureCounts along with a GFF file to do the counting.

Otherwise I am not sure how you are going to get counts from BLAST results (which are likely in the form of HSPs). Did you collect the results in tabular format?

Edit: This appears to be a metagenomics experiment (since MEGAHIT was used). Do you want to count at the level of organism, species, or gene, and what exactly are you counting?


I want to count antibiotic resistance genes.


What format did you collect your BLAST results in and what database did you blast against?


BLAST results are in .csv format (-outfmt 10). I blasted against a database I created with makeblastdb from one of the FASTA files from CARD (https://card.mcmaster.ca/download).
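For anyone landing here with the same setup, here is a rough sketch of one way to turn -outfmt 10 output into per-sample counts scaled by library size, which is the normalization asked about in the question. It assumes the default 12 columns of -outfmt 10, keeps only the highest-bitscore hit per contig, and reports hits per million raw reads. The function names, the contig and gene IDs, and the read total are made up for illustration, and the thread itself does not settle on this as the right approach.

```python
import csv
import io
from collections import Counter

# Default column order for -outfmt 10 (same fields as -outfmt 6, comma-separated).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_contig(handle):
    """Return {contig: (subject, bitscore)}, keeping the top-bitscore hit per contig."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        score = float(row["bitscore"])
        contig = row["qseqid"]
        if contig not in best or score > best[contig][1]:
            best[contig] = (row["sseqid"], score)
    return best


def hits_per_million_reads(handle, total_reads):
    """Count contigs assigned to each subject and scale by raw read count."""
    counts = Counter(subject for subject, _ in best_hit_per_contig(handle).values())
    return {gene: n * 1_000_000 / total_reads for gene, n in counts.items()}


# Toy input standing in for one sample's BLAST CSV (hypothetical IDs).
example = io.StringIO(
    "k141_1,ARO_1,98.5,300,4,0,1,300,1,300,1e-150,540\n"
    "k141_1,ARO_2,90.0,300,30,0,1,300,1,300,1e-100,380\n"
    "k141_2,ARO_1,99.0,200,2,0,1,200,1,200,1e-90,360\n"
)
result = hits_per_million_reads(example, total_reads=2_000_000)
print(result)  # {'ARO_1': 1.0}
```

Note that this counts contigs, not reads, so a gene assembled into one high-coverage contig counts the same as one assembled from a handful of reads. That limitation is why several replies suggest aligning reads instead.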


A bit late to the party. But out of curiosity, how did you normalize the results? I also created a blastdb from CARD and I am now busy with comparing the results.


Is it an RNAseq experiment or just genome sequencing?


It is metagenomic sequencing (DNA from environmental samples).


Yes, then you can align your reads against your contigs if you want to describe your population.
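Once reads are aligned back to the contigs, the per-contig read counts still need to be scaled before samples can be compared. A minimal sketch, assuming an RPKM-style normalization (reads per kilobase of contig per million reads in the sample): this is one common convention, not something prescribed in this thread, and the function name and numbers are hypothetical.

```python
def rpkm(read_count, length_bp, total_reads):
    """Reads per kilobase of sequence per million reads in the sample."""
    if length_bp <= 0 or total_reads <= 0:
        raise ValueError("length_bp and total_reads must be positive")
    return read_count * 1e9 / (length_bp * total_reads)


# 50 reads on a 1000 bp contig, in a sample of 2 million reads.
value = rpkm(read_count=50, length_bp=1000, total_reads=2_000_000)
print(value)  # 25.0
```

Dividing by length keeps long contigs from looking more abundant just because they attract more reads, and dividing by the total read count accounts for differences in sequencing depth between the five samples.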


What tool would you recommend for good speed?


You can use bwa, but you have to be careful: if you have closely related species or genes, the same read can be aligned against multiple contigs.
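One way to deal with the multi-mapping caveat is to split each read evenly across the contigs it aligns to, so it still contributes a total of 1 to the counts. The sketch below assumes you have already extracted a read-to-contigs mapping from the alignments. The function name and the input structure are hypothetical, and fractional counting is only one of several options (others are dropping multi-mapped reads or keeping a single best alignment).

```python
from collections import defaultdict


def fractional_counts(read_to_contigs):
    """Split each read evenly across the distinct contigs it aligns to."""
    counts = defaultdict(float)
    for contigs in read_to_contigs.values():
        distinct = set(contigs)
        for contig in distinct:
            counts[contig] += 1.0 / len(distinct)
    return dict(counts)


# Toy mapping: read2 aligns to two contigs, so each gets half a count.
alignments = {
    "read1": ["contigA"],
    "read2": ["contigA", "contigB"],
    "read3": ["contigB"],
}
counts = fractional_counts(alignments)
print(counts)
```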
