Question

Blastn on paired end barcoding sequences

4.3 years ago

robert.murphy ▴ 80

I have some paired-end ITS barcoding sequences from fungal isolates that I need to run through blastn to confirm the identity of an isolate. So I need to align the two paired sequences first, right? If so, is bwa mem a good tool to use, with the forward read as the reference and the reverse read mapped onto it?

sequencing genome • 997 views

ADD COMMENT • link 4.2 years ago by robert.murphy ▴ 80

If you want to do what you describe, you could use one of the following tools to merge the reads:

FLASH (https://ccb.jhu.edu/software/FLASH/)
bbmerge.sh
PEAR
USEARCH
VSEARCH

But the reads need to overlap of course.
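To make the overlap requirement concrete, here is a minimal Python sketch of what a read merger does in principle. It is a toy illustration, not how FLASH, PEAR or the other tools listed are implemented: the function names are made up for this example, it only accepts a perfect overlap, and it ignores quality scores, which real mergers use to resolve mismatches.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair on the longest exact overlap (hypothetical helper).

    The reverse read is reverse-complemented first, then the end of the
    forward read is compared with the start of that sequence. Returns the
    merged sequence, or None when the reads do not overlap by at least
    `min_overlap` bases.
    """
    rev_rc = reverse_complement(reverse)
    max_len = min(len(forward), len(rev_rc))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, min_overlap - 1, -1):
        if forward[-size:] == rev_rc[:size]:
            return forward + rev_rc[size:]
    return None
```

If the amplicon is longer than the two reads combined, no overlap exists and `merge_pair` returns `None`, which is the situation in which merging is not possible.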

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

So bwa mem is a bad tool to use?

If they are paired-end reads of a barcode sequence, they will always overlap, no? At least in theory, unless something went very wrong?

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

No, bwa mem is a great tool, but not the right tool for your goal. Whether there is overlap depends on which primers you use and the kind of barcode. ITS, for example, can be too long for the reads to overlap. The basic steps of this kind of analysis are:

Quality trimming/filtering
Primer trimming
Merging
Clustering
Identification (blast, for example)
Interpreting your results

EDIT:

removed a step

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

Why bother with clustering and OTUs when you can just blastn? I could understand it if this was a metagenome, but this is just ITS barcoding of fungal isolates (sorry, I should have mentioned what it was earlier). How would I know when to merge the paired-end reads and when not to?

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

What if there is contamination? Are you planning to pick some random reads and blast them? What if you pick a contaminated read? Blasting all reads would take forever (depending on the database). In your case, now that I know better what you want, I would do the following:

1. Merge reads with FLASH.
2. Trim primers with cutadapt.
3. Dereplicate with usearch or vsearch (https://drive5.com/usearch/manual/cmd_fastx_uniques.html).
4. Check the abundances in the output from step 3; you expect one read to have an extremely high abundance compared to the rest.
5. Blast the most abundant reads (I would expect only one or two).

Step 1 will output a log in which you can see how many reads were merged. In your case, because you have an isolate: if the percentage is high you can merge; if it is really low, do not merge and use only the forward reads.
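The dereplication and abundance check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for usearch or vsearch; `read_fasta` and `dereplicate` are names invented for this example, and only exactly identical sequences are collapsed.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse identical sequences.

    Returns a list of (sequence, count) tuples, most abundant first.
    """
    counts = Counter(seq for _, seq in records)
    return counts.most_common()
```

For a clean isolate, the first entry of the returned list should have a far higher count than everything after it; several sequences with similar counts would hint at contamination or a mixed culture.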

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

What do you mean by picking reads at random? When you send samples for barcoding, all you get in return is just a .fasta of the specific sequence your primers target (in this case some ITS regions), no? I feel like I am not understanding something, as I don't see why one could not just merge (if needed), quality trim (if needed, as some are returned already trimmed), then just blastn against the NT database. Apologies if I am misunderstanding something.

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

You have given barely any information to work with. If you want specific answers, you need to give specific info. So, if I understand correctly now, you have only one FASTA file? Did you perhaps do Sanger sequencing?

> you get in return is just a .fasta of the specific sequence your primers target

If you have one FASTA file and you did Sanger sequencing, I guess most of the preprocessing is already done and you can just blast. How many reads do you have? Have you tried simply blasting them yet?

> When you send for barcoding all you get in return is just a .fasta of the specific sequence your primers target (in this case some ITS regions) no?

Is this a question? It is your data; you know what you have. And no, you can also get FASTQ files or ab1 files. If it is a FASTA file, some analysis or editing has already been done.

Why don't you just try to merge, trim primers and blast? We don't know what kind of data you have, and it sounds like you don't know either.

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

So I isolated some fungal material off the plate and sent it for ITS barcode sequencing, which is done in house at the university using Illumina technology. In return I get paired-end FASTA reads for each sample sent off that look like the following (with the accompanying reverse read also):

> >EF30159600_EF30159600
> AGGAATCTAGGGGCATGTGCACGCCTGCCATCGTTTTCAACCACCTGTGCACCTTTTGTAGACTTGGATACCGTTCGAGG
> GTTAACCTCGGTTTTGAGGACTGCTGTGCTGTACGAGTCAGCTTTCCTTACATTTCCGGTCTATGCCTTTACATATACCC
> CGTAACGAATGTATTAGAATGTTTTGTTATTGGCCTCAGTGCCTTTAATCAAATACAACTTTCAGCAACGGATCTCTTGG
> CTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAA
> CGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACCTGACCAGCTTTTTGTG
> AGCTTTGGCTTAGGCTTGGATGTGGGGGGTTGCGGGCTTCACAGAAGTCGGCTCTCCTTAAATGCATTAGTGGAACCTTT
> TGTTGACCTGTTCCTGGTGTGATAATTATCTACACCGCGGGCGGTTAGCAGCTCATTTTTAAATGGGGGCTTCGCTTCTA
> ACTTGTCCTTACCCGGACACTTTGACCATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCA

I need to confirm the identity of the isolate with this ITS barcode, which I assumed would be best done by merging the reads and then running blastn against the NT database to see what hits I get? Sorry for not providing enough info in the first place.

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

If you really only got back FASTA files, something has already been done with the data. Normally you get FASTQ files. You need to contact your university about this. I don't know if a merge tool for FASTA files exists. Most tools need the quality scores to determine which base ends up in the output. You also cannot quality-trim a FASTA file, if that was your plan.

If you really only have this data, try to remove the primers, optionally dereplicate, and just blastn, using either the forward reads only or the forward and reverse reads separately.
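As a rough illustration of primer removal on FASTA-only data, the sketch below strips an exact primer match from each end of a sequence. It is a simplification: `trim_primers` is a made-up helper, the primer sequences are whatever your sequencing facility actually used (the thread does not say), and dedicated tools such as cutadapt additionally tolerate mismatches and degenerate bases.

```python
def trim_primers(seq: str, fwd_primer: str, rev_primer_rc: str) -> str:
    """Remove exact primer matches from both ends of a sequence.

    `fwd_primer` is expected at the start of the sequence and
    `rev_primer_rc` (the reverse primer, reverse-complemented) at the end.
    A primer that is not found is simply left alone.
    """
    seq = seq.upper()
    if fwd_primer and seq.startswith(fwd_primer):
        seq = seq[len(fwd_primer):]
    if rev_primer_rc and seq.endswith(rev_primer_rc):
        seq = seq[: -len(rev_primer_rc)]
    return seq
```

Passing an empty string for either primer skips that end, which is handy when only the forward reads are used.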

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

If you use the sequence posted above it brings back hits to Termitomyces spp. If that level of identity confirmation is acceptable then you are on the right track.

You may wish to search against a specialized ITS sequence database like UNITE instead.
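For running the search from a script rather than the web interface, a minimal sketch using the BLAST+ `blastn` command-line program might look like the following. It assumes BLAST+ is installed and on the `PATH`; the helper names, the file name `isolate.fasta` and the database name `unite_its` are placeholders for illustration. A local UNITE database would first have to be built from the UNITE FASTA release with `makeblastdb`.

```python
import subprocess


def build_blastn_command(query_fasta, database="nt", remote=True, max_hits=10):
    """Assemble a blastn command line that writes tabular output (outfmt 6)."""
    cmd = [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6",
        "-max_target_seqs", str(max_hits),
    ]
    if remote:
        # Send the search to NCBI's servers instead of using a local database.
        cmd.append("-remote")
    return cmd


def run_blastn(query_fasta, **kwargs):
    """Run blastn and return the hits as lists of tab-separated fields."""
    result = subprocess.run(
        build_blastn_command(query_fasta, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

For example, `run_blastn("isolate.fasta")` would search NCBI nt remotely, while `run_blastn("isolate.fasta", database="unite_its", remote=False)` would search a local database of that (hypothetical) name. In the default tabular format the second column is the subject ID and the third is the percent identity.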

ADD REPLY • link 4.2 years ago by GenoMax 141k

To add to this you can also try http://www.boldsystems.org (also gives Termitomyces sp.)

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

No, strictly speaking you don't need to merge the sequences (unless this is 16S sequencing and the reads are designed in a way that they will merge).

> Blastn to confirm the identity of an isolate

How were you planning to do this? Just by looking at the quality of alignments or looking for marker genes?

ADD REPLY • link 4.2 years ago by GenoMax 141k

Well, blasting the barcode sequences with blastn will return a hit, and that will tell you which species the barcode sequence belongs to? Why "not strictly speaking"? It is always better to align paired-end reads, no? If you don't, you just lose data you paid for. I have edited the question to specify the data type.

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80