9 weeks ago
Apex92 ▴ 260
I have a case where I need to use all sequencing reads (a concatenated FASTA file) as a reference. The concatenated FASTA file has 50,829,402 reads. I tried to build the reference index with bowtie 1 as
bowtie-build concatenated.fasta ref but I got the following error.
Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently.
How can I solve this? Instead of running bowtie-build on the concatenated FASTA file, can I use
bowtie-build -f *.fasta ref?
Preferably I need to use bowtie 1.
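The batching that the error message asks for could be sketched roughly as below. This is only an illustration, not something bowtie provides: the helper names `read_fasta` and `chunk_records` are hypothetical, and the 3.6 billion limit is taken from the error text above. It groups FASTA records into batches whose total sequence length stays under the limit, so that each batch can be written to its own file and indexed separately.

```python
from typing import Iterable, Iterator, List, Tuple

# Limit taken from the bowtie-build error message ("about 3.6 billion characters or less").
MAX_CHARS = 3_600_000_000

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records: Iterable[Record],
                  max_chars: int = MAX_CHARS) -> Iterator[List[Record]]:
    """Group records into batches whose summed sequence length is <= max_chars."""
    batch, size = [], 0
    for header, seq in records:
        # Start a new batch if adding this record would exceed the limit.
        if batch and size + len(seq) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append((header, seq))
        size += len(seq)
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical file names: chunk_0.fasta, chunk_1.fasta, ...
    with open("concatenated.fasta") as handle:
        for i, batch in enumerate(chunk_records(read_fasta(handle))):
            with open(f"chunk_{i}.fasta", "w") as out:
                for header, seq in batch:
                    out.write(f">{header}\n{seq}\n")
```

Each resulting file would then be indexed on its own with bowtie-build, and the query sequences aligned against every index in turn, since no single index covers all reads.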
May I ask why you are doing that? I cannot imagine any situation where concatenating reads (which are randomly fragmented DNA after all) would make any sense.
These are sequenced PEA products that I need to check against the expected PEA sequences. I tried to map the reads, which are long (due to UMIs and primers), to the shorter expected PEA sequences allowing zero mismatches, but I got no alignments that make sense. So I decided to do it the other way around: map the expected PEA sequences to the sequenced PEA products.
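With zero mismatches allowed, asking whether a short expected sequence maps inside a longer read amounts to an exact substring search. A minimal sketch of that idea follows. It is a plain string search rather than bowtie, the function names are hypothetical, and it assumes the expected sequence may appear on either strand and that the reads fit in memory or are streamed one by one.

```python
from typing import Dict, Iterable

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_exact_hits(expected: Dict[str, str],
                     reads: Iterable[str]) -> Dict[str, int]:
    """Count, per expected sequence, the reads that contain it exactly
    (forward or reverse-complement orientation)."""
    # Precompute both orientations once per expected sequence.
    targets = {
        name: (seq.upper(), reverse_complement(seq.upper()))
        for name, seq in expected.items()
    }
    hits = {name: 0 for name in expected}
    for read in reads:
        read = read.upper()
        for name, (fwd, rev) in targets.items():
            if fwd in read or rev in read:
                hits[name] += 1
    return hits
```

This scales as the number of reads times the number of expected sequences, which may be acceptable for a small panel of expected sequences but is slow for large ones.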
What is PEA? It would help to include a layout and a brief technical description of what you did, and of what the R1/R2 structure and the reference look like; then one can suggest an alignment strategy. So far this sounds quite non-standard.
Perhaps the kit includes access to data analysis software? https://olink.com/our-platform/our-pea-technology/data-generation-and-qc/
It is not clear if it is possible to analyze the data using standard software.
Yes, use their software if possible, or contact customer support. This does not seem to be a standard assay.