Question

Counting reads in a FASTQ file using a reference FASTA file


15 months ago

vaslanzadeh

Hi

I have a reference FASTA file containing 130,000 unique barcode sequences, each 30 nt long. They were synthesized with random nucleotide incorporation at each position, so the pairwise Hamming distances between them are large. I had a pool of cells, each carrying a single barcode. The cells were sorted by FACS and their genomic DNA was extracted. The barcode regions were then PCR-amplified and sequenced with 50 bp single-end Illumina reads.

Now I want to count how many times each of the 130,000 barcodes appears in the FASTQ file. Because the barcodes are far apart in Hamming distance, I would like to treat reads that differ from a barcode by 2 or 3 nucleotides as that same barcode. What is the best way to do this? I am currently testing bwa-mem, but I wonder whether there are better approaches for this task.
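One aligner-free way to tolerate up to k mismatches is a pigeonhole index: split each 30-nt barcode into k + 1 segments, so a read with at most k substitutions must match at least one segment exactly. Candidates are looked up by segment and then verified by Hamming distance. This is a minimal sketch under stated assumptions: the barcode sits at a fixed, known offset in each read (the hypothetical `offset` parameter), reads are on the barcode strand, only substitutions occur (no indels, unlike bwa-mem), and reads that tie between two barcodes are discarded as ambiguous. All function and class names are illustrative, not from any existing tool.

```python
import gzip
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta_seqs(path: str) -> List[str]:
    """Return the sequences of a FASTA file (headers are dropped)."""
    seqs: List[str] = []
    current: List[str] = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                    current = []
            elif line:
                current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def read_fastq_seqs(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip().upper()


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def segment_bounds(length: int, n_segments: int) -> List[Tuple[int, int]]:
    """Split [0, length) into n_segments contiguous, near-equal pieces."""
    base, extra = divmod(length, n_segments)
    bounds, start = [], 0
    for i in range(n_segments):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


class BarcodeMatcher:
    """Pigeonhole index: with at most k substitutions spread over k + 1
    segments, at least one segment of the read equals its barcode's segment."""

    def __init__(self, barcodes: List[str], max_mismatches: int = 3):
        self.barcodes = barcodes
        self.max_mm = max_mismatches
        self.bounds = segment_bounds(len(barcodes[0]), max_mismatches + 1)
        # One dict per segment: segment sequence -> indices of barcodes containing it
        self.index: List[Dict[str, List[int]]] = [defaultdict(list) for _ in self.bounds]
        for idx, bc in enumerate(barcodes):
            for seg, (s, e) in enumerate(self.bounds):
                self.index[seg][bc[s:e]].append(idx)

    def match(self, seq: str) -> Optional[int]:
        """Index of the unique closest barcode within max_mm, else None."""
        candidates = set()
        for seg, (s, e) in enumerate(self.bounds):
            candidates.update(self.index[seg].get(seq[s:e], ()))
        best, best_d, tied = None, self.max_mm + 1, False
        for idx in candidates:  # verify each candidate by full Hamming distance
            d = hamming(seq, self.barcodes[idx])
            if d < best_d:
                best, best_d, tied = idx, d, False
            elif d == best_d:
                tied = True
        return None if best is None or tied else best


def count_barcodes(seqs: Iterable[str], barcodes: List[str], offset: int = 0,
                   max_mismatches: int = 3) -> Counter:
    """Count reads per barcode; unassigned or ambiguous reads go to 'unmatched'."""
    matcher = BarcodeMatcher(barcodes, max_mismatches)
    length = len(barcodes[0])
    counts: Counter = Counter()
    for seq in seqs:
        window = seq[offset:offset + length]  # assumed fixed barcode position
        hit = matcher.match(window) if len(window) == length else None
        counts[barcodes[hit] if hit is not None else "unmatched"] += 1
    return counts
```

Usage would look something like `count_barcodes(read_fastq_seqs("reads.fastq.gz"), read_fasta_seqs("barcodes.fa"), offset=0, max_mismatches=3)`, where both file names are placeholders. With k = 3 each segment is only 7 or 8 nt, so some segments will be shared by many barcodes; the exact Hamming check afterwards keeps the result correct, at the cost of extra comparisons.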

Thanks

fastq alignment readCount


score 1 · Answer 1 · 2023-01-20


15 months ago

GenoMax

This is similar to tag counting for CRISPRi screens. Use 2FAST2Q: https://github.com/veeninglab/2FAST2Q. I don't know whether it allows 2 or more mismatches, but it will certainly allow one.
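Allowing exactly one mismatch is cheap because each 30-nt barcode has only 30 × 3 = 90 single-substitution variants, so every variant of all 130,000 barcodes fits in one hash table (about 11.7 million variant keys). With two mismatches the count rises to C(30,2) × 3² = 435 × 9 = 3,915 variants per barcode, roughly 509 million keys, which is far less practical to hold in memory. The sketch below illustrates that general one-mismatch lookup-table technique; it is not a description of how 2FAST2Q is implemented. The names are hypothetical, and it assumes a fixed barcode offset and ACGT-only substitutions (a read with an `N` in the barcode is only counted if that is its single mismatch... it is not, so such reads fall into "unmatched").

```python
from collections import Counter
from typing import Dict, Iterable, List

BASES = "ACGT"   # assumption: only A/C/G/T substitutions are enumerated
AMBIGUOUS = -1   # marker for variants reachable from two different barcodes


def build_one_mismatch_table(barcodes: List[str]) -> Dict[str, int]:
    """Map each barcode and each of its single-substitution variants to the
    barcode's index. A variant shared by two barcodes is marked AMBIGUOUS."""
    table: Dict[str, int] = {bc: idx for idx, bc in enumerate(barcodes)}
    exact = set(barcodes)
    for idx, bc in enumerate(barcodes):
        for pos, orig in enumerate(bc):
            for base in BASES:
                if base == orig:
                    continue
                variant = bc[:pos] + base + bc[pos + 1:]
                if variant in exact:
                    continue  # an exact match always beats a 1-mismatch match
                table[variant] = idx if variant not in table else AMBIGUOUS
    return table


def count_one_mismatch(seqs: Iterable[str], barcodes: List[str],
                       offset: int = 0) -> Counter:
    """Count reads whose barcode window is within one substitution of a
    unique barcode; everything else is counted as 'unmatched'."""
    table = build_one_mismatch_table(barcodes)
    length = len(barcodes[0])
    counts: Counter = Counter()
    for seq in seqs:
        hit = table.get(seq[offset:offset + length], AMBIGUOUS)
        counts[barcodes[hit] if hit != AMBIGUOUS else "unmatched"] += 1
    return counts
```

Each read costs a single dictionary lookup, which is why one mismatch is easy to support; the memory cost is what grows quickly with more allowed errors.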