Counting reads in a FASTQ file using a reference FASTA file
21 months ago
vaslanzadeh ▴ 20

Hi

I have a reference FASTA file containing 130,000 unique sequences (barcodes), each 30 nt long. These sequences were synthesized with random incorporation of nucleotides at each position, so they have very large pairwise Hamming distances. I had a pool of cells, each carrying a single barcode; these cells were sorted by FACS and the genomic DNA was extracted. Next, the barcode regions were amplified by PCR and sent for 50 bp single-end Illumina sequencing. What I want to do now is count the number of times each of those 130,000 barcodes is present in the FASTQ file. As the Hamming distance is large, I would like to treat sequences with 2 or 3 nucleotide differences as the same sequence. What is the best way to do this? I am currently testing bwa-mem, but I am wondering whether there are better approaches for this task.

Thanks

fastq alignment readCount • 825 views
21 months ago
GenoMax 146k

This is similar to tag counting for CRISPRi. Use 2FAST2Q (https://github.com/veeninglab/2FAST2Q). I don't know about allowing for 2+ errors, but it will allow one for sure.


Thanks, this works, but it is extremely slow. Are there any alternatives?
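One possible alternative is a small custom script. What follows is a rough sketch, not a benchmarked or validated pipeline. It uses a pigeonhole seed index: if a read differs from a barcode by at most 3 substitutions, then splitting the 30 nt barcode into 4 chunks guarantees that at least one chunk matches exactly. Only barcodes sharing an exact chunk are then checked by Hamming distance, which avoids comparing every read against all 130,000 barcodes. The function names, the `offset` parameter (where the barcode starts in the read), and the choice to discard ties are all assumptions made for illustration. The sketch also assumes plain four-line FASTQ records and no indels.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_seed_index(barcodes, n_chunks):
    """Index every barcode by each of its n_chunks non-overlapping chunks."""
    length = len(barcodes[0])
    bounds = [(i * length // n_chunks, (i + 1) * length // n_chunks)
              for i in range(n_chunks)]
    index = defaultdict(list)
    for bc_id, seq in enumerate(barcodes):
        for chunk_no, (start, end) in enumerate(bounds):
            index[(chunk_no, seq[start:end])].append(bc_id)
    return index, bounds


def assign_tag(tag, barcodes, index, bounds, max_mismatches):
    """Return the id of the unique closest barcode, or None if none/ambiguous."""
    candidates = set()
    for chunk_no, (start, end) in enumerate(bounds):
        candidates.update(index.get((chunk_no, tag[start:end]), ()))
    best_id, best_dist, tie = None, max_mismatches + 1, False
    for bc_id in candidates:
        dist = hamming(tag, barcodes[bc_id])
        if dist > max_mismatches:
            continue
        if dist < best_dist:
            best_id, best_dist, tie = bc_id, dist, False
        elif dist == best_dist:
            tie = True  # two barcodes equally close: treat as ambiguous
    return None if tie else best_id


def count_barcodes(reads, barcodes, max_mismatches=3, offset=0):
    """Count reads per barcode sequence, allowing up to max_mismatches.

    offset is a hypothetical fixed start position of the barcode in the read.
    """
    length = len(barcodes[0])
    # Pigeonhole: max_mismatches + 1 chunks means one chunk must match exactly.
    index, bounds = build_seed_index(barcodes, max_mismatches + 1)
    counts = Counter()
    for read in reads:
        tag = read[offset:offset + length]
        if len(tag) < length:
            continue  # read too short to contain a full barcode
        bc_id = assign_tag(tag, barcodes, index, bounds, max_mismatches)
        if bc_id is not None:
            counts[barcodes[bc_id]] += 1
    return counts


def read_fastq_sequences(path):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()
```

With the barcodes loaded into a list of strings, something like `count_barcodes(read_fastq_sequences("reads.fastq"), barcodes, max_mismatches=3, offset=0)` would return a `Counter` keyed by barcode sequence (the file name and offset here are placeholders). If the barcode position varies between reads, or indels are expected, this fixed-offset approach would not be enough and an aligner is probably the safer choice.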
