Generate list of numbers corresponding to fq reads based on 6-character barcode
4.4 years ago
johnsonn573 ▴ 10

I have a 24-line barcodes.txt file, each line of which is a 6-letter barcode.

I have ~30 million reads in my fastq file (big.fastq). The first 6 characters of every read are one of the 24 barcodes in barcodes.txt.

I would like to generate 24 new txt files, one per barcode. Each file should list, one per line, the numbers of the reads that begin with that barcode. For example, if the first barcode is AACAGA, the first new txt file should contain the numbers of all the reads in big.fastq that start with AACAGA.

fastq barcode • 1.0k views

What have you tried? The following example isn't the most efficient way, but it will get you the count of each distinct 6-character prefix in a file. To run it on your fastq, you can first use awk or sed to output only the lines that contain the reads (every 4th line starting at line 2, assuming the usual 4-line fastq records).

[~/Data/scratch/tmp/biostar]$ cat test.txt 
GATCGATCGATCG
GATCGATCGATCGA
GCTAGCTAGCTAG
GAGAGAGCTAGA
GAGAGAGCTCGATCGAT
GATCGATCGATCGA
[~/Data/scratch/tmp/biostar]$ cut -c -6 test.txt | sort | uniq -c
   2 GAGAGA
   3 GATCGA
   1 GCTAGC
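
To get the read numbers rather than the counts, something along these lines might work. It is only a sketch: it assumes big.fastq uses plain 4-line records (no wrapped sequences), that barcodes.txt has one barcode per line, and that naming each output file after its barcode (e.g. AACAGA.txt) is acceptable. The first two printf lines just create tiny made-up inputs so the command can be tried; skip them for real data.

```shell
# Hypothetical toy inputs for trying the command out; not real data.
printf 'AACAGA\nGATCGA\n' > barcodes.txt
printf '@r1\nAACAGATTT\n+\nIIIIIIIII\n@r2\nGATCGAGGG\n+\nIIIIIIIII\n@r3\nAACAGACCC\n+\nIIIIIIIII\n' > big.fastq

# While reading barcodes.txt (NR == FNR): store each barcode in a lookup table.
# While reading big.fastq: on every sequence line (line 2 of each 4-line record),
# take the first 6 characters and, if they are a known barcode, write the
# 1-based read number to <barcode>.txt.
awk 'NR == FNR { known[$1] = 1; next }
     FNR % 4 == 2 {
         bc = substr($0, 1, 6)
         if (bc in known) print (FNR + 2) / 4 > (bc ".txt")
     }' barcodes.txt big.fastq
```

This reads the fastq once and keeps one output file open per barcode, which should be fine for 24 barcodes. Reads whose first 6 characters are not in barcodes.txt are skipped.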
