How to remove duplicate reads / create consensus reads based on UMI tags in a second FASTQ file?
4.3 years ago
William ★ 5.3k

In a targeted sequencing dataset, I have for each sample an r1.fastq.gz file and a umi.fastq.gz file.

How can I use the short UMI tags in the umi.fastq.gz file to remove duplicates from, or create consensus sequences of, the reads in the r1.fastq.gz file? Or do I need to go back to, or recreate, the original FASTQ files?

The r1.fastq.gz file and the umi.fastq.gz file have the same number of reads. I also expect that the reads and the UMI tags are in the same order in both files.
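
The assumption that both files are in sync can be checked directly before choosing a tool. Below is a minimal sketch, assuming plain 4-line FASTQ records and that the read ID is the header text up to the first whitespace; the function names `read_ids` and `first_mismatch` are made up for this illustration and are not part of any tool mentioned in this thread.

```python
import gzip
from itertools import zip_longest


def read_ids(lines):
    """Yield the read ID (header up to the first whitespace) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # the header is the first line of every record
            fields = line.split()
            yield fields[0] if fields else ""


def first_mismatch(read_lines, umi_lines):
    """Return (record_index, read_id, umi_id) for the first disagreement, or None if in sync.

    A missing record (files of different length) shows up as None on one side.
    """
    pairs = zip_longest(read_ids(read_lines), read_ids(umi_lines))
    for index, (read_id, umi_id) in enumerate(pairs):
        if read_id != umi_id:
            return index, read_id, umi_id
    return None
```

To run it on the files above, open both with `gzip.open(path, "rt")` and pass the two handles to `first_mismatch`; a result of `None` means every read name matched in order.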

I found fastp, but it is not clear to me which option I should use, or whether I need a different tool/script for data laid out like this. https://github.com/OpenGene/fastp#unique-molecular-identifier-umi-processing

The files look like this (actual sequence masked with X characters):

$ zcat sample_1.R1.fastq.gz | head
@HVVFKAFXY:1:11101:10004:17048
AATXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
AA/A/EA/EEAA/E//EA/EEEEEE/A/EE</<//EEEEEE/EE//EAEEA///66E//EE/AEE/EEAEEE/AA
@HVVFKAFXY:1:11101:10009:11939
CTAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE6EEEEEEEEEEEEAE/EEEEEEEEEEAE<E<EEEE
@HVVFKAFXY:1:11101:10010:19964
TTCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

$ zcat sample_1.umi.fastq.gz | head
@HVVFKAFXY:1:11101:10004:17048
GTAGGGACACTT
+
AAA///EE//EE
@HVVFKAFXY:1:11101:10009:11939
TGCTGCATTTTC
+
AAAAAEEEEEEE
@HVVFKAFXY:1:11101:10010:19964
CTAATCTAGTAA

Can you show a read example (or two) from each of these files? zcat file.gz | head -8.

You may be able to use umi-tools, but you will likely need to align the data first (https://github.com/CGATOxford/UMI-tools/blob/master/doc/QUICK_START.md).


I use umi-tools, but you'll have to put the UMI in the read name. bcl2fastq can likely do this for you if you have access to the BCL files.
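
If the BCL files are not available, moving the UMI into the read name can also be done from the two FASTQ files directly. Here is a rough sketch, assuming 4-line FASTQ records, identical read order in both files, and the umi-tools default of reading the UMI after the last underscore in the read ID. The function names (`fastq_records`, `tag_reads_with_umi`, `write_tagged_fastq`) are made up for this example and are not part of umi-tools or bcl2fastq.

```python
import gzip
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    lines = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def tag_reads_with_umi(read_lines, umi_lines, separator="_"):
    """Yield read records whose ID has separator + UMI sequence appended.

    Raises ValueError if the two inputs differ in length or read order.
    """
    pairs = zip_longest(fastq_records(read_lines), fastq_records(umi_lines))
    for read, umi in pairs:
        if read is None or umi is None:
            raise ValueError("files have different numbers of records")
        read_id, _, comment = read[0].partition(" ")
        umi_id = umi[0].partition(" ")[0]
        if read_id != umi_id:
            raise ValueError(f"read names out of sync: {read_id} vs {umi_id}")
        header = f"{read_id}{separator}{umi[1]}"  # e.g. @name_GTAGGGACACTT
        if comment:
            header += " " + comment  # keep any Illumina-style comment field
        yield header, read[1], read[2], read[3]


def write_tagged_fastq(reads_path, umi_path, out_path):
    """Write a gzipped FASTQ with the UMI from umi_path appended to each read ID."""
    with gzip.open(reads_path, "rt") as reads, \
         gzip.open(umi_path, "rt") as umis, \
         gzip.open(out_path, "wt") as out:
        for record in tag_reads_with_umi(reads, umis):
            out.write("\n".join(record) + "\n")
```

For the files in the question this would be called as `write_tagged_fastq("sample_1.R1.fastq.gz", "sample_1.umi.fastq.gz", "sample_1.R1.umi_tagged.fastq.gz")`, where the output name is arbitrary. The tagged reads would then still need to be aligned before running the deduplication step.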
