How to remove technical reads containing a known sequence
8.8 years ago
cfarmeri ▴ 210

Hi, I have run into a problem during QC and would appreciate a hint.

I have learned how to remove adapter sequences from the 5' and 3' ends,

but I don't know how to get rid of technical reads (not adapters) that contain a known sequence.

For example, the following read is in my raw FASTQ file:

@id_01 length=62
GACTACGTACA**GAACAGATAATGACCATTTATAC**CGGAACAAATGGTTATCTGGATGGATTA
+id_01 length=62
IIIIIIIIIICCCFFFFFHHHHHJJJJJJJJJ<HHIJJJJJJJJJJFHIJJJJJIJJJJJJJ

The GAACAGATAATGACCATTTATAC sequence comes from the vector plasmid.

So I want to remove reads like this one from the data before high-dimensional analysis.

Does anyone have a solution?

I'm sorry, I am a beginner in bioinformatics and my English is clumsy. Thanks.

sequencing • 1.3k views
8.8 years ago

You can use BBDuk for that:

bbduk.sh in=reads.fq out=clean.fq literal=GAACAGATAATGACCATTTATAC k=23

This discards every read that contains the sequence. You can add the flag hdist=1 if you want to allow one substitution when matching.
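
If it helps to see what this kind of filter does, here is a rough plain-Python sketch of the same idea. It is not BBDuk's implementation, just an illustration: it drops any read whose sequence contains the known fragment or its reverse complement, optionally allowing a number of substitutions (similar in spirit to hdist). The function names and the `@clean` record are made up for the example, and it assumes simple 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
Record = Tuple[str, str, str, str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_with_mismatches(read: str, probe: str, max_mismatches: int = 0) -> bool:
    """True if probe occurs in read with at most max_mismatches substitutions."""
    k = len(probe)
    for start in range(len(read) - k + 1):
        mismatches = 0
        for a, b in zip(read[start:start + k], probe):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences at this offset
        else:
            return True  # window compared fully without exceeding the limit
    return False


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line records (assumes no wrapped sequences)."""
    it = (line.rstrip("\n") for line in lines)
    yield from zip(it, it, it, it)


def drop_contaminated(records: Iterable[Record], probe: str,
                      max_mismatches: int = 0) -> Iterator[Record]:
    """Yield only the records that do NOT contain probe or its reverse complement."""
    probes = {probe.upper(), reverse_complement(probe.upper())}
    for rec in records:
        seq = rec[1].upper()
        if not any(contains_with_mismatches(seq, p, max_mismatches) for p in probes):
            yield rec


VECTOR = "GAACAGATAATGACCATTTATAC"

# The read from the question plus a hypothetical short read without the fragment.
example = [
    "@id_01 length=62",
    "GACTACGTACAGAACAGATAATGACCATTTATACCGGAACAAATGGTTATCTGGATGGATTA",
    "+id_01 length=62",
    "IIIIIIIIIICCCFFFFFHHHHHJJJJJJJJJ<HHIJJJJJJJJJJFHIJJJJJIJJJJJJJ",
    "@clean",
    "ACGTACGTACGT",
    "+",
    "IIIIIIIIIIII",
]

kept = list(drop_contaminated(parse_fastq(example), VECTOR))
print([rec[0] for rec in kept])  # ['@clean']
```

For real data, BBDuk will be far faster than a pure-Python loop like this; the sketch is only meant to make the matching logic visible.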
