Isolate adapter-contaminated reads from a FASTQ file using Python
2.7 years ago
vaishnavi ▴ 80

Hi everyone,

I want to extract adapter-contaminated reads from a FASTQ file using Python, but I have not been able to.

The adapter sequence is: "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"

The file contains this data:

@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh

This is the code I tried:

import re

with open('last_mock.fastq', 'r') as rf:
    for line in rf:
        x = re.match(r"(GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA)", line)
        if x:
            print(x)
python genomics regex • 1.1k views
  1. Please note that writing generic code to parse specially formatted files such as FASTQ is not advisable.
  2. Try the dedicated parsers in Biopython.
  3. Unless this is an assignment, you can do it with established tools such as cutadapt or seqkit.

Btw, the reads in the OP are the same length as the adapter, and none of them contains the adapter (only 3 nt match, with 7 sequences).
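
If this does have to be hand-written Python, here is a minimal sketch, assuming a well-formed FASTQ with exactly four lines per record. Note that `re.match` in the posted code only matches at the start of a line, so a plain substring test is used instead. Since the reads are as long as the adapter, the sketch also checks whether a read ends with the beginning of the adapter. The function names and the `min_overlap` default of 5 are illustrative choices, not something from this thread.

```python
from itertools import islice

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def adapter_overlap(seq, adapter, min_overlap=5):
    """Return the length of the adapter match in seq, or 0 if there is none.

    A full adapter anywhere in the read counts. Otherwise look for the
    longest adapter prefix (at least min_overlap long) at the 3' end.
    """
    if adapter in seq:
        return len(adapter)
    for k in range(min(len(seq), len(adapter) - 1), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return k
    return 0


def contaminated_reads(handle, adapter=ADAPTER, min_overlap=5):
    """Yield only the records whose sequence carries (part of) the adapter."""
    for record in read_fastq(handle):
        if adapter_overlap(record[1], adapter, min_overlap):
            yield record
```

To use it, open `last_mock.fastq`, pass the file handle to `contaminated_reads`, and print the four fields of each record it yields. This only does exact matching, so it does not tolerate sequencing errors inside the adapter the way cutadapt does.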


Thanks for your reply, @cpad0112. I know how to do it with cutadapt, but my professor strictly told us to write the code in Python or Perl.


Also, can you suggest a Python library?


Install Biopython and use the Bio.SeqIO module and the SeqRecord class.
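
A rough sketch of what that could look like, assuming Biopython is installed. The `fastq-illumina` format name is a guess based on the `h` quality characters in the posted reads (older Illumina, Phred+64); use `fastq` for Sanger-style qualities. The helper name `adapter_records` is made up for illustration.

```python
def adapter_records(source, adapter, fmt="fastq-illumina"):
    """Yield SeqRecord objects whose sequence contains the full adapter.

    source is a file name or an open text handle; fmt is an assumption
    about the quality encoding of the input.
    """
    # Imported here so the script still loads where Biopython is missing.
    from Bio import SeqIO

    for record in SeqIO.parse(source, fmt):
        if adapter in str(record.seq):
            yield record
```

The matching records can then be written out with `SeqIO.write(adapter_records("last_mock.fastq", adapter), "contaminated.fastq", "fastq-illumina")`, where the output file name is just a placeholder.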
