Find_primer_sequence_and_reverse_complement_read_within_amplicon.fastq_file script needed
1
0
Entering edit mode
6.3 years ago

Hi,

I have a fastq file with merged amplicon sequence reads post Illumina sequencing. However, the library was build by ligating the Illumina adapters to the amplicon. Therefore, the merged amplicon sequences have no common direction. Now I had the idea to use a script that finds the exact sequence of a certain primer and reverse complements those sequences within the fastq file. One _is_ able to do this in Geneious, however, I'd rather like to use a script/tool to incorporate into the pipeline I want to establish. I already looked into BBtools and others, however, couldn't find a tool doing this.

Thank you for any input.

Cheers,

Karsten

next-gen sequencing • 1.8k views
ADD COMMENT
0
Entering edit mode

Hello karsten.liere!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=80003

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Hello Pierre,

Oh - I'm sorry about that. I had no idea that the two communities are as closely knit as they are. I'll keep it in mind. Thank you for the heads-up.

Cheers,

Karsten

ADD REPLY
3
Entering edit mode
6.3 years ago
GenoMax 141k

exact sequence of a certain primer and reverse complements those sequences within the fastq file

Just that part of the sequence or the whole sequence? If whole sequence is fine then you should be able to select matching reads that contain the sequence you are interested in using bbduk.sh and then you can reverse-complement them using reformat.sh.

ADD COMMENT
0
Entering edit mode

Hi genomax,

ah, thank you for your idea. Got me going ;-). Indeed I'd like to rc the entire read. So here's the "workflow":

bbduk.sh in=in.fastq literal=ACACGACTACNVGGGTATCTAAKCC out=to_rc.fastq outm=not_to_rc.fastq
reformat.sh in=to_rc.fastq rcomp=f out=rced.fastq
cat not_to_rc.fastq, rced.fastq > in_trimmed_aligned.fastq

That's what you thought of, right?

However, the literal argument doesn't find the primer sequence. Any idea about that?

Thank you for your input!

Cheers,

Karsten

ADD REPLY
1
Entering edit mode

I think literal option uses exact sequence. You may need to make a file with all possible options for the sequences and provide that with ref=. Or you could look for just the invariant part at the beginning literal=ACACGACTAC and adjust k= accordingly.

For reformat.sh step, you need to use rcomp=t, if you want to reverse complement the reads.

ADD REPLY
0
Entering edit mode

I now used

bbduk.sh in=in.fastq literal=ACACGACTAC k=10 out=not_to_rc.fastq outm=to_rc.fastq
reformat.sh in=to_rc.fastq rcomp=f out=rced.fastq
cat not_to_rc.fastq rced.fastq > in_trimmed_aligned.fastq

and it worked nicely, thank you so much.

PS: Why did you suggest using rcomp=t instead of rcomp=f ?

ADD REPLY
0
Entering edit mode

I thought you wanted to reverse-complement those reads which contained the adapter and that was why I suggested setting rcomp to true. If the results look as what you expect them to be then you can go with what you have. I will move my original comment to an answer. You can accept it to provide closure to this thread.

ADD REPLY
0
Entering edit mode

I see - yes indeed. So there is a way to rc the reads within the original fastq file without writing the out=not_to_rc.fastq outm=to_rc.fastq files to disk as input files for reformat.sh? Right now bbduk.sh writes those two files to disk and reformat.sh takes the to_rc.fastq file, rcs it, and saves the resulting file again. That's why I use the cat command to concatenate the not_to_rc.fastq file from the bbduk.sh step and the rced.fastq file from the reformat.sh step into a single file.

Thank you for your patience,

k.

PS: And yes, question's solved - thanks again!

ADD REPLY
1
Entering edit mode

Answer to your question is you should never mess with original data files.

See if following does what you need without having to write at least one intermediate file:

bbduk.sh in=in.fastq literal=ACACGACTAC k=10 out=not_to_rc.fastq outm=stdout.fastq | reformat.sh in=stdin.fastq rcomp=f out=stdout.fastq | cat not_to_rc.fastq - > in_trimmed_aligned.fastq

If this is working as expected then add a final pipe | rm -f not_to_rc.fastq to the command above to leave no intermediate files.

bbduk.sh in=in.fastq literal=ACACGACTAC k=10 out=not_to_rc.fastq outm=stdout.fastq | reformat.sh in=stdin.fastq rcomp=f out=stdout.fastq | cat not_to_rc.fastq - > in_trimmed_aligned.fastq | rm -f not_to_rc.fastq
ADD REPLY
0
Entering edit mode

Thank you!

And yes - I _am_ stupid - f means obviously false not file. Oh dear. Sometimes....

So far I'm not really pleased with the bbduk side of this. It yet misses to many sequences. I'll play a bit around with it.

Cheers,

k.

ADD REPLY
0
Entering edit mode

As long as you get the settings right bbduk.sh will definitely work. You may want to reduce that value of k= since contaminants smaller than 10 will not be found in your current command. Value can be as small as 1 but you have to be careful. Failing that you can always brute-force all possible combinations of sequence you expect.

ADD REPLY

Login before adding your answer.

Traffic: 2525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6