Renaming reads
23 months ago
vvs.hazia ▴ 10

The challenge is to append UMIs, which are stored in a separate file, to the read names.

The scheme of the read files:

@name 2:N:0:indx #or 1:N:0
READ
+
QUALITY

The scheme of the UMI files:

@name 2:N:0:indx
UMI
+
QLT

And what I expect to get:

@name_UMI 2:N:0:indx #or 1:N:0
READ
+
QUALITY

I wrote the code and it worked on small data. But on the real sample the code got stuck and did not process even the first reads. Can you suggest what went wrong?

The code:

DIR="/rename"
i=1
s=2
len=$(wc -l R1.fastq)                                           
len=$(echo $len)
len=$(echo ${len//R1.fastq/})
c=1                                   
while [ ${c} -lt ${len} ]; do
    k=$( echo ${s}p)
    m=$( echo ${i}p)
#extracting the read name from UMI-file
    name=$(echo $(sed -n ${m} UMI.fastq | cut -d" " -f1))
#extracting UMI sequence from UMI-file
    umi=$(echo $(sed -n ${k} UMI.fastq))
#creating a variable with appended name
    a=$name"_"$umi
#substituting name in original read-file with appended name and rewriting the file to save changes for next cycles
    file=$(<$DIR/R1.fastq)                             
    echo "${file//$name/$a}" > $DIR/R1.fastq          
    i=$(($i+4))
    s=$(($s+4))
    c=$(($c+4))
done

The test input small file:

@NB501229:643:HVY7VAFX2:1:11101:20852:1041 2:N:0:TCCTGAGC
GTCTCGTGGTCTTTTCTCACATAAGCTACATGGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA
+
EEAEEEEEEEEEEEEAEEEEEEEEEEEEEAEAAEEE/EEEEEEEE<EEE/EAEEAEEEAEEEEEA/EAAEEEEEA//AEE<<EE/EEAEA<E<EEEEEEEEEEAE<AAEEEE<A<<EEAEEA6EAE<E
@NB501229:643:HVY7VAFX2:1:11101:12863:1042 2:N:0:TCCTGAGC
TGGATGCTCGTGGTGAAGAAGAATCAGCTTCCCCAGGATCAGCACCAGGCCTGGATGTTTGGACATTTCGGCATCATTGCCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEAEAE<AEEAEAEEEEEEEAE/A<AA
@NB501229:643:HVY7VAFX2:1:11101:18188:1042 2:N:0:TCCTGAGC
GATTCTGTTGCTGAAATGCTGTAACTGTAGTAATGTAAACCATTGTCTCCATGATCATGTTTCCTGTGTTGTAGATTATGTAACTGCATGGCTTACATGAGGGGTCCTCATGTAAGTGCAGCAAGTCT
+
AE<E<EEEEEEEEEE//EEEEEEEEEEEEEEE/EEEE//AEEEEEEAEE<<EEEEEEAEE/EE/EEEAEEEE<EAEAAEEEEEEEEEEEEA<<EEAAE<E<EAEEE/AAEEEEEEEAEEEEEEEAA<6
@NB501229:643:HVY7VAFX2:1:11101:9570:1042 2:N:0:TCCTGAGC
GGGCCTCCCGCGCACTGCTTGGCATATTAATTAAGAATATCCTCGCTGAGGCCTGACACTGTAGTCTGGGAACTATACTCCGAGTCGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTG
+
A//E/<//AEEEE/A/6/<///</<///////////////////////A//<EEAEA<AAEE/A<EEEEAEEEEEEE////<E/E6/6A/EE/A<EEE/AA<///A/AA<<A6<A/EE<AAA<A////
@NB501229:643:HVY7VAFX2:1:11101:20006:1042 2:N:0:TCCTGAGC
GAAGGGCCTGACCTCACCCTTGAGGACGTGCTATGGTGGCCCGCAGCGAGGGTCCCTGCCACCCAGCCATGGCCAGAGCACCTGCCACGTGCCAGGCACTGTCTGAGTCCTGAGTCTAGTCACGCGGG
+
E<EEEEAEEE/<A<EE///////E/EEEAEEA//A6EEE<AEEE<EEAE//E<EEEEEEE/EEEAEE/<EAEEEEEEEEE<AEEEEEEAAEA66AAE/<AEEEAAEEAA<AAEE/AAEE/</AEAA<<
@NB501229:643:HVY7VAFX2:1:11101:26099:1044 2:N:0:TCCTGAGC
GCCGAGTTGAAGCCCCGCTTCCTGTAGGACATCGTGATCGACGCCATTGGCGGTAGCAGGCCCCCTTGGCCGCCCCTGGAGTACCAGCCCTACCAGAGCATCTACGTCGGGGGCTTGATGGAAGGGGG
+
AAEEEEEEAE///A/EAAEE//EE//E/////A/EEEEAE/EE<E//<<AAEEEEEA/A/E//E/AAEEEEE<///EEE/AA/<A//6<<E//<E//AAEEA///<A<A/E/A/<AAE/EE//EEEE/
@NB501229:643:HVY7VAFX2:1:11101:5399:1044 2:N:0:TCCTGAGC
AGGACCAGCCCCCCCCCCCCCCGCCCCCACCGCGCCCACCCACCCAGGGGGCCCGGCCAAACGCGCAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGCCCCGGGGGGGGGGGGAACGGGCCCCCG
+
/////////E/EEAEA//6EEE///6AE///////AE//////A//////E//E///////////////EE<E///</<EA/A<///<EAAAA//<////6//////////6//////A/////AE//

Test output file:

@NB501229:643:HVY7VAFX2:1:11101:20852:1041_GAATCGGGACGA 2:N:0:TCCTGAGC
GTCTCGTGGTCTTTTCTCACATAAGCTACATGGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA
+
EEAEEEEEEEEEEEEAEEEEEEEEEEEEEAEAAEEE/EEEEEEEE<EEE/EAEEAEEEAEEEEEA/EAAEEEEEA//AEE<<EE/EEAEA<E<EEEEEEEEEEAE<AAEEEE<A<<EEAEEA6EAE<E
@NB501229:643:HVY7VAFX2:1:11101:12863:1042_ACGTCCGAGGAG 2:N:0:TCCTGAGC
TGGATGCTCGTGGTGAAGAAGAATCAGCTTCCCCAGGATCAGCACCAGGCCTGGATGTTTGGACATTTCGGCATCATTGCCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEAEAE<AEEAEAEEEEEEEAE/A<AA
@NB501229:643:HVY7VAFX2:1:11101:18188:1042_ATTTCTAGTTCC 2:N:0:TCCTGAGC
GATTCTGTTGCTGAAATGCTGTAACTGTAGTAATGTAAACCATTGTCTCCATGATCATGTTTCCTGTGTTGTAGATTATGTAACTGCATGGCTTACATGAGGGGTCCTCATGTAAGTGCAGCAAGTCT
+
AE<E<EEEEEEEEEE//EEEEEEEEEEEEEEE/EEEE//AEEEEEEAEE<<EEEEEEAEE/EE/EEEAEEEE<EAEAAEEEEEEEEEEEEA<<EEAAE<E<EAEEE/AAEEEEEEEAEEEEEEEAA<6
@NB501229:643:HVY7VAFX2:1:11101:9570:1042_ATAGCCGCGAAA 2:N:0:TCCTGAGC
GGGCCTCCCGCGCACTGCTTGGCATATTAATTAAGAATATCCTCGCTGAGGCCTGACACTGTAGTCTGGGAACTATACTCCGAGTCGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTG
+
A//E/<//AEEEE/A/6/<///</<///////////////////////A//<EEAEA<AAEE/A<EEEEAEEEEEEE////<E/E6/6A/EE/A<EEE/AA<///A/AA<<A6<A/EE<AAA<A////
@NB501229:643:HVY7VAFX2:1:11101:20006:1042_AGGGGGTATTAC 2:N:0:TCCTGAGC
GAAGGGCCTGACCTCACCCTTGAGGACGTGCTATGGTGGCCCGCAGCGAGGGTCCCTGCCACCCAGCCATGGCCAGAGCACCTGCCACGTGCCAGGCACTGTCTGAGTCCTGAGTCTAGTCACGCGGG
+
E<EEEEAEEE/<A<EE///////E/EEEAEEA//A6EEE<AEEE<EEAE//E<EEEEEEE/EEEAEE/<EAEEEEEEEEE<AEEEEEEAAEA66AAE/<AEEEAAEEAA<AAEE/AAEE/</AEAA<<
@NB501229:643:HVY7VAFX2:1:11101:26099:1044_CGTTTCGGGGTA 2:N:0:TCCTGAGC
GCCGAGTTGAAGCCCCGCTTCCTGTAGGACATCGTGATCGACGCCATTGGCGGTAGCAGGCCCCCTTGGCCGCCCCTGGAGTACCAGCCCTACCAGAGCATCTACGTCGGGGGCTTGATGGAAGGGGG
+
AAEEEEEEAE///A/EAAEE//EE//E/////A/EEEEAE/EE<E//<<AAEEEEEA/A/E//E/AAEEEEE<///EEE/AA/<A//6<<E//<E//AAEEA///<A<A/E/A/<AAE/EE//EEEE/
@NB501229:643:HVY7VAFX2:1:11101:5399:1044_GTTAACGCGTAT 2:N:0:TCCTGAGC
AGGACCAGCCCCCCCCCCCCCCGCCCCCACCGCGCCCACCCACCCAGGGGGCCCGGCCAAACGCGCAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGCCCCGGGGGGGGGGGGAACGGGCCCCCG
+
/////////E/EEAEA//6EEE///6AE///////AE//////A//////E//E///////////////EE<E///</<EA/A<///<EAAAA//<////6//////////6//////A/////AE//
fastq rename sed UMI • 815 views

seqkit replace with an external key-value file (the -k/--kv-file option) should help.


Thank you, I learned about new software.

23 months ago
GenoMax 141k

While it is commendable that you made an effort to write your own code, as you discovered, it is hard to make sure it works reliably (especially if you have any edge cases). It would be safer to use a tool designed for this.

I suggest using umi-tools as indicated in this answer --> Transferring UMI from paired-end read 2 to header of read 1
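
If a dedicated tool is not an option, the loop in the question can in principle be replaced by a single pass over both files. The original loop rereads and rewrites the whole R1.fastq once per read, so its run time grows roughly with the square of the number of reads, which would explain why it appears stuck on a real sample. The sketch below is only an illustration and not part of umi-tools: the function name append_umi is hypothetical, and it assumes both files list the same reads in the same order, with exactly four lines per record.

```shell
# Sketch: append each UMI to the matching read name in one pass.
# Assumptions: $1 is the reads FASTQ, $2 is the UMI FASTQ, both contain the
# same reads in the same order, four lines per record (no wrapped sequences).
# append_umi is a hypothetical helper name.
append_umi() {
    # paste joins line N of both files with a tab: reads line, then UMI line
    paste "$1" "$2" | awk -F'\t' '
        NR % 4 == 1 {
            # header lines: compare the names (text before the first space)
            split($1, r, " "); split($2, u, " ")
            if (r[1] != u[1]) {
                print "name mismatch at line " NR ": " r[1] " vs " u[1] > "/dev/stderr"
                exit 1
            }
            hdr = $1
            next
        }
        NR % 4 == 2 {
            # sequence lines: $1 is the read, $2 is the UMI
            name = hdr
            sub(/ .*/, "", name)                  # keep only the read name
            rest = substr(hdr, length(name) + 1)  # e.g. " 2:N:0:indx"
            print name "_" $2 rest
            print $1
            next
        }
        { print $1 }   # "+" and quality lines are taken from the reads file
    '
}
```

It could be called as append_umi R1.fastq UMI.fastq > R1.renamed.fastq, writing to a new file rather than overwriting the input. The name check stops at the first record where the two files disagree, which is a cheap guard against files that are out of sync.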


Thanks, this simple idea worked great!
