Question

Converting U to T in my miRNA sequences

2.9 years ago

aranyak111 • 0

I have the GRCh38 set of human microRNA sequences. I want to convert the Us in the sequences to Ts so that I can match them against my own FASTQ files.

The first few lines of the miRNA FASTA look like this:

>hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p MIMAT0004482 Homo sapiens let-7b-3p

The first few lines of the FASTQ file I want to align look like this:

@SRR8248790.1401 HWI-D00306:1090:HKVGMBCX2:1:1101:6697:2269/1
CGCGACCTAGATCGGAAGAGCACACGTCT
+
DDDDDIIIIIIHIIIIIIIIIIIIIIIII
@SRR8248790.1402 HWI-D00306:1090:HKVGMBCX2:1:1101:6630:2272/1
CTCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGAAGATCGGAAGAGCACACGTCTGAACTCCAGT
+
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIGHGHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@SRR8248790.1403 HWI-D00306:1090:HKVGMBCX2:1:1101:6516:2280/1
TGAGGTAGTAGGTTGTGTGGTTTAGATCGGAAGAGCACACGTCT

As you can see, each FASTA record has a different header line. I want to convert only the sequence lines, so that the headers are left untouched.

I have tried both awk and sed for this conversion, without much success.

The sed script I used is

sed '/^[^>]/s/u/t/g' Homo_sapiens.GRCh38.miRNA.fasta >newfile.fasta

The awk script to do the same is

awk '/^[^>]/{ gsub(/u/,"t") }1' Homo_sapiens.GRCh38.miRNA.fasta > newfile.fasta

Any help will be useful.

RNA-Seq • 1.4k views

ADD COMMENT • link updated 2.9 years ago by Pierre Lindenbaum 161k • written 2.9 years ago by aranyak111 • 0

At first glance, both your awk and sed commands look malformed. Once corrected, they should work.

I think the pattern matching part in sed has an extra / and the sed would also benefit from the --extended-regexp flag.

You may want to add a $0 to the gsub on awk and see if that works.

ADD REPLY • link 2.9 years ago by Ram 43k

awk and sed are both case-sensitive; u is not U.

ADD REPLY • link 2.9 years ago by Pierre Lindenbaum 161k
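
Following the case-sensitivity point above, here is a minimal sketch of the two commands from the question with an uppercase U and T in place of the lowercase letters. The file names sample.miRNA.fasta and converted_*.fasta are hypothetical; the two records are copied from the FASTA excerpt in the question. The last sed line is an optional variant, assuming you also want to catch lowercase u.

```shell
# Hypothetical two-record sample, copied from the FASTA excerpt in the question.
cat > sample.miRNA.fasta <<'EOF'
>hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
EOF

# sed: on lines that do not start with '>', replace every uppercase U with T.
sed '/^[^>]/s/U/T/g' sample.miRNA.fasta > converted_sed.fasta

# awk: same idea; gsub() works on the whole line ($0) when no target is given,
# and the trailing 1 prints every line, changed or not.
awk '/^[^>]/{ gsub(/U/,"T") }1' sample.miRNA.fasta > converted_awk.fasta

# Optional variant: transliterate both U and u on every line that is not a header.
sed '/^>/!y/Uu/Tt/' sample.miRNA.fasta > converted_both_cases.fasta

cat converted_sed.fasta
```

With this sample, the header lines come through unchanged and the converted hsa-let-7b-5p sequence, TGAGGTAGTAGGTTGTGTGGTT, is the same as the start of the third FASTQ read shown in the question.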