Question: RNA-seq: reformatting FASTA/FASTQ files (from one line back into two lines)
17 months ago, canberkyurek wrote:

Hi,

I have been analyzing a set of small RNA-seq data and encountered a small problem with FASTA/FASTQ files. After trimming and collapsing, I wanted to keep only reads that are 22 nt long and have a guanine (G) at the 5' end. These are the commands I used to filter the reads:

cat input_wt3_trimmed_collapsed_1_2.fq | paste - - | awk 'length($4) >= 22 && length($4) <=22' | sed 's/\t/\n/g' > input_wt3_trimmed_collapsed_2.fq

awk '$2 ~ /^G/'  elution_wt1_trimmed_collapsed_1_2.fq >  elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq

However, these commands converted my FASTA/FASTQ files from the two-line format (header and sequence on separate lines) into a one-line format (header and sequence on the same line). Here is an example:

before:

>1-1763
TACCCGTATAAGTTTCTGCTGAG
>2-1550
TGAGATCGTTCAGTACGGCAA

after:

>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

I would really appreciate it if you have a better way of fixing this problem. When I try to map these reads with STAR, the file is not recognised as compatible input. I guess I need to convert the final file back into a two-line FASTA file, such as:

>73-969 
GAGATCGGGCGGGAAGTGGTAT
>89-940 
GTTTCCGGCTCACGTCCTCTGA
>90-938 
GCGTGTAAGTTCGGCGGCGTGA

What could be the best way to fix this problem?

Best,

Ahmet
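For reference, the two filtering commands in the question can be combined into a single pass that keeps the two-line layout. This is a minimal sketch, assuming the collapsed file is two-line FASTA (as in the "before" example), so that after `paste - -` the sequence is field 2 rather than field 4; `sample.fa` and `sample_22G.fa` are hypothetical file names, and the records are copied from the examples in this post.

```shell
#!/bin/sh
# Hypothetical sample input: two-line FASTA records copied from the post's examples.
printf '>1-1763\nTACCCGTATAAGTTTCTGCTGAG\n>2-1550\nTGAGATCGTTCAGTACGGCAA\n>73-969\nGAGATCGGGCGGGAAGTGGTAT\n>90-938\nGCGTGTAAGTTCGGCGGCGTGA\n' > sample.fa

# paste - -  : join each header line and its sequence line with a tab
# awk        : split on tab only; keep records whose sequence (field 2)
#              is exactly 22 nt long and starts with G
# tr         : turn the tab back into a newline, restoring two-line FASTA
paste - - < sample.fa \
  | awk -F'\t' 'length($2) == 22 && $2 ~ /^G/' \
  | tr '\t' '\n' > sample_22G.fa

cat sample_22G.fa
```

Using `-F'\t'` instead of awk's default whitespace splitting keeps the fields aligned with the tab that `paste` inserts, even if a header contains spaces; and because the tab is converted back with `tr`, no separate reformatting step is needed afterwards.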

Tags: rna-seq, sirna, fasta
Modified 17 months ago by WouterDeCoster; written 17 months ago by canberkyurek

Please reformat the examples; everything is on a single line. Also, it's simply length($4) == 22, and in awk you can also test for a G at the 5' end.
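Following that suggestion, awk alone can apply both the length test and the 5' G test without `paste`, by remembering each header and printing it only when the sequence line after it passes. A minimal sketch, assuming strictly two-line FASTA (exactly one sequence line per record, so wrapped sequences would break it); `reads.fa` and `reads_22G.fa` are hypothetical file names.

```shell
#!/bin/sh
# Hypothetical two-line FASTA input, records copied from the post's examples.
printf '>1-1763\nTACCCGTATAAGTTTCTGCTGAG\n>73-969\nGAGATCGGGCGGGAAGTGGTAT\n>89-940\nGTTTCCGGCTCACGTCCTCTGA\n' > reads.fa

# Header lines (starting with ">") are stored in h and skipped;
# a sequence line is printed together with its stored header only if it
# is exactly 22 nt long and its first base is G.
awk '/^>/ { h = $0; next }
     length($0) == 22 && /^G/ { print h; print }' reads.fa > reads_22G.fa

cat reads_22G.fa
```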

Written 17 months ago by Ido Tamir

I added code markup to your post for better readability. You can do this by selecting the text and clicking the 101010 button; that button is in the toolbar when you compose or edit a post.

Written 17 months ago by WouterDeCoster

From the code in your original post, I understand that you are parsing a FASTQ file and that your output is also a FASTQ file. But the examples you provided are neither FASTQ nor FASTA. Could you please post one or a few records from the .fq file?
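Since the file layout was the source of confusion here, a quick heuristic check can tell the layouts apart before handing a file to STAR. A rough sketch, assuming uncompressed files, strictly two-line FASTA ('>' on every odd line and only there) and four-line FASTQ ('@' on the first and '+' on the third line of each record); `check_layout` and the sample file names are hypothetical, and this is not a full format validator.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file looks like two-line FASTA,
# four-line FASTQ, or neither (e.g. "header sequence" on a single line).
check_layout() {
  awk '
    BEGIN { fasta = 1; fastq = 1 }
    # two-line FASTA: odd lines are headers, even lines are not
    NR % 2 == 1 && substr($0, 1, 1) != ">" { fasta = 0 }
    NR % 2 == 0 && substr($0, 1, 1) == ">" { fasta = 0 }
    # four-line FASTQ: record line 1 starts with "@", line 3 with "+"
    NR % 4 == 1 && substr($0, 1, 1) != "@" { fastq = 0 }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { fastq = 0 }
    END {
      if (NR > 0 && NR % 2 == 0 && fasta) print "two-line FASTA"
      else if (NR > 0 && NR % 4 == 0 && fastq) print "four-line FASTQ"
      else print "unrecognised layout"
    }' "$1"
}

# Sample files built from the "before" and "after" examples in the question.
printf '>1-1763\nTACCCGTATAAGTTTCTGCTGAG\n>2-1550\nTGAGATCGTTCAGTACGGCAA\n' > two_line.fa
printf '>73-969 GAGATCGGGCGGGAAGTGGTAT\n>89-940 GTTTCCGGCTCACGTCCTCTGA\n' > one_line.fa

check_layout two_line.fa   # prints: two-line FASTA
check_layout one_line.fa   # prints: unrecognised layout
```

Only record-start positions are checked for FASTQ because a quality string may itself begin with '@'.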

Written 17 months ago by cpad0112
Answer by WouterDeCoster (Belgium), 17 months ago:

It looks like you need to convert each space to a newline; try tr:

tr ' ' '\n' < elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq > output.fq
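`tr ' ' '\n'` replaces every space, so it is only safe when the header itself contains no spaces, as with the collapsed `>73-969`-style IDs here. If headers could carry a description, one option is to split each line only at its last space and treat the final field as the sequence. A minimal sketch, assuming one record per line in the form "header sequence" with the sequence as the last space-separated field; the file names and the `sample wt1` description are hypothetical.

```shell
#!/bin/sh
# Hypothetical one-line input: the first header carries an extra description.
printf '>73-969 sample wt1 GAGATCGGGCGGGAAGTGGTAT\n>89-940 GTTTCCGGCTCACGTCCTCTGA\n' > one_line_desc.txt

# match() locates the last space-delimited field; everything before it is
# the header, the field itself is the sequence. Lines without a space (or
# ending in a space) are printed unchanged.
awk '{
  if (match($0, / [^ ]+$/)) {
    print substr($0, 1, RSTART - 1)
    print substr($0, RSTART + 1)
  } else {
    print
  }
}' one_line_desc.txt > restored.fa

cat restored.fa
```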
Written 17 months ago by WouterDeCoster

The same works in sed. Given this example input:

$ cat test.tab 
>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

running sed on it gives:

$ sed -e 's/ /\n/g' test.tab 
>73-969
GAGATCGGGCGGGAAGTGGTAT
>89-940
GTTTCCGGCTCACGTCCTCTGA
>90-938
GCGTGTAAGTTCGGCGGCGTGA
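The `\n` in the replacement text is a GNU sed extension; at least some BSD/macOS versions of sed insert a literal `n` instead. A more portable sketch puts a backslash followed by an actual newline in the replacement, which POSIX sed accepts; `test_two_line.fa` is a hypothetical output name, and `test.tab` is recreated from the example above.

```shell
#!/bin/sh
# Recreate the example input shown above (one record per line).
printf '>73-969 GAGATCGGGCGGGAAGTGGTAT\n>89-940 GTTTCCGGCTCACGTCCTCTGA\n>90-938 GCGTGTAAGTTCGGCGGCGTGA\n' > test.tab

# Backslash + literal newline in the replacement: the POSIX way to
# insert a newline with sed.
sed 's/ /\
/g' test.tab > test_two_line.fa

cat test_two_line.fa
```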
Written 17 months ago by cpad0112

Yup, that also works.

Written 17 months ago by WouterDeCoster

Thanks, mate! This solved the problem completely!

Written 17 months ago by canberkyurek

Glad to help.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Written 17 months ago by WouterDeCoster

Thanks! I will follow your suggestions!

Written 17 months ago by canberkyurek
Powered by Biostar version 2.3.0