Question

How can I use a short sequence as a probe to fish out sequences have the same part out from raw illumina paired reads files?

0

Entering edit mode

2.8 years ago

wang-yanfang • 0

Dear all,

I have 150bp paired end sequences R1 and R2, after I clean the sequences I used sortmerna to sort out all the reads matching rna databases. so I have two fasta files containing reads like this:

@NB551184:334:HKHYYAFX2:1:11101:8313:12479 2:N:0:0
AAGGTGATATGAACCGTTATAGCCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTATCATTACGTGAATACATAGCGTAATGAAGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAG
+
AAAAAEEEEEEEEEEEAAEEEEAEEEA<EAEEEEEEEEAEEEEEEEAAEE<E<AEAEEA/EEAEEAEEEEE6EEAEAEEEAEEEEEE/EEEE<EEEAEA<EEE/EE/E//<AAEAAEEE/EEAA<<EAAAE<EE6A<//EEEEEA<EEA/
@NB551184:334:HKHYYAFX2:1:11101:26630:12486 2:N:0:0
TAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTCACCGCGTGCACTGGTCCGGCCGGGCCTTTCCCTCTGGGGAACCGCATGCCCTTCACTGGGTGTGTCGGGGAACCAGGACTTTTACTTTGAACAAATT
+
AAAAAEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEAEEEE<EEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA/EAEEEEEEEEEAEEEEEAA<EEEEEAAEEA/EEEEEEEEAAEEEEAAAEEEEAEE<

And I have four sequences(each is 25bp, with minor differences) that I am interested in and I want to know from all the rna 150bp reads, what's the ratio between these four sequences.

How could I do this?

The original idea here is that: my bacterium has 6 16S, and each one is slightly different from each other, so we want to know under different conditions, how do those different 16S expressed. And we found the differences of these 16S are mainly at a short sequence (25bp), like below:

CATATCTGACCGGCATCGGTTGGAT
TATATCTGACCGGCATCGGTTGGAT
TATATCTGACCGGCATCGGTTAGCT
CATATCTGACCGGCATCGGTTAGCT

If we make contigs using all these reads then it could mess up with the combination, of T or C at the beginning, or G.A or A.C at the very end. So I decided to used these four 25bp sequences as probe to fish out those reads matching them, and count the number of reads.

Can anyone help with the code or software I can use?

Any idea and help will be highly appreciated!

Thanks
Yanfang

rna paired • 922 views

ADD COMMENT • link updated 2.8 years ago by swbarnes2 14k • written 2.8 years ago by wang-yanfang • 0

0

Entering edit mode

What have you tried? Surely you started with grep?

ADD REPLY • link 2.8 years ago by swbarnes2 14k