I made a splice junction library to map the reads that Bowtie could not map to the genome. It looks like this:
chr167000051+67091529 ccaccatgatggaaggattgaaaaaacgta 30
chr167091593+67098752 ggacactgattctacaggttcaccagatag 30
chr167098777+67101626 gatagagatggaattcagcccagcccacac 30
chr167101698+67105459 ggaaaaaaagtttcgaagaaaagcaatggg 30
chr167105516+67108492 gattgggaaagatataactcacctgagctg 30
chr167108547+67109226 ccgaggaacccggctctaccaaaggaaagc 30
#It's a CSV file
Now we want to make a negative control to find out whether the reads align to the library more often than would be expected by chance alone. To do that, we want to scramble the sequences in the second column of the generated CSV.
I'm learning Python and programming in general, and I wrote the script that generates the splice junction library myself. So you could help me with a Python tip for scrambling a string, or by pointing me to a tool that scrambles a column of a CSV (or the sequences of a multi-FASTA file).
Bash tips using AWK, sed or whatever are also welcome.
Thanks a lot for your help!!!
Using random.shuffle I get:
chr167000051+67091529 cgacaaaagtaacggactttaaaaggactg 30
chr167091593+67098752 ggcaaactattggtctataaatagccccgg 30
chr167098777+67101626 accgtgctcaatgaagacaaccgcgtaacg 30
chr167101698+67105459 atggttacggaacaaaaagaaaggaaaggt 30
chr167105516+67108492 ttaataggagaagcacagctcttagatcgg 30
chr167108547+67109226 gaaggcggcaaaccctacaatgacgccgca 30
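For anyone landing here with the same problem, this is a minimal sketch of how random.shuffle can be applied to one column. It is not necessarily the exact script used above. The function names scramble and scramble_column are made up for illustration, and the code assumes the fields are separated by whitespace as shown here (pass delimiter="," if the file really uses commas) with the sequence in the second field.

```python
import random


def scramble(seq, rng=random):
    """Return seq with its characters in random order.

    The base composition is preserved; only the order changes.
    """
    letters = list(seq)   # strings are immutable, so shuffle a list copy
    rng.shuffle(letters)  # random.shuffle works in place and returns None
    return "".join(letters)


def scramble_column(lines, column=1, delimiter=None, rng=random):
    """Yield each line with the field at index `column` scrambled.

    delimiter=None splits on any whitespace (an assumption about the file);
    use delimiter="," for a comma-separated file.
    """
    out_sep = " " if delimiter is None else delimiter
    for line in lines:
        line = line.rstrip("\n")
        # Pass empty lines and '#' comment lines through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        fields = line.split(delimiter)
        fields[column] = scramble(fields[column], rng)
        yield out_sep.join(fields)


if __name__ == "__main__":
    example = ["chr167000051+67091529 ccaccatgatggaaggattgaaaaaacgta 30"]
    for row in scramble_column(example):
        print(row)
```

Because this shuffles single letters, each scrambled sequence keeps the length and nucleotide counts of the original. Passing a seeded random.Random instance as rng makes the control reproducible.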