Hi there,
I am trying to use the FilterSamReads tool from picard to subset reads from BAM files. My BAM files have their reads labelled in the following way:
TRINITY_DN9898_c0_g1_i1
The "_i1" at the end of this name represents a differentially expressed version of the read "TRINITY_DN9898_c0_g1". I need to subset based on the first part of the name ("TRINITY_DN9898_c0_g1"), so that the output BAM will contain all differentially expressed versions of that read (I.e. could be "i1" or "i3" etc...).
This is just one example of a list of 500 differentially expressed reads I am trying to extract from whole-transcriptome libraries. I ran the FilterSamReads tool but the output returned all of the reads unfiltered - I suppose it did not detect the partial read names I gave it. Does anyone know how I can accomplish my task?
Thanks.
If you have a file of patterns you're looking for, a little script in python or perl would allow you to query read names for those patterns. For instance, using samtools to pipe the reads to your program, you could then pass on lines beginning with "#", take the read name from other lines and query whether the suffix is found in a an array of your patterns.
Thanks for your reply. I am unfortunately pretty unfamiliar with coding and am new to bioinformatics, do you happen to have a link to an example code of something similar?
This would be a great excuse to learn some things like awk or python, because it involves some simple but powerful steps (read some input, parse it for a pattern, print it back out). However, shell scripting can also be very powerful. Pierre supplied an answer below that would be easy to incorporate using a shell script and "grep" (a linux command that finds patterns). If you can place the 500 names you're looking for in a file, you can convert it into a shell script that would do what Pierre suggested for each of the patterns you're looking for. This would amount to reading your input BAM file 500 times, but it would get the job done, and you'd learn something about scripting. Use small toy examples to get started. You simply have to know how to write a script. See the example below for just 3 patterns. Google bash script for clues on how to run it. Short version is: place commands in a text file with the first line #!/bin/bash, make sure the file is executable (chmod 755 myscript.sh), then you can run it by typing it's name (./myscript.sh). Google and toy examples are your friend when trying to figure out the pieces.