I have a folder with 12 files. Each one is a tab-separated txt file with one record per line, and each record consists of an ID, a count, and a sequence:
ta_iwgsc_2dl_v1_880448_4767 62385 auagcaucauccauccuaccc
ta_iwgsc_5dl_v1_4475147_17525 62385 auagcaucauccauccuaccc
ta_iwgsc_5ds_v1_2769792_21617 51267 ugaagcugccagcaugaucug
ta_iwgsc_2dl_v1_9826058_5702 16290 uuccaaagggaucgcauugau
ta_iwgsc_4dl_v3_14471626_15454 11824 auagcaucauccauccuacca
ta_iwgsc_4dl_v3_14415829_14746 11824 auagcaucauccauccuacca
ta_iwgsc_3ds_v1_2039022_12082 4161 gcucacccucucucugucagc
Each file has different IDs for the same sequence. For each file in the folder, I need to extract all lines that share a common sequence into a new file, one new file per sequence. For example:
Folder name: common
It contains 12 txt files in the above format.
Example file name: CC1. The result should be a new file with the format:
ta_iwgsc_2dl_v1_880448_4767 62385 auagcaucauccauccuaccc
ta_iwgsc_5dl_v1_4475147_17525 62385 auagcaucauccauccuaccc
file 2:
ta_iwgsc_4dl_v3_14471626_15454 11824 auagcaucauccauccuacca
ta_iwgsc_4dl_v3_14415829_14746 11824 auagcaucauccauccuacca
I am fine with Perl, R, or Python scripts. Tools for the extraction are also fine.
I modified your post for readability with Markdown code markup, using the 101010 button.
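One possible reading of the request, sketched in Python: for each input file, group the lines by their third column (the sequence) and write every group that contains at least two IDs to its own output file. This is only a sketch under assumptions. The function names `group_by_sequence` and `split_file`, the output folder `common_by_sequence`, the `<input name>_<sequence>.txt` naming scheme, and the "at least two IDs" threshold are all hypothetical choices, not something stated in the question.

```python
from collections import defaultdict
from pathlib import Path


def group_by_sequence(lines):
    """Map each sequence to the list of records that carry it, in input order."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        groups[fields[2]].append("\t".join(fields))
    return dict(groups)


def split_file(path, out_dir, min_ids=2):
    """Write one output file per sequence found on at least `min_ids` lines.

    Assumption: a "common sequence" means a sequence shared by two or
    more IDs within the same input file.
    """
    path = Path(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with path.open() as handle:
        groups = group_by_sequence(handle)
    written = []
    for seq, rows in groups.items():
        if len(rows) < min_ids:
            continue
        # Hypothetical naming scheme: <input name>_<sequence>.txt
        out_path = out_dir / f"{path.stem}_{seq}.txt"
        out_path.write_text("\n".join(rows) + "\n")
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Assumes the 12 input files live in ./common and end in .txt
    for txt in sorted(Path("common").glob("*.txt")):
        split_file(txt, "common_by_sequence")
```

If the goal is instead to find sequences shared across all 12 files, the same grouping step could be applied to the concatenated lines of every file, with the source file name kept alongside each record.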