Question: Extracting matrix columns specific to file 1 and file 2, not the overlap?
23 days ago, abdul.suboor123 (Huazhong Agricultural University, China) wrote:

I have two small RNA matrix files with almost 87% overlap. I want to extract the columns that are specific to file 1 and those that are specific to file 2. Here is an example of my data:

File 1 (sample 1):

    AAAAAAACAAGGATCAACAAGACT        0.0835      0       0.2743      0.197     0.069      0.44       0.195     0.31
    AAAAAAACACTCGGCAAAGAACCC        0.3343       0.0    1.641      2.170       1.82       0.88      0.758
    AAAAAAACCCTCTGACGCAGCACC        0.167      0       1.455        0.096     0.487       0       0       0       1.55        
    AAAAAAACCGCCACTAGAAATCGT        0.0835      0.0843      0.557       0.888      1.35       0.88    0.66
    AAAAAAACGTACTTCGTGCCGACT        0.0835     0.599       0       0       0       0       0.351       0       0       0    
    AAAAAAACTCGGAACCCTAATCTG        0.083      0.2569       0.364      0.260       0.286       0.10       0.35

File 2 (sample 2):

    AAAAAACACTCGGCAAAGAAGGCT        0.167       0       0.674       1.0531      0.3878  0.61838       0.08543      0.387
    AAAAAACACTCGGCAAAGGCTTTG        0.51        0.22       1.82        0.888   0.87699       1.6497       0.17659
    AAAAAACAGACTTTGTATCGACT         2.846        0.0300     0.1824    0.39       0.94       0.4692       0.31817
    AAAAAACAGATGCCGAAGATGT          1.8389        0.4282       4.0117        2.562        0.54       1.649477        
    AAAAAACAGTATTCGAAACGGGAC        0.1677       0.08511      1.55052        0.6997       0.58733       1.75284

File 3 (overlap):

    AAAAAAACGTACTTCGTGCCGACT        0.0835     0.599       0       0       0       0       0.351       0       0       0    
    AAAAAAACTCGGAACCCTAATCTG        0.083      0.2569       0.364      0.260       0.286       0.10       0.35
    AAAAAACACTCGGCAAAGAAGGCT        0.167       0       0.674       1.0531      0.3878  0.61838       0.08543      0.387
    AAAAAACACTCGGCAAAGGCTTTG        0.51        0.22       1.82        0.888   0.87699       1.6497       0.17659

These are the three files: file 1 is sample 1, file 2 is sample 2, and file 3 is the overlap (sequences common to files 1 and 2, based on column 1). I want to extract the sequences that are specific to each file, along with their matrix values. I have tried several commands, some of them found by searching Biostars, including:

    cat sorted_b73matrix.txt sorted_mo17matrix.txt|sort |uniq -u |awk '$1==1' > 123.txt
    grep -vxFf sorted_b73matrix.txt sorted_mo17matrix.txt > B73_specific_martix.txt
    grep -vxFf sorted_mo17matrix.txt sorted_b73matrix.txt > M017_specific_matrix.txt
    cat file1.tx file2.txt |sort |uniq -c |awk '$1==1'

But the result is not correct; maybe my parameters are wrong. Please tell me how I can get the matrix rows specific to each file, not the overlap.


From your example, it seems you want to extract specific rows, not columns, right?

written 23 days ago by Bastien Hervé
23 days ago, Bastien Hervé (Limoges, CBRS, France) wrote:

To extract, into a single output file, the lines of file 1 whose sequences are not present in file 2 and the lines of file 2 whose sequences are not present in file 1:

    # Pass 1 counts each sequence (column 1) across both files; pass 2 prints rows whose sequence occurs exactly once. The <( ) syntax needs a shell such as bash.
    awk 'NR==FNR { a[$1]++ } NR!=FNR && a[$1]==1' <(cat sorted_mo17matrix.txt sorted_b73matrix.txt) <(cat sorted_mo17matrix.txt sorted_b73matrix.txt)
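
For one output per file rather than a combined one, a minimal sketch along the same lines follows. It assumes whitespace-separated rows with the sequence in column 1. The toy inputs (`file1.txt`, `file2.txt`) and the output names are made up for illustration; the real inputs would be the two sorted matrix files.

```shell
#!/bin/sh
# Hypothetical toy inputs standing in for the two sorted matrix files.
workdir=$(mktemp -d)
printf 'AAAC\t0.1\t0.2\nAAAG\t0.3\t0.4\nAAAT\t0.5\t0.6\n' > "$workdir/file1.txt"
printf 'AAAG\t0.9\t0.8\nAACC\t0.7\t0.6\n' > "$workdir/file2.txt"

# While reading the first argument (NR==FNR), remember its sequences;
# while reading the second, print rows whose sequence was never seen.

# Rows of file1 whose sequence (column 1) does not occur in file2.
awk 'NR==FNR { seen[$1]; next } !($1 in seen)' \
    "$workdir/file2.txt" "$workdir/file1.txt" > "$workdir/file1_specific.txt"

# Rows of file2 whose sequence (column 1) does not occur in file1.
awk 'NR==FNR { seen[$1]; next } !($1 in seen)' \
    "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/file2_specific.txt"

cat "$workdir/file1_specific.txt"
cat "$workdir/file2_specific.txt"
```

Because the lookup is keyed on column 1 only, rows that share a sequence count as overlap even when their values differ. Whole-line comparisons such as `grep -vxFf` treat those rows as different, which may explain results that look wrong.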

Thanks for your reply, @Bastien Hervé. Yes, I actually wanted to extract specific rows. It works for me.

written 23 days ago by abdul.suboor123

Glad it helps. You can accept it as the answer to your post (the check mark next to the thumb).

written 23 days ago by Bastien Hervé