Matching FASTQ headers between two files and printing the matched records
7.2 years ago

Hi all, I am a beginner in coding. I am trying to match each header line from the first file against the second file and print the entire line below the matched header from the second file. My first file looks like:

@NewLibrary: gi|9626933|ref|NC_002027.1|_to_gi|9626931|ref|NC_002026.1|
@NewLibrary: gi|176121034|ref|NC_002028.2|_RevStrand_to_gi|176121034|ref|NC_002028.2|_RevStrand
@NewLibrary: gi|9626933|ref|NC_002027.1|_RevStrand_to_gi|9626933|ref|NC_002027.1|
@NewLibrary: gi|9626931|ref|NC_002026.1|_RevStrand_to_gi|9626933|ref|NC_002027.1|_RevStrand
@NewLibrary: gi|9626931|ref|NC_002026.1|_to_gi|176121034|ref|NC_002028.2|_RevStrand
@NewLibrary: gi|9626931|ref|NC_002026.1|_RevStrand_to_gi|176121034|ref|NC_002028.2|

My second file looks like:

@NewLibrary: gi|9626933|ref|NC_002027.1|_to_gi|9626931|ref|NC_002026.1|
638_to_1641_#_79        1315_to_1055_#_39       640_to_1641_#_38        1474_to_1423_#_37       633_to_1639_#_31        1475_to_1378_#_28       1696_to_1058_#_24       1773_to_1158_#_23       1475_to_1640_#_22  960_to_1010_#_20        276_to_1054_#_19        1246_to_926_#_18        637_to_928_#_18 634_to_918_#_17 1696_to_1054_#_16       1688_to_1055_#_15       1314_to_1725_#_15       564_to_1057_#_15 1315_to_2355_#_15       1449_to_1054_#_15       1314_to_1423_#_15       637_to_914_#_15 1542_to_72_#_14 636_to_928_#_14 633_to_939_#_13 1474_to_1156_#_13       1476_to_2633_#_13       664_to_1639_#_12        2115_to_2693_#_12       639_to_914_#_12 1542_to_74_#_12 1773_to_1156_#_12       633_to_947_#_12 1248_to_943_#_11        636_to_1640_#_11        1122_to_458_#_11        640_to_1523_#_11        1246_to_1152_#_11       320_to_1344_#_11        1543_to_48_#_11 1246_to_2871_#_11       319_to_1360_#_11        1771_to_1151_#_10       1842_to_1477_#_10       635_to_917_#_10 121_to_1521_#_10        634_to_912_#_10 1840_to_455_#_10        1489_to_1424_#_10       638_to_1637_#_10        1770_to_1156_#_9        633_to_2208_#_9 1822_to_2625_#_9        1314_to_466_#_9 1245_to_956_#_9 832_to_1053_#_9 1543_to_1001_#_9        1245_to_12_#_9  1245_to_927_#_9 1081_to_1366_#_9        277_to_2118_#_9 2509_to_1010_#_9        1469_to_222_#_9 1543_to_945_#_9 1542_to_85_#_9  1245_to_939_#_9 1623_to_517_#_9 1245_to_2077_#_9

For every header that matches, I want to print it and the line below it to a new file. I am having a very hard time writing the code. Thank you for your help.

next-gen • 799 views
7.2 years ago
  • Sort the first file.
  • Linearize the second file so that each header and the line below it form one tab-separated row, then sort on the first field.
  • Join both files on that field.
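
The three steps can also be run one at a time with intermediate files, which makes each stage easier to inspect. This is only a sketch: the sample contents, the `@lib:` headers, and the names `headers.sorted`, `records.sorted`, and `matched.tsv` are made up for illustration, and it assumes the second file strictly alternates header line / data line with no blank lines in between.

```sh
# Use the same byte-wise collation for sort and join, so join sees the order it expects.
export LC_ALL=C
tab=$(printf '\t')
workdir=$(mktemp -d)

# Tiny hypothetical stand-ins for file1.txt (headers only) and file2.txt (header + data line).
printf '%s\n' '@lib: B' '@lib: A' > "$workdir/file1.txt"
printf '@lib: A\n1_to_2_#_5\t3_to_4_#_2\n@lib: C\n7_to_8_#_1\n@lib: B\n9_to_10_#_4\n' > "$workdir/file2.txt"

# Step 1: sort the header list.
sort "$workdir/file1.txt" > "$workdir/headers.sorted"

# Step 2: linearize file2 (each header and its data line become one tab-separated row),
# then sort on the first column, which is the header.
paste - - < "$workdir/file2.txt" | sort -t "$tab" -k1,1 > "$workdir/records.sorted"

# Step 3: keep only the rows whose first column also appears in the header list.
join -t "$tab" -1 1 -2 1 "$workdir/headers.sorted" "$workdir/records.sorted" > "$workdir/matched.tsv"

cat "$workdir/matched.tsv"
```

Because the data line itself contains tabs, each output row is the header followed by all of its tab-separated fields on a single line.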

All in one, using bash

join -t $'\t' -1 1 -2 1 <(sort file1.txt) <(paste - - < file2.txt | sort -t $'\t' -k1,1)
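
The join approach emits each match as one tab-separated row, in sorted order. If the header and its data line should stay on separate lines, in the order they appear in the second file, a single awk pass is a possible alternative. This is a sketch, not part of the answer above: the sample contents and the output name `matched.txt` are hypothetical, and it assumes headers in the first file match the second file exactly (no trailing whitespace) and that the first file is not empty.

```sh
demo=$(mktemp -d)

# Tiny hypothetical stand-ins for file1.txt (headers only) and file2.txt (header + data line).
printf '%s\n' '@lib: B' '@lib: A' > "$demo/file1.txt"
printf '@lib: A\n1_to_2_#_5\t3_to_4_#_2\n@lib: C\n7_to_8_#_1\n@lib: B\n9_to_10_#_4\n' > "$demo/file2.txt"

awk '
    # First file: remember every header line.
    NR == FNR { wanted[$0] = 1; next }
    # Second file: a wanted header is printed and flags the next line.
    $0 in wanted { print; keep = 1; next }
    # The line directly below a wanted header is printed once.
    keep { print; keep = 0 }
' "$demo/file1.txt" "$demo/file2.txt" > "$demo/matched.txt"

cat "$demo/matched.txt"
```

No sorting is needed here because the headers from the first file are held in memory as array keys, which is fine for a header list of modest size.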