How to extract filename and change text in the same file
2.7 years ago

Hello,

I have about 30 VCF files with names like ID_001.new.vcf. I want to extract only the "ID_001" part of each file name and use it to replace "Sample1" in the header line of that same VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1


The result should look like this:

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ID_001


How can I do this? I tried to use echo in bash to extract the IDs from the file names, but I am unable to iterate over the files and change the text inside them. Thanks for your help.

1. Extract sample names from VCF using bcftools (query -l)
2. Prepare a new file with sample names (new names) one per line in the order of sample names from point 1
3. Use bcftools reheader with the file from point 2 to change the sample names.

Take a backup of the original file before proceeding.
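The three steps above could be sketched roughly as follows. This is only a sketch: it assumes bcftools is installed and that each file holds exactly one sample. The function name rename_with_bcftools and the .renamed.vcf output suffix are made up for illustration. Writing to a new file leaves the original untouched.

```shell
# Hypothetical helper: rename the single sample of one VCF after its file name.
rename_with_bcftools() {
    vcf=$1
    id=$(basename "$vcf" .new.vcf)      # ID_001.new.vcf -> ID_001
    names=$(mktemp)

    bcftools query -l "$vcf"            # step 1: list the current sample name(s)
    printf '%s\n' "$id" > "$names"      # step 2: new names, one per line
    # step 3: write a renamed copy next to the original
    bcftools reheader -s "$names" -o "${vcf%.new.vcf}.renamed.vcf" "$vcf"

    rm -f "$names"
}

# Usage (not run here):
#   for f in *.new.vcf; do rename_with_bcftools "$f"; done
```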

2.7 years ago
Jeffin Rockey ★ 1.2k

In bash this should do.

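for i in *.new.vcf
do
    # Strip the .new.vcf suffix: ID_001.new.vcf -> ID_001
    ID_NAME=$(basename "$i" .new.vcf)
    # Replace Sample1 on line 1 only, editing the file in place
    sed -i "1s|Sample1|$ID_NAME|g" "$i"
done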


Caution: I have used -i with sed, so the actual files will be edited in place.

I have now also added 1s to limit the replacement to the first line only.
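In a VCF the #CHROM line normally comes after the ## meta lines, so a line-number address may miss it. One possible variant, sketched here assuming GNU sed, selects the line by pattern instead. The demo file and its contents are made up so the snippet can be tried without touching real data.

```shell
# Build a tiny demo VCF in a scratch directory (hypothetical content).
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n' \
    > "$workdir/ID_001.new.vcf"

for f in "$workdir"/*.new.vcf; do
    id=$(basename "$f" .new.vcf)
    # /^#CHROM/ restricts the edit to the column header line;
    # \$ anchors Sample1 to the end of that line.
    sed -i "/^#CHROM/s|Sample1\$|$id|" "$f"
done

# Show the rewritten header line.
grep '^#CHROM' "$workdir/ID_001.new.vcf"
```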


I think it would be better to use bcftools view --samples-file than sed.


Hi Pierre, I did not understand. Would bcftools view do any replacement?


The --samples-file option can be used to rename the samples; see https://samtools.github.io/bcftools/bcftools.html:

This file can also be used to rename samples by giving the new sample name as a second white-space-separated column, like this: "old_name new_name".


This works when all files have Sample1 in the header line. Will that be the case?


Yes, all files have Sample1.


@Jeffin, thanks for your response. This line is not the first line within the file. How can I change sed so that it finds the particular line where Sample1 occurs and then changes it to $ID_NAME?

Changing 1s| to simply s| will do replacements for all Sample1 occurrences.

Thanks a lot. This worked!

2.7 years ago
Malcolm.Cook ★ 1.3k

If you have GNU parallel installed, you can use it instead of a bash for loop:

parallel 'sed -i "s|Sample1$|{=s/.new.vcf$//=}|"' {} ::: *.new.vcf

Hi Malcolm,

The suggested command appears to be super efficient, even though I did not understand many of the usages. Can you please explain the {=s/.new.vc$f//=}, {}, ::: etc.?


Sure.

In general, in your command line:

• {} gets replaced with the file being processed.
• {=perl expression=} gets replaced with the value of a Perl expression, evaluated with the Perl variable $_ set to the name of the file being processed.

So, in my example, we are using sed to replace the word "Sample1" appearing at the end of a line with the result of removing the trailing .new.vcf from each filename.

Documentation for this can be found in parallel's manpage by searching for "{=perl expression=}", where you can also read:

::: arguments
Use arguments from the command line as input source instead of stdin (standard input).

Fix: vc$f -> vcf\$. Also try:

parallel --plus 'sed -i "s|Sample1$|{%.new.vcf}|"' {} ::: *.new.vcf
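For anyone still puzzling over the replacement strings, what they expand to can be imitated in plain bash without running parallel at all. This is only an illustration; the file names are hypothetical and nothing is read or modified.

```shell
# $f plays the role of {} (the argument taken from after :::).
for f in ID_001.new.vcf ID_002.new.vcf; do
    # For names like these, stripping the suffix gives the same string as
    # {=s/.new.vcf$//=} or, with --plus, {%.new.vcf}.
    id=${f%.new.vcf}
    printf '%s -> %s\n' "$f" "$id"
done
```

Each line printed pairs the full file name with the ID that would be substituted into the sed expression.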


Hi Ole,

Could you please point me to a link that would help me understand {}, :::, etc.?


It is covered in GNU Parallel 2018 chapter 5 (Online https://doi.org/10.5281/zenodo.1146014, printed www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html)


thanks for the fix and the alternate!


deleted my comment since it is solved