Question: How to extract filename and change text in the same file
Inquisitive8995 (100) wrote 3 months ago:

Hello,

I have about 30 VCF files with names like ID_001.new.vcf. I want to extract only the "ID_001" part of each file name and use it to replace "Sample1" in the header line of that VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1

so that the result looks like this:

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ID_001

How can I do this? I tried using echo in bash to extract the IDs from the file names, but I could not work out how to loop over the files and change the header inside each one. Thanks for your help.

Tags: script, sequence, vcf • 291 views
modified 3 months ago by Malcolm.Cook (1.0k) • written 3 months ago by Inquisitive8995 (100)

  1. Extract the sample names from the VCF with bcftools (query -l).
  2. Prepare a file listing the new sample names, one per line, in the same order as the names from step 1.
  3. Use bcftools reheader to replace the sample names with those from step 2.

Back up the original files before proceeding.
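
The three steps above could be sketched as a bash loop like the one below. This is only an illustration under some assumptions: bcftools is on the PATH, each file holds a single sample, and the helper name sample_id and the output names (ID.samples.txt, ID.renamed.vcf) are made up for this example. It writes a new file instead of overwriting the original.

```shell
#!/usr/bin/env bash
# Sketch only: rename the single sample in each *.new.vcf to the ID taken
# from the file name. Originals are left untouched.

# Hypothetical helper: ID_001.new.vcf -> ID_001 (any directory part is dropped)
sample_id() {
    basename "$1" .new.vcf
}

# Only run the bcftools part when bcftools is actually installed.
if command -v bcftools >/dev/null 2>&1; then
    for vcf in *.new.vcf; do
        [ -e "$vcf" ] || continue                  # glob matched nothing
        id=$(sample_id "$vcf")
        bcftools query -l "$vcf"                   # step 1: list current sample names
        printf '%s\n' "$id" > "$id.samples.txt"    # step 2: new names, one per line
        # step 3: write a copy with the renamed sample (output name is made up)
        bcftools reheader -s "$id.samples.txt" -o "$id.renamed.vcf" "$vcf"
    done
fi
```

Because the new names file has one line per sample in header order, the same pattern should extend to multi-sample files by writing more lines in step 2.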

modified 3 months ago • written 3 months ago by cpad0112 (11k)
Jeffin Rockey (1.1k, Karimannoor) wrote 3 months ago:

In bash, this should do it:

for i in *.new.vcf
do
        ID_NAME=$(basename "$i" .new.vcf)
        sed -i "1s|Sample1|$ID_NAME|g" "$i"
done

Caution: I have used -i with sed, so the files themselves are edited in place.

I have now also added 1s to limit the replacement to the first line only.

modified 3 months ago • written 3 months ago by Jeffin Rockey (1.1k)

I think it would be better to use 'bcftools view --samples-file' than sed.

written 3 months ago by Pierre Lindenbaum (121k)

Hi Pierre, I did not understand. Would bcftools view do any replacement?

written 3 months ago by Jeffin Rockey (1.1k)

The --samples-file option can be used to rename the samples. From https://samtools.github.io/bcftools/bcftools.html:

This file can also be used to rename samples by giving the new sample name as a second white-space-separated column, like this: "old_name new_name".

written 3 months ago by Pierre Lindenbaum (121k)

This works only if all the files have Sample1 in the header. Will that be the case?

written 3 months ago by Vijay Lakhujani (4.2k)

Yes, all files have Sample1.

written 3 months ago by Inquisitive8995 (100)

@Jeffin, thanks for your response. The header line is not the first line in the file. How can I change the sed command so that it finds the line containing Sample1 and replaces it with $ID_NAME?

written 3 months ago by Inquisitive8995 (100)

Changing 1s| to simply s| will apply the replacement to every occurrence of Sample1 in the file.
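
If Sample1 could also appear elsewhere in a file, one possible refinement is to give sed an address so that only the #CHROM line is edited. The sketch below assumes GNU sed (for -i) and runs on a made-up two-record file in a scratch directory, so nothing real is touched. It also assumes the IDs contain no | or & characters, which would need escaping.

```shell
#!/usr/bin/env bash
# Sketch: replace Sample1 only on the #CHROM header line, wherever it sits.
# Everything happens in a throwaway directory on an invented test file.
workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-in for a real VCF: the #CHROM line is not the first line,
# and Sample1 also occurs inside a data line that must stay unchanged.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n1\t100\t.\tA\tG\t.\t.\tNOTE=Sample1\tGT\t0/1\n' > ID_001.new.vcf

for f in *.new.vcf; do
    id=$(basename "$f" .new.vcf)
    # /^#CHROM/ restricts the substitution to the column header line;
    # the escaped $ anchors Sample1 to the end of that line.
    sed -i "/^#CHROM/ s|Sample1\$|$id|" "$f"
done

# Show the edited header line
header=$(grep '^#CHROM' ID_001.new.vcf)
printf '%s\n' "$header"
```

On the real files, the loop alone (without the scratch directory and printf setup) would be the part to reuse, after taking a backup.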

written 3 months ago by Jeffin Rockey (1.1k)

Thanks a lot. This worked!

written 3 months ago by Inquisitive8995 (100)
Malcolm.Cook (1.0k, Kansas, USA) wrote 3 months ago:

If you have GNU parallel installed, you can use it instead of a bash for loop:

parallel 'sed -i "s|Sample1$|{=s/\.new\.vcf$//=}|"' {} ::: *.new.vcf
modified 3 months ago • written 3 months ago by Malcolm.Cook (1.0k)

Hi Malcolm, the suggested command appears to be very efficient, though I did not understand several parts of it. Can you please explain {=s/.new.vc$f//=}, {}, ::: and so on?

written 3 months ago by Jeffin Rockey (1.1k)

Sure.

In general, in your command line:

  • {} gets replaced with the file being processed.
  • {=perl expression=} gets replaced with the value of the Perl variable $_ after the Perl expression has been evaluated, where $_ starts out set to the name of the file being processed.

So, in my example, we use sed to replace the word "Sample1" at the end of a line with the file name minus its trailing .new.vcf.

Documentation for this can be found in parallel's man page by searching for "{=perl expression=}", where you can also read:

::: arguments
Use arguments from the command line as input source instead of stdin (standard input). 
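
To see what these placeholders expand to without editing anything, a dry illustration with echo can help. This sketch is not part of the original command: it assumes GNU parallel is the parallel found on the PATH (moreutils ships a different program with the same name), and the two file names are examples that need not exist.

```shell
#!/usr/bin/env bash
# Sketch: print what {} and {=...=} expand to for each argument after :::.
# Nothing is modified; echo only shows the substituted values.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -k keeps the output in the same order as the input arguments.
    # {}  -> the argument itself
    # {=s/\.new\.vcf$//=} -> the argument after the Perl substitution ran on $_
    expanded=$(parallel -k echo {} '{=s/\.new\.vcf$//=}' ::: ID_001.new.vcf ID_002.new.vcf)
    printf '%s\n' "$expanded"
fi
```

Each output line should show the original argument followed by the same name with the .new.vcf suffix removed, for example "ID_001.new.vcf ID_001".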
modified 3 months ago • written 3 months ago by Malcolm.Cook (1.0k)

Fix: vc$f -> vcf\$.

Also try: parallel --plus 'sed -i "s|Sample1$|{%.new.vcf}|"' {} ::: *.new.vcf

modified 3 months ago • written 3 months ago by ole.tange (3.4k)

Hi Ole,

Could you please point me to a link or other resource that would help me understand {}, ::: and so on?

written 3 months ago by Jeffin Rockey (1.1k)

It is covered in chapter 5 of GNU Parallel 2018 (online: https://doi.org/10.5281/zenodo.1146014; printed: www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html).

written 3 months ago by ole.tange (3.4k)

Thanks for the fix and the alternative!

written 3 months ago by Malcolm.Cook (1.0k)