3.5 years ago
storm1907
Hello, I have a BED file that looks like this (the original is an NCBI GTF file, converted to BED format):
I need to convert all delimiters in this file to tabs so that I can use it with bedtools. I tried the commands from this thread: https://stackoverflow.com/questions/1424126/replace-whitespaces-with-tabs-in-linux
However, after running bedtools, I get the following error:
***** ERROR: illegal character 't' found in integer conversion of string "transcript". Exiting...
I cannot understand why basic text-conversion commands don't work on this file. Also, is there a way to convert the NC chromosome accession to a chromosome number (i.e. NCXXX to 1)?
Thank you!
Which method from that SO thread did you apply? Can you also post an example of its output file?
I tried
awk -v OFS="\t" '$1=$1' out_tabs
and the output was:
sed "s/[[:space:]]+/$T/g" gave output like this:
and
perl -p -i -e 's/ /\t/g' file.txt
ended with empty output.
Is this what you are looking for? Also make sure that you do not have duplicates in the file, and if the file originated on Windows, check for hidden characters. Try
dos2unix
on Windows-generated text files.
I tried this command, but I still get the same issue.
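For reference, a minimal sketch of the whitespace-to-tab conversion discussed above. `to_tabs` is a hypothetical helper name, and the sketch assumes that every run of spaces or tabs is a field separator, which also splits any column containing spaces (such as GTF attributes). Two details may explain the earlier attempts: `perl -p -i` edits the file in place, so it prints nothing, and in sed's default basic regular expressions an unescaped `+` is a literal plus sign.

```shell
# to_tabs FILE
# Hypothetical helper: print FILE with Windows carriage returns removed and
# every run of spaces/tabs collapsed into a single tab.
# Assumption: no column is supposed to contain spaces.
to_tabs() {
  tr -d '\r' < "$1" |
    awk -v OFS='\t' '{ $1 = $1; print }'   # assigning $1 rebuilds the line using OFS
}
```

Usage would be along the lines of `to_tabs out.bed > out_tabs.bed`, with the file names taken from this thread.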
Can you try this in Vim?
https://superuser.com/questions/1188594/control-i-characters-in-my-text-file. Also check whether your file has any invisible characters; a hint is given in the same post.
Can you have a look at the file by running
sed -n 'l' input.txt
. This prints the invisible characters in the file (the l is a lowercase letter L).
This is the
and output of
I tried using the files above after removing
$
and
\t
and running
bedtools intersect -a sample.bed -b out.bed
. It didn't throw any error. However, it seems there is another issue with your files: the first column (chromosomes/contigs) does not match between them.
You can join the Biostar Slack and share part or all of the files privately with the members, if that is okay with you.
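The first-column mismatch mentioned above can be checked directly. The sketch below is only an illustration: `shared_chroms` is a hypothetical helper name, and it assumes both files are tab-separated with the chromosome or contig name in column 1.

```shell
# shared_chroms A.bed B.bed
# Hypothetical helper: print each column-1 name that occurs in both files.
# Empty output means bedtools intersect cannot report any overlap.
shared_chroms() {
  awk -F'\t' '
    NR == FNR { seen[$1] = 1; next }             # first file: remember its names
    ($1 in seen) && !printed[$1]++ { print $1 }  # second file: print shared names once
  ' "$1" "$2"
}
```

For example, `shared_chroms sample.bed out.bed` would print nothing if one file used names like `chr1` and the other used accessions like `NC_000001.11`.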
This is what I did with the data furnished by you:
Note: I added 2 entries to sample.bed and wrote a small sed script to trim one column from out.bed.
I checked the out.txt file once more, and indeed, in the middle of the file there are rows like:
How can I get rid of them? The file was obtained from NCBI GTF files, with
The file was downloaded from here: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39 however, it is quite complex and would take up too much space if I posted a fragment of it here.
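The offending rows are not shown in the thread, so the following is only a sketch under an assumption: that they are the rows bedtools complains about, i.e. rows whose start or end column is not an integer (for example, containing the string "transcript"). `report_bad_rows` and `keep_valid_rows` are hypothetical helper names.

```shell
# report_bad_rows FILE
# Hypothetical helper: print "line number: row" for every row that has fewer
# than 3 tab-separated columns or a non-integer start/end (columns 2 and 3).
# Note: BED header lines (track, browser, #) are reported as well.
report_bad_rows() {
  awk -F'\t' 'NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print NR ": " $0 }' "$1"
}

# keep_valid_rows FILE
# Hypothetical helper: print only the rows that pass the same check.
keep_valid_rows() {
  awk -F'\t' 'NF >= 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/' "$1"
}
```

It is probably worth reading the output of `report_bad_rows out.txt` before discarding anything with `keep_valid_rows out.txt > clean.bed`, since those rows may point to a problem in the original conversion.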
Please do the following:
Download gtf from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.gtf.gz
Unzip the file (gzip -d), convert the first column to match your file, and use it directly with
bedtools
However, if you want to use a BED file only, edit the first column of the GTF (to match your input file), save the result as a new GTF, and convert it to BED using 'gtf2bed' from BEDOPS. Use the output BED file.
Note that the GTF file has annotations at multiple levels (e.g. gene, transcript, CDS, exon). Accordingly, extract the features of interest from the GTF or BED. In the example below, I used the full GTF and there were too many hits, so I printed only 2 lines.
Thank you! Sorry, but how should I edit the GTF file then? I need only the gene level.
Keep only those lines from the result of
bedtools intersect
with the word
gene
in column 3?
That's correct. If you do that to the GTF, the
intersect
operation will be faster, as the new GTF file is smaller. To format the GTF (first column), use this code:
$ sed -r 's/^NC_[0]*//;s/\.[0-9]\+//;/^N[WT]_|12920/d' GCF_000001405.39_GRCh38.p13_genomic.gtf > new.gtf
Thank you! But I got 1.11 instead of 1 with this command line.
What's your input file? If you are using the GTF file from the link I shared with you, it should work. Please paste the first line of your input GTF.
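One possible explanation for the 1.11 result: with `sed -r` (extended regular expressions), `\+` matches a literal plus sign, so `s/\.[0-9]\+//` never matches and the version suffix stays in place. The sketch below is an alternative under stated assumptions, not the command used later in this thread. It assumes a tab-separated GTF with RefSeq accessions in column 1. `gene_only` and `rename_chroms` are hypothetical helper names. It only rewrites column 1, so rows that merely contain 12920 in a coordinate are kept, and X and Y come out as 23 and 24 (NC_000023, NC_000024).

```shell
# gene_only [FILE...]
# Hypothetical helper: keep comment lines and rows whose feature type
# (column 3) is exactly "gene".
gene_only() {
  awk -F'\t' '/^#/ || $3 == "gene"' "$@"
}

# rename_chroms [FILE...]
# Hypothetical helper: NC_000001.11 -> 1; drops NW_/NT_ scaffolds and the
# mitochondrial record NC_012920, as the sed one-liner above intends to.
rename_chroms() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }            # keep header/comment lines unchanged
       $1 !~ /^NC_/ { next }           # skip anything that is not an NC_ accession
       {
         chrom = $1
         sub(/^NC_0*/, "", chrom)      # NC_000001.11 -> 1.11
         sub(/\.[0-9]+$/, "", chrom)   # 1.11 -> 1
         if (chrom == "12920") next    # skip the mitochondrial genome
         $1 = chrom
         print
       }' "$@"
}
```

A possible invocation: `gene_only GCF_000001405.39_GRCh38.p13_genomic.gtf | rename_chroms > new.gtf`.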
Hello, so I downloaded the file via
wget -c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.gtf.gz
Then I gunzipped it (with -d), executed the command
ran the bedtools command:
and got empty output, because this is the output of the sed command line:
Try this and post the results here:
This worked, thank you!
Hello, sorry, is this a bedtools option? How can I write this command?
Without example data and the commands used, it is difficult to work from an image.
Sorry about that. I posted the commands above, in previous comments.
And this is what I get when using bedtools:
Post actual data. Not a screenshot.
My apologies
This, I guess, is from out_tabs.bed. What about sample.bed? Does either of these files contain the string "transcript"?
Not really.