How to add gene entry to gtf file?
19 months ago
bioinfo ▴ 150

Hello,

I am trying to make a reference using cellranger but I want to change the gtf file. I want to remove the entries for a specific gene and add another entry for it. The entry that I want to add is in chromosome 4. Will it be a problem if I add the new entry at the end of the gtf file and not near the other chromosome 4 entries? Is there a way to add it close to the other chromosome 4 entries?

Thank you

single-cell cellranger • 1.7k views
19 months ago
Ram 44k

What have you tried?

You can use sed to both delete specific line numbers and add content after a specific line. You can also use a combination of head, cat and tail if you have content in a new file (say, new_content.gtf) like so:

## Say you want to add the new content after line 17
head -n 17 current.gtf > new.gtf      # '>' starts new.gtf fresh instead of appending to an existing file
cat new_content.gtf >> new.gtf
tail -n +18 current.gtf >> new.gtf

Say you want to remove lines 15 and 16, and also add the new content after line 17:

head -n 14 current.gtf > new.gtf
head -n 17 current.gtf | tail -n 1 >> new.gtf   # picks just line 17: the last of the first 17 lines
cat new_content.gtf >> new.gtf
tail -n +18 current.gtf >> new.gtf
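
The sed route mentioned at the top can be sketched as a single pass over the file. This is a minimal example on toy files (the 20-line current.gtf and two-line new_content.gtf created here are stand-ins, not real annotation): `15,16d` deletes lines 15 and 16, and `17r new_content.gtf` reads the new file in right after line 17.

```shell
# Toy stand-ins for the real files (hypothetical content)
workdir=$(mktemp -d)
cd "$workdir"
seq 1 20 | sed 's/^/line/' > current.gtf
printf 'new1\nnew2\n' > new_content.gtf

# Delete lines 15-16 and insert new_content.gtf after line 17.
# The file name must be the last thing in the 'r' expression.
sed -e '15,16d' -e '17r new_content.gtf' current.gtf > new.gtf

# Show the region around the edit
sed -n '13,19p' new.gtf
```

Line numbers in both expressions refer to the input file, so there is no need to adjust 17 for the two deleted lines.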

Thank you for the reply. I have used sed to remove most of the entries that I do not want. To create the new entry, I took the command shown in the cellranger mkref tutorial and changed GFP to my gene name and the coordinates to those of my gene:

echo -e 'cdk\tunknown\texon\t22032674\t22032986\t.\t+\t.\tgene_id "cdk"; transcript_id "cdk"; gene_name "cdk"; gene_biotype "protein_coding";' > cdk.gtf   # GTF start/end must be plain integers, without thousands separators
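
A quick sanity check on a hand-written entry can catch formatting slips before cellranger does. The sketch below is only an assumption about what is worth checking: nine tab-separated fields, start and end as plain integers (no thousands separators), and start not greater than end. The helper name `check_gtf`, the toy files, and the `chr4` sequence name are hypothetical; the first column has to match a sequence name in your FASTA.

```shell
# Toy entries (hypothetical): one with comma-formatted coordinates, one clean
workdir=$(mktemp -d)
cd "$workdir"
printf 'chr4\tunknown\texon\t22,032,674\t22,032,986\t.\t+\t.\tgene_id "cdk";\n' > bad.gtf
printf 'chr4\tunknown\texon\t22032674\t22032986\t.\t+\t.\tgene_id "cdk";\n' > good.gtf

# Report lines that do not look like valid GTF records; exit status 1 if any
check_gtf() {
  awk -F '\t' '
    /^#/ { next }                                   # skip header/comment lines
    NF != 9 || $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ || $4 + 0 > $5 + 0 {
      printf "line %d looks malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

check_gtf good.gtf && echo "good.gtf passes"
check_gtf bad.gtf || echo "bad.gtf fails"
```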

Then I wanted to add it to my GTF file as shown below, which puts it at the end of the file:

cat cdk.gtf >> Grch38.filtered.GFP.gtf

I guess I will find the line number of the last entry I remove and then add the new GTF entry there.
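
One way to avoid tracking line numbers by hand is to let awk drop the old entries and write the new one at the position of the first dropped line. This is a sketch on toy data: the gene names (geneA, geneB, geneC, geneB_new) and file names are hypothetical, and it assumes every line of the gene to be replaced carries the same `gene_id "..."` attribute.

```shell
# Toy annotation (hypothetical): geneB is the gene to replace
workdir=$(mktemp -d)
cd "$workdir"
{
  printf '4\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n'
  printf '4\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "geneB";\n'
  printf '4\ttoy\texon\t450\t500\t.\t+\t.\tgene_id "geneB";\n'
  printf '4\ttoy\texon\t600\t700\t.\t+\t.\tgene_id "geneC";\n'
} > genes.gtf
printf '4\ttoy\texon\t320\t480\t.\t+\t.\tgene_id "geneB_new";\n' > new_entry.gtf

# Skip every geneB line; emit new_entry.gtf once, where the first one was
awk -v new="new_entry.gtf" '
  /gene_id "geneB";/ {
    if (!done) {
      while ((getline line < new) > 0) print line
      close(new)
      done = 1
    }
    next
  }
  { print }
' genes.gtf > genes.edited.gtf

cut -f1,4,5,9 genes.edited.gtf
```

If you only need the line numbers, `grep -n 'gene_id "geneB";' genes.gtf` prints them next to each matching line.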

This is the page (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_mr#marker) where I got the code from. I am following the instructions to add a marker gene but I don't want to add an extra gene like GFP. I just want to change one of the entries in the human genome.


Looks like I commented before you had a chance to finish your comment - I've cleaned up our conversation.

I've often seen tools error out when they find GTF entries out of order (or I'm misremembering, but erring on the side of caution is always better than the alternative). You can either add the line in the middle or even better, sort the file once you have the final GTF.
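
Sorting the final file could look roughly like this: keep any `#` header lines on top, then sort the records by sequence name and numeric start. The file names and toy records are hypothetical. Note that a plain coordinate sort does not guarantee that gene and transcript lines come before their exons when they share a start position, so treat this as a starting point rather than a guaranteed fix for every tool.

```shell
# Toy file (hypothetical): one header line and three out-of-order records
workdir=$(mktemp -d)
cd "$workdir"
{
  printf '#!toy-header\n'
  printf '4\ttoy\texon\t500\t600\t.\t+\t.\tgene_id "g3";\n'
  printf '1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "g1";\n'
  printf '4\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g2";\n'
} > merged.gtf

tab=$(printf '\t')
{
  grep '^#' merged.gtf                      # header lines first, order unchanged
  grep -v '^#' merged.gtf |
    LC_ALL=C sort -t "$tab" -k1,1 -k4,4n    # then by seqname, then numeric start
} > sorted.gtf

cut -f1,4,5 sorted.gtf
```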


Thank you. I ended up using the method you said above with sed.
