Question: Adding plasmid sequences to gtf file
Asked 6 months ago by Kristin Muench (310), United States:

Hello,

I am trying to align some .fastq files to both the hg19 human genome build AND a couple of plasmid sequences.

I am trying to figure out how I can include these plasmid sequences as "chromosomes" in the .gtf file.

The lines I tried to add look like this (output of tail myGTF.gtf; my plasmids are plas_hsk, plas_hul, and plas_shp):

chrY    unknown stop_codon  59343078    59343080    .   +   .   gene_id "IL9R"; gene_name "IL9R"; p_id "P21953"; transcript_id "NM_002186_1"; tss_id "TSS15302";
chrY    unknown exon    59358329    59359508    .   -   .   gene_id "DDX11L16"; gene_name "DDX11L16"; transcript_id "NR_110561_1"; tss_id "TSS3419";
chrY    unknown exon    59360007    59360115    .   -   .   gene_id "DDX11L16"; gene_name "DDX11L16"; transcript_id "NR_110561_1"; tss_id "TSS3419";
chrY    unknown exon    59360501    59360854    .   -   .   gene_id "DDX11L16"; gene_name "DDX11L16"; transcript_id "NR_110561_1"; tss_id "TSS3419";
plas_hsk    AddedGenes  exon    1   12915   .   +   0   gene_id "plas_hsk"; gene_name "plas_hsk"; transcript_id "plas_hsk"; tss_id "plas_hsk";
plas_hul    AddedGenes  exon    1   12262   .   +   0   gene_id "plas_hul"; gene_name "plas_hul"; transcript_id "plas_hul"; tss_id "plas_hul";
plas_shp    AddedGenes  exon    1   11886   .   +   0   gene_id "plas_shp"; gene_name "plas_shp"; transcript_id "plas_shp"; tss_id "plas_shp";[kmuench@smsx10srw-srcf-d15-37 20180223_alignToPlasmidOnly]

Unfortunately, when I run STAR, I get this error message.

Fatal INPUT FILE error, no valid exon lines in the GTF file: /path/to/my/gtf/myGTF.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.

This suggests to me that something is wrong with how I formatted my GTF file, but I can't figure out what, as it's just the Illumina GTF file plus a few lines I added to make "chromosomes" representing my plasmids.

BTW here is the STAR command I am using ($gtf points to the path to myGTF.gtf):

STAR --runMode genomeGenerate \
     --genomeDir $myGenomeDir \
     --genomeFastaFiles $hg19 $plasmidFasta \
     --sjdbGTFfile $gtf \
     --sjdbOverhang 100 \
     --genomeSAindexNbases 5 \
     --runThreadN ${SLURM_NPROCS:-1} \
     --readFilesIn ${workingDir}/$Read1 ${workingDir}/$Read2 \
     --outReadsUnmapped Fastx \
     --scoreDelOpen -10000 --scoreInsOpen -10000 \
     --outFileNamePrefix ${workingDir}/${outputFileLoc}/${sample}_

Thanks for your help!

Kristin

rna-seq • 447 views
written 6 months ago by Kristin Muench (310)

I think you're missing a newline at the end of the GTF file.

written 6 months ago by Asaf (4.9k)
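
The shell prompt glued onto the last plas_shp line in the tail output above is consistent with this suggestion. One way to check for a missing final newline, and to append one if needed, might look like the sketch below. The function names are made up for illustration, and the path in the usage note is the placeholder from the question.

```python
def ends_with_newline(path):
    """Return True if the last byte of the file is a newline.

    An empty file returns False.
    """
    with open(path, "rb") as handle:
        handle.seek(0, 2)           # jump to the end of the file
        if handle.tell() == 0:      # empty file: nothing to inspect
            return False
        handle.seek(-1, 2)          # step back one byte from the end
        return handle.read(1) == b"\n"


def ensure_trailing_newline(path):
    """Append a newline if the file lacks one; return True if it was modified."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as handle:
        handle.write(b"\n")
    return True
```

Calling ensure_trailing_newline("myGTF.gtf") once before running STAR should be enough; a second call leaves the file untouched.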

In addition to this, you may want to be comprehensive and add a complete entry for each plasmid "gene": a gene line, a transcript line, and then one exon line per exon (several for multi-exon transcripts). Take a look at other full transcripts in your GTF to see how they're recorded.

written 6 months ago by Kevin Blighe (28k)
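
As a rough sketch of what such a complete entry could look like, the snippet below writes a gene, a transcript, and a single exon line for each plasmid, each spanning the whole sequence. The plasmid names, lengths, and the AddedGenes source label come from the question; the function name and the choice of attributes on each line are assumptions for illustration, so compare the result against the existing entries in your own GTF.

```python
def plasmid_gtf_lines(name, length, source="AddedGenes", strand="+"):
    """Build gene, transcript, and exon GTF lines covering a whole plasmid.

    Each line has the nine tab-separated GTF columns. Treating the plasmid
    as one single-exon transcript is an assumption made for illustration.
    """
    gene_attrs = f'gene_id "{name}"; gene_name "{name}";'
    transcript_attrs = f'{gene_attrs} transcript_id "{name}";'
    features = [
        ("gene", gene_attrs),
        ("transcript", transcript_attrs),
        ("exon", transcript_attrs),
    ]
    return [
        "\t".join([name, source, feature, "1", str(length), ".", strand, ".", attrs])
        for feature, attrs in features
    ]


# Plasmid names and lengths as given in the question.
plasmids = {"plas_hsk": 12915, "plas_hul": 12262, "plas_shp": 11886}
for plasmid_name, plasmid_length in plasmids.items():
    # print() terminates the output with a newline, so the last line is complete.
    print("\n".join(plasmid_gtf_lines(plasmid_name, plasmid_length)))
```

Redirecting this output to a file and appending it to the original GTF keeps the columns tab-separated, which is easy to break when pasting lines in by hand.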

Thanks! I'll make these changes and give it a try/report back.

written 6 months ago by Kristin Muench (310)

Did you check this "Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file."?

modified 6 months ago • written 6 months ago by grant.hovhannisyan (1.1k)

I thought I had, but it looks like the .fa file had the gene names in upper case. I'll fix it to make them match exactly and report back on the result.

written 6 months ago by Kristin Muench (310)
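
A case mismatch like this is easy to miss by eye. A small check along the following lines could list every sequence name used in the GTF that has no exact match among the FASTA headers, together with a header that differs only in case, if there is one. The function names are hypothetical, and the sketch assumes the sequence name is the header text after ">" up to the first whitespace.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines.

    The name is taken as the text after '>' up to the first whitespace.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seqnames(lines):
    """Collect first-column sequence names from GTF lines, skipping comments and blanks."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def report_mismatches(fasta, gtf):
    """Map each GTF name absent from the FASTA to a case-insensitive match, or None."""
    by_lower = {name.lower(): name for name in fasta}
    return {name: by_lower.get(name.lower()) for name in sorted(gtf - fasta)}
```

Both collector functions accept any iterable of lines, so open file handles for the FASTA and GTF files can be passed in directly. For the situation described above, the report would contain an entry such as plas_hsk mapped to PLAS_HSK.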

Update: it now runs, but unfortunately the alignment is taking a lot of time (5 days on the step "sorting Suffix Array chunks and saving them to disk..."). Will keep working on it, thanks!

written 6 months ago by Kristin Muench (310)