Question: gene sequence fetching from bam or fastq file of rna seq data
1
gravatar for fatimarasool135
6 months ago by
fatimarasool13530 wrote:

I have 6 wheat samples sequenced using RNA-seq. I received forward and reverse fastq files and I generated bam files by using hisat2 tool which are aligned with the reference wheat genome. I have been asked to build multiple sequence alignment for 3 genes from this sequenced rna seq data. I believe I need to select a gene sequence from all the samples and do a multiple sequence alignment. But I am struck in fetching the gene sequence from bam files. How do I select one gene sequence for all the 6 samples? Any suggestions? kidly send me commands for fetching the sequence of genes from this RNA seq data in fastq file or aligned.bam file.

ADD COMMENTlink modified 5 months ago • written 6 months ago by fatimarasool13530

are you looking for isoforms? read about transcript assembly with stringtie, or isoform detection, once detected or assembled you can extract the sequences and align them.

ADD REPLYlink modified 6 months ago • written 6 months ago by Buffo1.6k

Hi buffo,

I have run the strigtie and got the assembled transcripts.next i want to fetch the transcripts sequences from all six sample .how can I do?

ADD REPLYlink written 24 days ago by fatimarasool13530

you can use gff_read to do that

ADD REPLYlink written 24 days ago by Buffo1.6k

Hi Buffo,

        I have run the stringtie tool and five files have been generated.

1 GTF file containing the assembled transcripts 2 Gene abundances in tab-delimited format 3 Fully covered transcripts that match the reference annotation, in GTF format 4 Files required as input to Ballgown 5 a merged GTF file from a set of GTF files. I did not use the gff_read tool. I have read the manual of it but did not understand. Can you please set the command for me. The command given in manual is gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf

In this command i did not understand which genomic file and transcripts.fa file I have to use.

ADD REPLYlink written 23 days ago by fatimarasool13530

Hi Buffo,

I have run the stringtie tool and five files have been generated.

1 GTF file containing the assembled transcripts

2 Gene abundances in tab-delimited format

3 Fully covered transcripts that match the reference annotation, in GTF format

4 Files required as input to Ballgown

5 a merged GTF file from a set of GTF files.

I did not use the gff_read tool. I have read the manual of it but did not understand. Can you please set the command for me. The command given in manual is

gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf

In this command i did not understand which genomic file and transcripts.fa file I have to use.

ADD REPLYlink modified 23 days ago • written 23 days ago by fatimarasool13530

The -w option refers to the output file name, -g needs the genome reference (you use it to map your reads) and the file transcripts.gtf is the option 1 of the above list (1 GTF file containing the assembled transcripts).

ADD REPLYlink written 23 days ago by Buffo1.6k
0
gravatar for fatimarasool135
5 months ago by
fatimarasool13530 wrote:

Hi Buffo, Thanks, I have run the srtigtie.got this error. kindly check this error.i did not get it and solve it .Kindly set this commandaccording to my sample.

./stringtie G1_sorted.bam -B -o G1.gtf -G Triticum_aestivum.IWGSC.42.gtf -p 4 -C G1.refs.gtf -A G1.abund.tab -WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences

ADD COMMENTlink written 5 months ago by fatimarasool13530

If you use -c

-C output a file with reference transcripts that are covered by reads

And gets

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!

It literaly means that you reads have mapped to regions without annotated transcripts. Be sure you are using the correct annotation. column 3 = Transcript

ADD REPLYlink written 5 months ago by Buffo1.6k

Now I run this command without the -c option .again got the same error. suggest me a solution .

./stringtie G1_sorted.bam -G Triticum_aestivum.IWGSC.42.gtf -l G1-Label -o G1_ST.gtf -p 15 WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.

ADD REPLYlink written 5 months ago by fatimarasool13530

here is gtf file

!genome-build IWGSC

!genome-version IWGSC

!genome-date 2018-07

!genome-build-accession GCA_900519105.1

3B IWGSC gene 212892 214491 . - . gene_id "TraesCS3B02G000100"; gene_source "IWGSC"; gene_biotype "protein_coding"; 3B IWGSC transcript 212892 214491 . - . gene_id "TraesCS3B02G000100"; transcript_id "TraesCS3B02G000100.1"; gene_source "IWGSC"; gene_biotype "protein_coding"; transcript_source "IWGSC"; transcript_biotype "protein_coding";

ADD REPLYlink modified 5 months ago • written 5 months ago by fatimarasool13530

Before to perform any bioinformatic analysis I would recommend you:

Be sure what are you looking for 
It is possible to get by in silico analysis?
what tools are available to do it? and how it works?
Do I need professional help?

Sorry but it looks like you do not have idea what are you doing, read about gtf file format. If you perform Stringtie analysis it searchsfor transcripts (column 3), and therefore, if you do not have annotated transcripts or your reads maps to not annotated regions you will get a warning like....

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
ADD REPLYlink modified 5 months ago • written 5 months ago by Buffo1.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 800 users visited in the last hour