Question

Calculating FPKM for large number of samples mapped to co-assembly

4.2 years ago

arla_21 • 0

Hi

I am very new to this kind of analysis, so please be kind, and I am sorry if I have left out any information.

I am interested in calculating the abundance of carbohydrate active enzyme sequences in my samples (but using a co-assembly).

I have co-assembled my samples (MEGAHIT) and mapped the reads of each sample back to the co-assembly (bowtie). I have also used dbCAN to annotate the co-assembly against the carbohydrate-active enzyme database. I then used htseq-count to count the number of reads mapped to each gene in each sample. I therefore have the raw counts for each sample, but I am confused about how to normalise them. I also have a GTF file with all the gene calls for the co-assembly, which looks like:

argelvor_000000000001   PROKKA  CDS 2   304 .   +   .   gene_id 1_1
argelvor_000000000002   PROKKA  CDS 1   168 .   -   .   gene_id 2_1
argelvor_000000000003   PROKKA  CDS 1   384 .   +   .   gene_id 3_1
argelvor_000000000004   PROKKA  CDS 1   321 .   +   .   gene_id 4_1
argelvor_000000000005   PROKKA  CDS 30  530 .   -   .   gene_id 5_1
argelvor_000000000006   PROKKA  CDS 1   96  .   +   .   gene_id 6_1
argelvor_000000000007   PROKKA  CDS 1   558 .   +   .   gene_id 7_1
argelvor_000000000008   PROKKA  CDS 2   484 .   -   .   gene_id 8_1
argelvor_000000000009   PROKKA  CDS 2   142 .   +   .   gene_id 9_1
argelvor_000000000009   PROKKA  CDS 191 343 .   +   .   gene_id 9_2

I also have a standard count matrix where gene IDs are rows and samples are columns. Is it possible to calculate FPKM from this information (and to automate it)? I am most comfortable in R but would welcome any suggestions.
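To make the question concrete, here is a minimal sketch of the calculation being asked about, using the usual definition FPKM = count × 10^9 / (gene length in bp × total fragments in the sample). It is written in Python with only the standard library, but the arithmetic ports directly to R. The function names, the sample names and the example counts are hypothetical. It also assumes that gene length is `end - start + 1` from the GTF and that, unless library sizes are supplied, the per-sample total is the sum of the counts assigned to genes (using the total number of mapped reads instead is an equally common choice).

```python
def gene_lengths_from_gtf(gtf_text):
    """Return {gene_id: length in bp} from GTF lines (1-based, inclusive coordinates)."""
    lengths = {}
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Nine whitespace-separated columns; the last one holds the attributes.
        fields = line.split(None, 8)
        start, end = int(fields[3]), int(fields[4])
        attrs = fields[8].replace(";", " ").replace('"', " ").split()
        gene_id = attrs[attrs.index("gene_id") + 1]
        # Sum in case a gene_id appears on more than one line.
        lengths[gene_id] = lengths.get(gene_id, 0) + (end - start + 1)
    return lengths


def fpkm(counts, lengths, library_sizes=None):
    """counts: {gene_id: {sample: raw count}} -> {gene_id: {sample: FPKM}}."""
    samples = sorted({s for row in counts.values() for s in row})
    if library_sizes is None:
        # Assumption: library size = sum of counts assigned to genes in that sample.
        library_sizes = {
            s: sum(row.get(s, 0) for row in counts.values()) for s in samples
        }
    result = {}
    for gene, row in counts.items():
        result[gene] = {}
        for s in samples:
            total = library_sizes[s]
            if total == 0:
                result[gene][s] = 0.0
            else:
                result[gene][s] = row.get(s, 0) * 1e9 / (lengths[gene] * total)
    return result


# Two genes taken from the GTF excerpt above; the counts are made up.
example_gtf = (
    "argelvor_000000000001\tPROKKA\tCDS\t2\t304\t.\t+\t.\tgene_id 1_1\n"
    "argelvor_000000000002\tPROKKA\tCDS\t1\t168\t.\t-\t.\tgene_id 2_1\n"
)
example_counts = {
    "1_1": {"sampleA": 30, "sampleB": 0},
    "2_1": {"sampleA": 70, "sampleB": 50},
}

lengths = gene_lengths_from_gtf(example_gtf)
fpkm_values = fpkm(example_counts, lengths)
```

If the count matrix comes straight from htseq-count, the summary rows whose names start with `__` (for example `__no_feature`) would need to be dropped before building `counts`, since they have no gene length.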

Once I have the FPKM values, I can then use the gene IDs to map to the output of dbCAN.

Thanks

metagenomics Assembly next-gen FPKM sequencing • 708 views