Metagenome Fragment Classification Based on Multiple Motif-Occurrence Profiles — ASN Events

Metagenome Fragment Classification Based on Multiple Motif-Occurrence Profiles (#6)

Naoki Matsushita¹, Shigeto Seno¹, Yoichi Takenaka¹, Hideo Matsuda¹
¹ Osaka University, Osaka, Japan
An enormous amount of metagenomic data has been obtained to extract multiple genomes simultaneously from microbial communities, including those of uncultivable microbes. Analyzing these data can reveal such microbes and elucidate new microbial functions. The first step of the analysis is to classify each sequenced read into the reference genome from which it was most likely derived. The Naïve Bayes Classifier is one method for this classification. To identify the origin of a read, it calculates a score for each reference genome based on the occurrences of short DNA sequences (motifs) in that genome. However, reference genomes differ greatly in size, and this difference biases their scores. The bias may cause misclassification and reduce classification accuracy. To address this issue, we enhanced the Naïve Bayes Classifier with multiple occurrence profiles for each reference genome, leveling genome sizes by dividing each genome sequence into subsequences of approximately equal length and generating a profile for each subsequence. The multiple-profile scheme improved the classification accuracy of the Naïve Bayes Classifier on both simulated and Sargasso Sea datasets.
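The scheme described above can be illustrated with a minimal sketch. It assumes that motifs are fixed-length k-mers, that each profile is a k-mer count table scored with Laplace-smoothed Naïve Bayes log-likelihoods, and that a genome's score is the best score among its subsequence profiles. The k-mer length, the pseudocount, the `chunk_len` parameter, the max-over-profiles rule, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log
from typing import Dict, List, Optional


def kmer_profile(seq: str, k: int) -> Counter:
    """Count occurrences of every k-mer (motif) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def nb_log_score(read: str, profile: Counter, k: int, pseudocount: float = 1.0) -> float:
    """Naive Bayes log-likelihood of a read under one k-mer profile.

    Each k-mer of the read is treated as independent; Laplace smoothing over
    all 4**k possible k-mers avoids log(0) for k-mers absent from the profile.
    """
    total = sum(profile.values())
    denom = total + pseudocount * 4 ** k
    return sum(log((profile[read[i:i + k]] + pseudocount) / denom)
               for i in range(len(read) - k + 1))


def split_genome(seq: str, chunk_len: int) -> List[str]:
    """Divide a genome into n subsequences whose lengths differ by at most 1.

    n is chosen so that each piece is roughly chunk_len long (hypothetical
    parameter). k-mers spanning a boundary are not counted in this sketch.
    """
    n = max(1, round(len(seq) / chunk_len))
    bounds = [len(seq) * i // n for i in range(n + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n)]


def build_profiles(genomes: Dict[str, str], k: int,
                   chunk_len: Optional[int] = None) -> Dict[str, List[Counter]]:
    """One profile per genome (chunk_len=None) or one per subsequence."""
    profiles = {}
    for name, seq in genomes.items():
        parts = [seq] if chunk_len is None else split_genome(seq, chunk_len)
        profiles[name] = [kmer_profile(p, k) for p in parts]
    return profiles


def classify(read: str, profiles: Dict[str, List[Counter]], k: int) -> str:
    """Assign the read to the genome with the highest-scoring profile.

    Taking the maximum over a genome's subsequence profiles is an assumption
    of this sketch; the abstract does not state how profiles are combined.
    """
    best = {name: max(nb_log_score(read, p, k) for p in plist)
            for name, plist in profiles.items()}
    return max(best, key=best.get)


if __name__ == "__main__":
    genomes = {"A_rich": "AAAAAAAAAA", "C_rich": "CCCCCCCCCC"}
    multi = build_profiles(genomes, k=2, chunk_len=5)
    print(classify("AAAA", multi, k=2))
```

With the toy data in the `__main__` block, the read `AAAA` is assigned to `A_rich`. Splitting every genome into pieces of similar length means each profile is built from a comparable amount of sequence, which is the intuition behind leveling genome sizes; a real classifier would use far longer genomes and tuned values of k and chunk length.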