Metagenome Fragment Classification Based on Multiple Motif-Occurrence Profiles (#6)
An enormous amount of metagenomic data has been
obtained to extract multiple genomes simultaneously from microbial communities,
including from uncultivable microbes. By analyzing metagenomic data, such
microbes are discovered and new microbial functions are elucidated. The first
step in analyzing the data is classifying sequenced reads into the reference
genomes from which each read could be derived. The Naïve Bayes Classifier is
one of the methods used for this classification. To identify the origin of
each read, the method calculates a score for each reference genome based on the
occurrence of DNA sequence motifs in that genome. However, the reference
genomes differ greatly in size, which biases the scoring. This bias may cause
erroneous classification and diminish the classification accuracy. To cope with
this issue, we have enhanced the Naïve Bayes Classifier with multiple
occurrence profiles for each reference genome. Genome sizes are leveled by
dividing each genome sequence into subsequences of approximately the same
length and generating a profile for each subsequence. The multiple-profile
scheme improves the classification accuracy of the Naïve Bayes Classifier on
simulated and Sargasso Sea datasets.
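
The multiple-profile idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything the abstract leaves unspecified is an assumption made here for illustration: motifs are taken to be overlapping k-mers, scores are add-one-smoothed log-likelihoods, a genome's score is the best score among its subsequence profiles, and the names kmer_profile, split_genome, log_score and classify are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seq, k):
    """Count every overlapping k-mer in seq (assumed motif definition)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_genome(seq, chunk_len):
    """Cut seq into pieces of roughly equal length, each close to chunk_len."""
    n_chunks = max(1, round(len(seq) / chunk_len))
    size = math.ceil(len(seq) / n_chunks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def log_score(read, profile, k):
    """Naive Bayes log-likelihood of read under one profile.

    Add-one smoothing over the 4**k possible DNA k-mers is an assumption.
    """
    total = sum(profile.values())
    vocab = 4 ** k
    return sum(
        math.log((profile[read[i:i + k]] + 1) / (total + vocab))
        for i in range(len(read) - k + 1)
    )


def classify(read, genomes, k=3, chunk_len=None):
    """Return the name of the genome owning the best-scoring profile.

    chunk_len=None builds a single profile per genome (the baseline);
    otherwise each genome is split first and every piece gets its own profile.
    """
    best_name, best_score = None, -math.inf
    for name, seq in genomes.items():
        chunks = [seq] if chunk_len is None else split_genome(seq, chunk_len)
        for chunk in chunks:
            score = log_score(read, kmer_profile(chunk, k), k)
            if score > best_score:
                best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy reference genomes of very different sizes (made-up sequences).
    genomes = {"small": "ACGT" * 10, "large": "T" * 400}
    print(classify("ACGTACGT", genomes))                # single profile per genome
    print(classify("ACGTACGT", genomes, chunk_len=40))  # multiple profiles per genome
```

Choosing chunk_len close to the size of the smallest reference genome makes every profile summarize a similar amount of sequence, which is the leveling effect described above. How the scores of a genome's profiles should be combined is a design choice that this sketch resolves by taking the maximum.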