http://swrc.ontoware.org/ontology#Article
A Linear Time Algorithm that Infers Hidden Strings from Their Concatenations
en
Original Papers
Central Research Laboratory Hitachi Ltd.
Tomohiro Yasuda
Let T be a set of hidden strings and S be a set of their concatenations. We address the problem of inferring T from S. Any formalization of the problem as an optimization problem would be computationally hard because it is NPcomplete even to determine whether there exists T smaller than S and because it is also NP-complete to partition only two strings into the smallest common collection of substrings. In this paper we devise a new algorithm that infers T by finding common substrings in S and splitting them. This algorithm is scalable and can be completed in O(L)-time regardless of the cardinality of S where L is the sum of the lengths of all strings in S. In computational experiments 40 000 random concatenations of randomly generated strings were successfully decomposed as well as the effectiveness of our method for this problem was compared with that of multiple sequence alignment programs. We also present the result of a preliminary experiment against the transcriptome of Homo sapiens and describe problems in applications where real large-scale cDNA sequences are analyzed.
Let T be a set of hidden strings and S be a set of their concatenations. We address the problem of inferring T from S. Any formalization of the problem as an optimization problem would be computationally hard, because it is NPcomplete even to determine whether there exists T smaller than S, and because it is also NP-complete to partition only two strings into the smallest common collection of substrings. In this paper, we devise a new algorithm that infers T by finding common substrings in S and splitting them. This algorithm is scalable and can be completed in O(L)-time regardless of the cardinality of S, where L is the sum of the lengths of all strings in S. In computational experiments, 40,000 random concatenations of randomly generated strings were successfully decomposed, as well as the effectiveness of our method for this problem was compared with that of multiple sequence alignment programs. We also present the result of a preliminary experiment against the transcriptome of Homo sapiens and describe problems in applications where real large-scale cDNA sequences are analyzed.
AA12177013
IPSJ Transactions on Bioinformatics （TBIO）
1
13-22
2008-11-28
1882-6679