doi: 10.4304/jsw.6.12.2361-2368
A Hybrid Method for XML Clustering by Structure and Content
Abstract—An effective XML cluster method called neighbor center clustering algorithm (NCC) is presented in this paper, whose similarity is obtained through both structural and content information contained in XML files. Structural similarity is firstly measured by frequency-path model and its similarity calculation algorithm with position and frequency weight by longest common subsequence is introduced. In order to improve the performance and precision, the frequency-path model is further extended by considering the structure and content information simultaneously. Experiments show that the NCC embed with hybrid similarity calculation method can obtain high purity and F-measure value and is effective and applicable for clustering XML with both homogenous and heterogeneous structures.
Index Terms—neighbor center clustering, position and frequency weight, longest common subsequence, hybrid similarity calculation
Cite: Yong Piao, Xiu-kun Wang, "A Hybrid Method for XML Clustering by Structure and Content," Journal of Software vol. 6, no. 12, pp. 2361-2368, 2011.
General Information
ISSN: 1796-217X (Online)
Abbreviated Title: J. Softw.
Frequency: Quarterly
APC: 500USD
DOI: 10.17706/JSW
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Cecilia Xie
Abstracting/ Indexing: DBLP, EBSCO,
CNKI, Google Scholar, ProQuest,
INSPEC(IET), ULRICH's Periodicals
Directory, WorldCat, etcE-mail: jsweditorialoffice@gmail.com
-
Oct 22, 2024 News!
Vol 19, No 3 has been published with online version [Click]
-
Jan 04, 2024 News!
JSW will adopt Article-by-Article Work Flow
-
Apr 01, 2024 News!
Vol 14, No 4- Vol 14, No 12 has been indexed by IET-(Inspec) [Click]
-
Apr 01, 2024 News!
Papers published in JSW Vol 18, No 1- Vol 18, No 6 have been indexed by DBLP [Click]
-
Jun 12, 2024 News!
Vol 19, No 2 has been published with online version [Click]