Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/5014
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tekir, Selma | -
dc.contributor.author | Mansmann, Florian | -
dc.contributor.author | Keim, Daniel | -
dc.date.accessioned | 2017-03-09T06:53:15Z | -
dc.date.available | 2017-03-09T06:53:15Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | Tekir, S., Mansmann, F., and Keim, D. (2011, April 11-15). Geodesic distances for web document clustering. Paper presented at the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011. doi:10.1109/CIDM.2011.5949449 | en_US
dc.identifier.isbn | 9781424499274 | -
dc.identifier.uri | http://doi.org/10.1109/CIDM.2011.5949449 | -
dc.identifier.uri | http://hdl.handle.net/11147/5014 | -
dc.description | Symposium Series on Computational Intelligence, IEEE SSCI2011 - 2011 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011; Paris; France; 11 April 2011 through 15 April 2011 | en_US
dc.description.abstract | While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article. © 2011 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.relation.ispartof | IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011 | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | Cluster analysis | en_US
dc.subject | Geodesic distances | en_US
dc.subject | Wikipedia | en_US
dc.subject | User interfaces | en_US
dc.subject | Web document clustering | en_US
dc.title | Geodesic distances for web document clustering | en_US
dc.type | Conference Object | en_US
dc.authorid | TR114496 | en_US
dc.institutionauthor | Tekir, Selma | -
dc.department | İzmir Institute of Technology. Computer Engineering | en_US
dc.identifier.startpage | 15 | en_US
dc.identifier.endpage | 21 | en_US
dc.identifier.scopus | 2-s2.0-79961193406 | en_US
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US
dc.identifier.doi | 10.1109/CIDM.2011.5949449 | -
dc.relation.doi | 10.1109/CIDM.2011.5949449 | en_US
dc.coverage.doi | 10.1109/CIDM.2011.5949449 | en_US
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.fulltext | With Fulltext | -
item.languageiso639-1 | en | -
item.grantfulltext | open | -
item.openairetype | Conference Object | -
crisitem.author.dept | 03.04. Department of Computer Engineering | -
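
The abstract names three standard building blocks: the cosine distance between term vectors, a clustering coefficient per document computed from the link graph, and micro-precision as the evaluation measure. The Python sketch below illustrates textbook versions of these three quantities only. It does not reproduce the paper's geodesic distance formula or its curvature heuristic, which this record does not state. All function names and the toy data layout are assumptions made for illustration, and micro-precision is taken here in its common form (the share of documents whose category matches the majority category of their cluster).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention assumed here for empty documents
    return 1.0 - dot / (norm_u * norm_v)


def local_clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked.

    `graph` maps each node to the set of its neighbours and is assumed
    to be undirected (every link is stored in both directions).
    """
    neighbours = graph[node] - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def micro_precision(cluster_ids, labels):
    """Share of documents carrying the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(labels)
```

In a pipeline like the one the abstract describes, the first two functions would supply the inputs to a combined distance, and the third would score the clusters produced by k-means against the article categories.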
Appears in Collections: Computer Engineering / Bilgisayar Mühendisliği
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Files in This Item:
File | Description | Size | Format
5014.pdf | Conference Paper | 143.33 kB | Adobe PDF

Scopus citations: 6 (checked on Apr 5, 2024)
Page view(s): 154 (checked on Apr 15, 2024)
Download(s): 246 (checked on Apr 15, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.