Quote Detection: A New Task and Dataset for NLP

Tekir, S.; Güzel, A.; Tenekeci, S.; Haman, B.U.

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14206

Full metadata record

DC Field	Value	Language
dc.contributor.author	Tekir, S.	-
dc.contributor.author	Güzel, A.	-
dc.contributor.author	Tenekeci, S.	-
dc.contributor.author	Haman, B.U.	-
dc.date.accessioned	2024-01-06T07:22:37Z	-
dc.date.available	2024-01-06T07:22:37Z	-
dc.date.issued	2023	-
dc.identifier.isbn	9781959429548	-
dc.identifier.uri	https://hdl.handle.net/11147/14206	-
dc.description	7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793	en_US
dc.description.abstract	Quotes are universally appealing. Humans recognize good quotes and save them for later reference. For machines, however, recognizing quotes may pose a challenge. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random fields (CRF), and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics.	en_US
dc.language.iso	en	en_US
dc.publisher	Association for Computational Linguistics	en_US
dc.relation.ispartof	EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Natural language processing systems	en_US
dc.subject	Auto-regressive	en_US
dc.subject	Extractive summarizations	en_US
dc.subject	Fine tuning	en_US
dc.subject	Gain insight	en_US
dc.subject	News summarization	en_US
dc.subject	Performance	en_US
dc.subject	Qualitative analysis	en_US
dc.subject	Random fields	en_US
dc.subject	Sequence models	en_US
dc.subject	Random processes	en_US
dc.title	Quote Detection: A New Task and Dataset for NLP	en_US
dc.type	Conference Object	en_US
dc.institutionauthor	…	-
dc.department	İzmir Institute of Technology	en_US
dc.identifier.startpage	21	en_US
dc.identifier.endpage	27	en_US
dc.identifier.scopus	2-s2.0-85175428867	-
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.authorscopusid	16234844500	-
dc.authorscopusid	58675151700	-
dc.authorscopusid	57340107000	-
dc.authorscopusid	58675886200	-
dc.identifier.wosquality	N/A	-
dc.identifier.scopusquality	N/A	-
item.fulltext	No Fulltext	-
item.openairetype	Conference Object	-
item.cerifentitytype	Publications	-
item.grantfulltext	none	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
crisitem.author.dept	03.04. Department of Computer Engineering	-
Appears in Collections:	Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection