ACL TA-DA: A Dataset for Text Summarization and Generation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Selecting appropriate natural language datasets is essential to achieving good performance in deep learning natural language tasks. Recent state-of-the-art language models are trained on huge corpora to achieve high language understanding performance. In addition, fine-tuning pre-trained language models on task-specific datasets is necessary to carry out diverse NLP tasks. In this paper, we introduce ACL TA-DA (Association for Computational Linguistics Titles Abstracts DAta), which consists of 22k English titles and the corresponding abstracts of papers published in ACL. The dataset is suited to two NLP tasks: (1) text summarization and (2) text generation. We train several state-of-the-art text summarization and generation models on the dataset and report their results to demonstrate that it can be widely applied.
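
As a rough illustration of how title and abstract pairs could serve both tasks, the sketch below builds source/target examples from such pairs. The abstract does not state which field is the input for each task, so the directions used here (abstract to title for summarization, title to abstract for generation) are an assumption, and the `Paper` class, the function names, and the toy record are hypothetical rather than part of the released dataset.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """One hypothetical record: a paper title and its abstract."""
    title: str
    abstract: str


def to_summarization_example(paper):
    # Assumed framing: the abstract is the source, the title is the target summary.
    return {"source": paper.abstract, "target": paper.title}


def to_generation_example(paper):
    # Assumed framing: the title is the prompt, the abstract is the text to generate.
    return {"source": paper.title, "target": paper.abstract}


def build_examples(papers):
    """Turn title/abstract pairs into one example list per task."""
    summarization = [to_summarization_example(p) for p in papers]
    generation = [to_generation_example(p) for p in papers]
    return summarization, generation


# Toy record for illustration only; it is not taken from the dataset.
papers = [
    Paper(
        title="A Toy Title",
        abstract="A toy abstract with a few more words than the title.",
    )
]
summarization, generation = build_examples(papers)
```

Under this framing the same pairs yield two fine-tuning sets that differ only in which field is the input and which is the target.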

Original language: English
Title of host publication: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023
Publisher: Association for Computing Machinery
Pages: 1233-1239
Number of pages: 7
ISBN (Electronic): 9781450395175
State: Published - 27 Mar 2023
Event: 38th Annual ACM Symposium on Applied Computing, SAC 2023 - Tallinn, Estonia
Duration: 27 Mar 2023 - 31 Mar 2023

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 38th Annual ACM Symposium on Applied Computing, SAC 2023
Country/Territory: Estonia
City: Tallinn
Period: 27/03/23 - 31/03/23

Keywords

  • data collection
  • natural language generation
  • text summarization
