PanguIR Technical Report for NTCIR-18 AEOLLM Task

Lang Mei; Chong Chen; Jiaxin Mao


https://doi.org/10.20736/0002002029

Name / File	License	Action
05-NTCIR18-AEOLLM-MeiL.pdf (785.6 KB)

Item type

Default item type (full) (1)

Publication date

2025-06-06

Title

PanguIR Technical Report for NTCIR-18 AEOLLM Task

Creator

Lang Mei
Chong Chen
Jiaxin Mao

Description

Description type

Abstract

Description

As large language models (LLMs) gain widespread attention in both academia and industry, evaluating their capabilities effectively becomes increasingly critical and challenging. Existing evaluation methods fall broadly into two types: manual evaluation and automatic evaluation. Manual evaluation, while comprehensive, is often costly and resource-intensive. Automatic evaluation scales better but is constrained by the limitations of its evaluation criteria, which are dominated by reference-based answers. To address these challenges, NTCIR-18 (https://research.nii.ac.jp/ntcir/ntcir-18/tasks.html#AEOLLM) introduced the AEOLLM (Automatic Evaluation of LLMs) task, which aims to encourage reference-free evaluation methods that overcome the limitations of existing approaches. In this paper, to improve evaluation performance on the AEOLLM task, we propose three key methods for reference-free evaluation: 1) Multi-model Collaboration: leveraging multiple LLMs to approximate human ratings across various subtasks; 2) Prompt Auto-optimization: using LLMs to iteratively refine the initial task prompts based on evaluation feedback from training samples; and 3) In-context Learning (ICL) Optimization: based on the multi-task evaluation feedback, we train a specialized in-context example retrieval model and combine it with a semantic relevance retrieval model to jointly identify the most effective in-context learning examples. Experiments conducted on the final dataset demonstrate that our approach achieves superior performance on the AEOLLM task.
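As a rough illustration of two ideas named in the abstract, the sketch below averages ratings from several evaluator models and ranks candidate in-context examples by a weighted sum of two retriever scores. The abstract does not specify how ratings or retriever scores are combined, so the averaging rule, the weighted-sum fusion, the `alpha` parameter, and all function and variable names here are assumptions for illustration only, not the authors' method.

```python
from statistics import mean


def ensemble_rating(ratings_by_model):
    """Average the ratings that several evaluator LLMs gave one answer.

    Plain averaging is an assumed aggregation rule, not taken from the paper.
    """
    return mean(ratings_by_model.values())


def select_examples(task_scores, semantic_scores, alpha=0.5, k=2):
    """Pick the top-k in-context examples by a fused retrieval score.

    task_scores: scores from a (hypothetical) task-feedback retriever.
    semantic_scores: scores from a (hypothetical) semantic relevance retriever.
    alpha: assumed mixing weight between the two retrievers.
    """
    fused = {
        ex_id: alpha * task_scores[ex_id] + (1 - alpha) * semantic_scores[ex_id]
        for ex_id in task_scores
    }
    # Sort example ids from highest to lowest fused score and keep the first k.
    return sorted(fused, key=fused.get, reverse=True)[:k]


if __name__ == "__main__":
    print(ensemble_rating({"model_a": 3, "model_b": 4, "model_c": 5}))
    print(
        select_examples(
            {"e1": 0.9, "e2": 0.2, "e3": 0.5},
            {"e1": 0.3, "e2": 0.8, "e3": 0.6},
        )
    )
```

In practice the mixing weight and the aggregation rule would need to be tuned against human ratings on the training samples; the values used here are placeholders.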

Publisher

NII Institutional Repository

Date

2025-06-06

Date type

Issued

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

ID registration

10.20736/0002002029

ID registration type

JaLC

Versions

Ver.1

2025-06-04 08:00:50.147343
