cs.CL updates on arXiv.org

489 items in total

Mapping the Evaluation Frontier: An Empirical Survey of the Bias-Reliability Tradeoff Across Eleven Evaluator-Agent Conditions

2026-06-28 03:07:22
CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models

2026-06-28 03:07:22
Rosetta: Composable Native Multimodal Pretraining

2026-06-28 03:07:22
Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models

2026-06-28 03:07:22
Testing Frontier Large Language Models' Physics Literacy in Parallel Physical Worlds

2026-06-28 03:07:22
The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters

2026-06-28 03:07:22
MetaHOPE: A Metaphor-Oriented Evaluation Framework for Analysing MT and LLM Translation Errors

2026-06-28 03:07:22
Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

2026-06-28 03:07:22
What Survives Into Context: A Diagnostic for Budget-Constrained Multi-Hop RAG and When Submodular Evidence Packing Improves It

2026-06-28 03:07:22
MSQA: A Natively Sourced Multilingual and Multicultural SimpleQA Benchmark

2026-06-28 03:07:22
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

2026-06-28 03:07:22
Self-conditioned Flow Map Language Models via Fixed-point Flows

2026-06-28 03:07:22
YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

2026-06-28 03:07:22
Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

2026-06-28 03:07:22
Auditing Forgetting in Limited Memory Language Models

2026-06-28 03:07:22
From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape

2026-06-28 03:07:22
"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo

2026-06-28 03:07:22
SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization

2026-06-28 03:07:22
Multi-Turn Agentic Scientific Literature Search via Workflow Induction

2026-06-28 03:07:22
Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs

2026-06-28 03:07:22
