cs.CL updates on arXiv.org
489 items in total

Mapping the Evaluation Frontier: An Empirical Survey of the Bias-Reliability Tradeoff Across Eleven Evaluator-Agent Conditions  2026-06-28 03:07:22
CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models  2026-06-28 03:07:22
Rosetta: Composable Native Multimodal Pretraining  2026-06-28 03:07:22
Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models  2026-06-28 03:07:22
Testing Frontier Large Language Models' Physics Literacy in Parallel Physical Worlds  2026-06-28 03:07:22
The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters  2026-06-28 03:07:22
MetaHOPE: A Metaphor-Oriented Evaluation Framework for Analysing MT and LLM Translation Errors  2026-06-28 03:07:22
Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework  2026-06-28 03:07:22
What Survives Into Context: A Diagnostic for Budget-Constrained Multi-Hop RAG and When Submodular Evidence Packing Improves It  2026-06-28 03:07:22
MSQA: A Natively Sourced Multilingual and Multicultural SimpleQA Benchmark  2026-06-28 03:07:22
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles  2026-06-28 03:07:22
Self-conditioned Flow Map Language Models via Fixed-point Flows  2026-06-28 03:07:22
YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese  2026-06-28 03:07:22
Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications  2026-06-28 03:07:22
Auditing Forgetting in Limited Memory Language Models  2026-06-28 03:07:22
From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape  2026-06-28 03:07:22
"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo  2026-06-28 03:07:22
SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization  2026-06-28 03:07:22
Multi-Turn Agentic Scientific Literature Search via Workflow Induction  2026-06-28 03:07:22
Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs  2026-06-28 03:07:22