Pengyuan Liu (刘鹏远)

Email: liupengyuan [at] pku.edu.cn

Biography

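Research Professor and doctoral supervisor at the School of Information Science, Beijing Language and Culture University. He received his PhD from the School of Computer Science and Technology, Harbin Institute of Technology, under Professors Tiejun Zhao (赵铁军) and Sheng Li (李生), and was a postdoctoral researcher at the Institute of Computational Linguistics, Peking University, working with Professor Shiwen Yu (俞士汶). His main research interests are artificial intelligence (safety and trustworthiness), large language models, natural language processing, and social computing. He has led projects funded by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, the Ministry of Education Humanities and Social Sciences Planning Fund, and the State Language Commission (key project), among others, and has published more than 40 papers in CCF A/B venues such as ACL, EMNLP, and COLING and in Chinese core journals. He is currently an executive member of the Natural Language Processing Committee of the China Computer Federation (CCF), a member of the Large Models and Generation Technical Committee and of the Social Computing Committee of the Chinese Information Processing Society of China (CIPS), and a member of the Language Resources and Computational Humanities Committee of the Chinese Society of Ethnic Languages (中国民族语言学会). He reviews for journals including Chinese Journal of Computers (《计算机学报》), Acta Automatica Sinica (《自动化学报》), Journal of Chinese Information Processing (《中文信息学报》), and Computer Engineering and Applications (《计算机工程与应用》).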
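The research team (NLU&SoCo Lab) currently has 4 faculty advisors, 2 PhD students, more than 30 master's students, and more than 20 undergraduates. The lab recruits undergraduates on an ongoing basis. Students interested in natural language processing and social computing are welcome to get in touch by email, with a CV attached.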

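Recent team honors:

2024: Highlight Chinese Paper, 23rd Chinese National Conference on Computational Linguistics (CCL 2024);

2024: Best Paper Award, 28th International Conference on Asian Language Processing (IALP 2024);

2024: First place in the water resources vertical track of the "Chaosui Cup" (巢燧杯) National AI Large Model Innovation and Development Competition;

2021: Paper published in the Nature Portfolio journal Humanities and Social Sciences Communications;

2021: First and third place in individual tracks of Task 10 at the International Workshop on Semantic Evaluation (SemEval 2021);

2021: First place in an individual track of Task 1 in the NLPCC 2021 Shared Tasks;

2020: Best Poster Paper Award, 19th Chinese National Conference on Computational Linguistics (CCL 2020);

2020: Second Prize (national level), 13th China College Students Design Competition;

2020: First Prize (Beijing municipal level), 13th China College Students Design Competition;

2018: Best Paper Award, 17th Chinese National Conference on Computational Linguistics (CCL 2018);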

Recent publications

Xia Du, Shuhan Sun, Pengyuan Liu, and Dong Yu. 2025. Investigating Value-Reasoning Reliability in Small Large Language Models. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025).
Feiyu Wang, Ziran Zhao, Dong Yu, and Pengyuan Liu. 2025. Attribution and Application of Multiple Neurons in Multimodal Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025.
王雯, 于东, 刘鹏远. 2025. 基于领域信息分解式学习的大语言模型修辞认知增强方法. Journal of Chinese Information Processing (《中文信息学报》).
Liu, Xuelin, Pengyuan Liu, and Dong Yu. 2025. What's the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025).
Yang, Zhiyu, et al. 2024. MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization. Findings of the Association for Computational Linguistics: ACL 2024.
Xuelin Liu, Yanfei Zhu, Shucheng Zhu, Pengyuan Liu, Ying Liu, and Dong Yu. 2024. Evaluating Moral Beliefs across LLMs through a Pluralistic Framework. Findings of the Association for Computational Linguistics: EMNLP 2024.
杜霞, 刘鹏远, 于东. 2024. 中西方谚语多元价值观资源库建设及对比研究. 《数字人文》
张旭, 郭梦清, 朱述承, et al. 2024. 大语言模型开放性生成文本中的职业性别偏见研究. 《数字人文》
Feiyu Wang, Wenyu Guo, Dong Yu, Chen Kang, and Pengyuan Liu. 2024. Bridging the Gap between Authentic and Answer-Guided Images for Chinese Vision-Language Understanding Enhancement. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Du, Huidong, et al. 2024. Generate-then-Revise: An Effective Synthetic Training Data Generation Framework for Event Detection. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Yang, Zhiyu, et al. 2024. Enhancing Free-Form Table Question Answering Models by Distilling Relevant-Cell-Based Rationales. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Wen Wang, Siyi Tang, Dong Yu, and Pengyuan Liu. 2024. 人类思维指导下大小模型协同决策的中文修辞识别与理解方法. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Hongying Huo, Shaoping Huang, and Pengyuan Liu. 2024. 基于关系抽取的中文意合图语义解析方法研究. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Xiaomeng Du, Dong Yu, and Pengyuan Liu. 2024. 文本样式和主题框架引导下的大模型辅助儿童新闻生成. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Wen Wang, Dong Yu, and Pengyuan Liu. 2024. 基于领域信息分解式学习的大语言模型修辞认知增强方法. Proceedings of the 23rd Chinese National Conference on Computational Linguistics (CCL 2024).
Yu Shan, Shan Yu, Pengyuan Liu, Hua Liu, and Kaiyi Chen. 2024. A Multi-Concept Semantic Representation System for Chinese Intent Recognition. International Conference on Asian Language Processing (IALP 2024).
Sun Hao, Shan Yu, Pengyuan Liu, Hua Liu, and Kaiyi Chen. 2024. Construction and Analysis of a Diachronic Knowledge Base of Traditional Values in Chinese Idioms. International Conference on Asian Language Processing (IALP 2024).
孙浩, 杜惠东, 刘鹏远, 于东. 2024. 中国成语传统价值观资源库构建及历时分析. The 25th Chinese Lexical Semantics Workshop (CLSW 2024).
黄莎淇, 于东, 王弘睿, 刘鹏远, 刘佳琪. 2024. 汉语副词道德倾向性与道德类型计量研究. The 25th Chinese Lexical Semantics Workshop (CLSW 2024).
于东, 王弘睿, 刘鹏远, 张宇飞. 2024. 基于多版本语文教材教学位序的汉字难度表征方法研究. The 25th Chinese Lexical Semantics Workshop (CLSW 2024).
王弘睿, 于东, 曾立英, 刘鹏远. 2024. 词汇使用视角下70年中国社会道德内涵变迁计量研究. The 25th Chinese Lexical Semantics Workshop (CLSW 2024).
Jiadai Sun, Siyi Tang, Shike Wang, Dong Yu, and Pengyuan Liu. 2023. 大规模语言模型增强的中文篇章多维度阅读体验量化研究. Proceedings of the 22nd Chinese National Conference on Computational Linguistics (CCL 2023).
Yingshi Chen, Dong Yu, and Pengyuan Liu. 2023. 动词视角下的汉语性别表征研究——基于多语体语料库与依存分析. Proceedings of the 22nd Chinese National Conference on Computational Linguistics (CCL 2023).
Hongrui Wang, Dong Yu, Pengyuan Liu, and Liying Zeng. 2023. 中国社会道德变化模型与发展动因探究——基于70年《人民日报》的计量与分析. Proceedings of the 22nd Chinese National Conference on Computational Linguistics (CCL 2023).
Wang, Y., Zhong, Y., Zhang, X., Niu, C., Yu, D., Liu, P. 2023. CCD-ASQP: A Chinese Cross-Domain Aspect Sentiment Quadruple Prediction Dataset. Knowledge Graph Empowers Artificial General Intelligence (CCKS 2023).
Wang, Y., Du, X., Liu, P., Yu, D. 2023. Moral Essential Elements: MEE-A Dataset for Moral Judgement. Knowledge Graph Empowers Artificial General Intelligence (CCKS 2023).
Pengyuan Liu, Sanle Zhang, Dong Yu, and Lin Bo. 2022. CoreValue:面向价值观计算的中文核心价值-行为体系及知识库. Proceedings of the 21st Chinese National Conference on Computational Linguistics (CCL 2022).
Chunxu Zhao, Pengyuan Liu, and Dong Yu. 2022. From Polarity to Intensity: Mining Morality from Semantic Space. Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).
Mengqing Guo, Jiali Li, Jishun Zhao, Shucheng Zhu, Ying Liu, and Pengyuan Liu. 2022. 中文自然语言处理多任务中的职业性别偏见测量. Proceedings of the 21st Chinese National Conference on Computational Linguistics (CCL 2022).
Yi Li, Dong Yu, and Pengyuan Liu. 2022. CLGC: A Corpus for Chinese Literary Grace Evaluation. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC 2022).
Qi Su, Pengyuan Liu, Wei Wei, Shucheng Zhu, and Chu-Ren Huang. 2021. Occupational gender segregation and gendered language in a language without gender: trends, variations, implications for social development in China. Humanities and Social Sciences Communications (www.nature.com/articles/s41599-021-00799-6).
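刘鹏远, 田永胜, 杜成玉, 邱立坤. 2021. 多目标情感分类中文数据集构建及分析研究. Journal of Chinese Information Processing (《中文信息学报》).
刘鹏远, 王伟康, 邱立坤, 杜冰洁. 2021. CDCPP：跨领域中文标点符号预测. Journal of Chinese Information Processing (《中文信息学报》).
朱述承, 苏祺, 刘鹏远. 2021. 基于语料库的我国职业性别无意识偏见共时历时研究. Journal of Chinese Information Processing (《中文信息学报》).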
邢百西, 刘鹏远. 2021. 中文关系抽取的句级语义特征探究. Proceedings of the 20th Chinese National Conference on Computational Linguistics (CCL 2021).
赵继舜, 杜冰洁, 朱述承, 刘鹏远. 2021. 句子级性别无偏中文数据集构建与评价. Proceedings of the 20th Chinese National Conference on Computational Linguistics (CCL 2021).

Courses taught

Introduction to Algorithms
Data Structures

Student supervision

Current students

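杜霞 (2023-): Master's student
董晴 (2023-): Master's student
苏彬娴 (2023-): Master's student
刘洋洋 (2024-): Master's student
马芷晴 (2024-): Master's student
张方橼 (2024-): Master's student
张景涵 (2024-): Master's student
赵子然 (2024-): Master's student
陈曲 (2025-): PhD student
安雅妮 (2025-): Master's student
朱芷煜 (2025-): Master's student
杨士辉 (2025-): Master's student
王一萍 (2025-): Master's student
张晨熹 (2025-): Master's student
余昊宸 (2025-): Master's student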

Graduated students

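孙亚运 (2013-2016): Civil servant (Tongzhou District, Beijing)
李琳 (2013-2016): PhD; School of Chinese Language and Culture, Zhejiang International Studies University
张云翠 (2013-2016): China CITIC Bank
丁嘉 (2014-2017): Bairong Inc. (百融云创)
卢涌 (2014-2017): Civil servant (Dongcheng District, Beijing)
张欣园 (2015-2018): Horizon Robotics
竺成浩 (2016-2019): PhD; Jiangxi Science and Technology Normal University
王永冠 (2016-2019): Kuaishou
邓宇宁 (2016-2019): Microsoft (China) Co., Ltd.
郑志军 (2016-2019): Zhipu AI (智谱华章)
刘晓 (2017-2020): Tianyi Security Technology Co., Ltd. (天翼安全科技有限公司)
杜成玉 (2017-2020): Provisionally admitted to a PhD program (Hunan University)
卢梦依 (2017-2020): PhD student (University of Vermont)
朱述承 (2017-2020): PhD; Renmin University of China
张颖 (2018-2021): Civil servant
刘玉洁 (2018-2021): Speechocean (海天瑞声)
赵硕丰 (2018-2021): Zuoyebang
王伟康 (2018-2021): PhD; COMAC (Shanghai)
潘月 (2018-2021): NetEase Youdao
毕洪亮 (2018-2021): Sina Weibo
胡晗 (2018-2021): Sina Weibo
张三乐 (2019-2022): Primary school teacher (Chengdu)
田永胜 (2019-2022): Pinduoduo
杜冰洁 (2019-2022): Zhuhai College of Science and Technology
张虎 (2019-2022): China Telecom Corporation Limited (Chengdu)
邢百西 (2019-2022): Civil servant
李加厉 (2020-2023): Civil servant (Haidian District, Beijing)
顾璐璐 (2020-2023): Secondary school teacher (Jiaxing)
胡梦旸 (2020-2023): Party School of the Chenghua District Committee, Chengdu (成都成华区委党校)
吴艺 (2020-2023): Midu Technology Co., Ltd. (蜜度科技股份有限公司)
赵继舜 (2020-2023): Civil servant (Dalian)
钟原 (2020-2023): Secondary school teacher (Haidian District, Beijing)
杨致宇 (2021-): PhD student (The University of Texas at Dallas)
薄琳 (2021-2024): ByteDance
张绪朋 (2021-2024): iSoftStone Group (Beijing)
黄文源 (2021-2024): Deepexi Technology Co., Ltd. (滴普科技股份有限公司), Shenzhen
郭梦清 (2021-2024): GAP
孙浩 (2022-2025): Secondary school teacher (Shunyi District, Beijing)
陈颖诗 (2022-2025): Secondary school teacher (Guangzhou)
刘雪林 (2022-2025): Kuaishou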

Last updated March 28, 2026.