[English paper] A short text topic modeling method based on integrating Gaussian and Logistic coding networks with pre-trained word embeddings

Date: March 11, 2026

Authors: Zhang Si*, Xu Jiali, Hui Ning, Zhai Peiyun

Journal: Neurocomputing

Year of publication: 2025

Abstract:

The development of neural networks has provided a flexible learning framework for topic modeling, and neural topic models have attracted wide attention. Despite their widespread application, their performance on short texts still needs improvement. A short text usually contains only a few words and limited feature information, lacking sufficient word co-occurrence and shared context. This leads to challenges such as sparse features and poor interpretability in topic modeling.

To alleviate this issue, an innovative model called Topic Modeling of Enhanced Neural Network with word Embedding (ENNETM) is proposed. First, we introduce an enhanced inference network that integrates Gaussian and Logistic coding networks to improve both the performance and the interpretability of topic extraction. Second, we introduce pre-trained word embeddings into the Gaussian decoding network of the model to enrich contextual semantic information. Comprehensive experiments were carried out on three public datasets, 20NewsGroups, AG_news and TagMyNews, and the results showed that the proposed method outperforms several state-of-the-art models in topic extraction and text classification.

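The abstract describes a VAE-style neural topic model: a Gaussian encoder whose sample is mapped through a softmax (a logistic-normal topic distribution), and a decoder that reconstructs word probabilities from topic and word embeddings. The paper's exact architecture is not given here, so the following is a minimal illustrative sketch of that general pattern; all names, dimensions, and the dot-product decoder are assumptions, not ENNETM itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper)
V, E, K = 2000, 100, 20  # vocabulary size, embedding dim, number of topics

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def encode(bow, W_mu, W_logvar):
    # Gaussian encoder: map a bag-of-words vector to mean / log-variance
    return bow @ W_mu, bow @ W_logvar

def reparameterize(mu, logvar):
    # Reparameterization trick: sample z ~ N(mu, sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(theta, topic_emb, word_emb):
    # Embedding-based decoder: topic-word distributions from dot products
    beta = softmax(topic_emb @ word_emb.T)  # (K, V), each row sums to 1
    return theta @ beta                     # reconstructed word probabilities

# Toy forward pass with random (untrained) parameters
bow = rng.random((4, V))                       # 4 short documents
W_mu = 0.01 * rng.standard_normal((V, K))
W_logvar = 0.01 * rng.standard_normal((V, K))
mu, logvar = encode(bow, W_mu, W_logvar)
z = reparameterize(mu, logvar)
theta = softmax(z)                             # logistic-normal topic mixture
word_emb = rng.standard_normal((V, E))         # stands in for pre-trained embeddings
topic_emb = rng.standard_normal((K, E))
p = decode(theta, topic_emb, word_emb)         # (4, V) distributions over words
```

In a trained model the parameters would be learned by maximizing the evidence lower bound (reconstruction likelihood minus a KL term), and `word_emb` would be initialized from pre-trained vectors rather than drawn at random.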




Copyright© Hubei Provincial Key Laboratory of Digital Education (Central China Normal University)  Address: South Lake Complex Building, South Lake Campus, Central China Normal University, 152 Luoyu Road, Wuhan, Hubei  Postal code: 430079  Tel: 027-67867090