models
======

.. py:module:: topmost.models

.. toctree::
   :titlesonly:
   :maxdepth: 3

   basic/index.rst
   crosslingual/index.rst
   dynamic/index.rst
   hierarchical/index.rst

.. toctree::
   :titlesonly:
   :maxdepth: 1

   Encoder/index.rst

Package Contents
----------------

.. autoapisummary::

   topmost.models.ProdLDA
   topmost.models.CombinedTM
   topmost.models.DecTM
   topmost.models.ETM
   topmost.models.NSTM
   topmost.models.TSCTM
   topmost.models.ECRTM
   topmost.models.NMTM
   topmost.models.InfoCTM
   topmost.models.DETM
   topmost.models.CFDTM
   topmost.models.SawETM
   topmost.models.HyperMiner
   topmost.models.TraCo

.. py:class:: ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   Autoencoding Variational Inference For Topic Models. ICLR 2017

   Akash Srivastava, Charles Sutton.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: fcd1

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   .. py:attribute:: vocab_size

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc_contextual

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: fcd1

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.

   Xiaobao Wu, Chunping Li, Yishu Miao.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: beta

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

   Bases: :py:obj:`torch.nn.Module`

   Topic Modeling in Embedding Spaces. TACL 2020

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

   .. py:attribute:: topic_embeddings

   .. py:attribute:: encoder1

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: get_theta(x)

   .. py:method:: get_beta()

   .. py:method:: forward(x, avg_loss=True)

   .. py:method:: loss_function(x, recon_x, mu, logvar, avg_loss=True)

.. py:class:: NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

   Bases: :py:obj:`torch.nn.Module`

   Neural Topic Model via Optimal Transport. ICLR 2021

   He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

   .. py:attribute:: recon_loss_weight
      :value: 0.07

   .. py:attribute:: sinkhorn_alpha
      :value: 20

   .. py:attribute:: e1

   .. py:attribute:: e2

   .. py:attribute:: e_dropout

   .. py:attribute:: mean_bn

   .. py:attribute:: topic_embeddings

   .. py:method:: get_beta()

   .. py:method:: get_theta(input)

   .. py:method:: forward(input)

.. py:class:: TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

   Bases: :py:obj:`torch.nn.Module`

   Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

   Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

   Note: This implementation does not include TSCTM with augmentations.
   For augmentations, see https://github.com/BobXWu/TSCTM.

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: mean_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fcd1

   .. py:attribute:: topic_dist_quant

   .. py:attribute:: contrast_loss

   .. py:method:: get_beta()

   .. py:method:: encode(inputs)

   .. py:method:: decode(theta)

   .. py:method:: get_theta(inputs)

   .. py:method:: forward(inputs)

   .. py:method:: loss_function(recon_x, x)

.. py:class:: ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`

   Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

   Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: beta_temp
      :value: 0.2

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: fc1_dropout

   .. py:attribute:: theta_dropout

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: word_embeddings

   .. py:attribute:: topic_embeddings

   .. py:attribute:: ECR

   .. py:method:: get_beta()

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(input)

   .. py:method:: get_theta(input)

   .. py:method:: compute_loss_KL(mu, logvar)

   .. py:method:: get_loss_ECR()

   .. py:method:: pairwise_euclidean_distance(x, y)

   .. py:method:: forward(input)

.. py:class:: NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

   Bases: :py:obj:`torch.nn.Module`

   Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

   Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: lam
      :value: 0.8

   .. py:attribute:: Map_en2cn

   .. py:attribute:: Map_cn2en

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn_en

   .. py:attribute:: decoder_bn_cn

   .. py:attribute:: fc11_en

   .. py:attribute:: fc11_cn

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: fc1_drop

   .. py:attribute:: z_drop

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: phi_en

   .. py:attribute:: phi_cn

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x, lang)

   .. py:method:: get_theta(x, lang)

   .. py:method:: get_beta()

   .. py:method:: decode(theta, lang)

   .. py:method:: forward(x_en, x_cn)

   .. py:method:: loss_function(recon_x, x, mu, logvar)

.. py:class:: InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

   Bases: :py:obj:`torch.nn.Module`

   InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

   Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: encoder_en

   .. py:attribute:: encoder_cn

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn_en

   .. py:attribute:: decoder_bn_cn

   .. py:attribute:: phi_en

   .. py:attribute:: phi_cn

   .. py:attribute:: TAMI

   .. py:method:: get_beta()

   .. py:method:: get_theta(x, lang)

   .. py:method:: decode(theta, beta, lang)

   .. py:method:: forward(x_en, x_cn)

   .. py:method:: compute_loss_TM(recon_x, x, mu, logvar)

.. py:class:: DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

   Bases: :py:obj:`torch.nn.Module`

   The Dynamic Embedded Topic Model. 2019

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: num_times

   .. py:attribute:: vocab_size

   .. py:attribute:: eta_hidden_size
      :value: 200

   .. py:attribute:: rho_size
      :value: 300

   .. py:attribute:: enc_drop
      :value: 0.0

   .. py:attribute:: eta_nlayers
      :value: 3

   .. py:attribute:: t_drop

   .. py:attribute:: eta_dropout
      :value: 0.0

   .. py:attribute:: delta
      :value: 0.005

   .. py:attribute:: train_WE
      :value: True

   .. py:attribute:: train_size

   .. py:attribute:: rnn_inp

   .. py:attribute:: device
      :value: 'cpu'

   .. py:attribute:: theta_act
      :value: 'relu'

   .. py:attribute:: mu_q_alpha

   .. py:attribute:: logsigma_q_alpha

   .. py:attribute:: q_theta

   .. py:attribute:: mu_q_theta

   .. py:attribute:: logsigma_q_theta

   .. py:attribute:: q_eta_map

   .. py:attribute:: q_eta

   .. py:attribute:: mu_q_eta

   .. py:attribute:: logsigma_q_eta

   .. py:attribute:: decoder_bn

   .. py:method:: get_activation(act)

   .. py:method:: reparameterize(mu, logvar)

      Returns a sample from a Gaussian distribution via reparameterization.

   .. py:method:: get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)

      Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).

   .. py:method:: get_alpha()

   .. py:method:: get_eta(rnn_inp)

   .. py:method:: get_theta(bows, times, eta=None)

      Returns the topic proportions.

   .. py:property:: word_embeddings

   .. py:property:: topic_embeddings

   .. py:method:: get_beta(alpha=None)

      Returns the topic matrix beta of shape T x K x V

   .. py:method:: get_NLL(theta, beta, bows)

   .. py:method:: forward(bows, times)

   .. py:method:: init_hidden()

      Initializes the first hidden state of the RNN used as inference network for \eta.

.. py:class:: CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)

   Bases: :py:obj:`torch.nn.Module`

   Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings

   Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: beta_temp
      :value: 1.0

   .. py:attribute:: train_time_wordfreq

   .. py:attribute:: encoder

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn

   .. py:attribute:: topic_embeddings

   .. py:attribute:: ETC

   .. py:attribute:: UWE

   .. py:method:: get_beta()

   .. py:method:: pairwise_euclidean_dist(x, y)

   .. py:method:: get_theta(x, times=None)

   .. py:method:: get_KL(mu, logvar)

   .. py:method:: get_NLL(theta, beta, x, recon_x=None)

   .. py:method:: decode(theta, beta)

   .. py:method:: forward(x, times)

.. py:class:: SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

   Bases: :py:obj:`torch.nn.Module`

   Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

   Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

   https://github.com/ZhibinDuan/SawETM

   .. py:attribute:: device
      :value: 'cpu'

   .. py:attribute:: gam_prior

   .. py:attribute:: real_min

   .. py:attribute:: theta_max

   .. py:attribute:: wei_shape_min

   .. py:attribute:: wei_shape_max

   .. py:attribute:: num_topics_list

   .. py:attribute:: num_hiddens_list

   .. py:attribute:: num_layers

   .. py:attribute:: alpha

   .. py:attribute:: h_encoder

   .. py:attribute:: q_theta

   .. py:method:: log_max(x)

   .. py:method:: reparameterize(shape, scale, sample_num=50)

      Returns a sample from a Weibull distribution via reparameterization.

   .. py:method:: kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)

      Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.

   .. py:method:: get_nll(x, x_reconstruct)

      Returns the negative Poisson likelihood of observational count data.

   .. py:property:: bottom_word_embeddings

   .. py:property:: topic_embeddings_list

   .. py:method:: get_phis()

      Returns the factor loading matrix by utilizing sawtooth connection.

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(x)

   .. py:method:: forward(x)

      Forward pass: compute the kl loss and data likelihood.

.. py:class:: HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

   Bases: :py:obj:`topmost.models.hierarchical.SawETM.SawETM.SawETM`

   HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

   Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

   https://github.com/NoviceStone/HyperMiner

   .. py:attribute:: manifold

   .. py:attribute:: clip_r
      :value: None

   .. py:method:: feat_clip(x)

   .. py:property:: bottom_word_embeddings

   .. py:property:: topic_embeddings_list

   .. py:method:: get_phi()

      Returns the factor loading matrix by utilizing sawtooth connection.

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(x)

   .. py:method:: forward(x)

      Forward pass: compute the kl loss and data likelihood.

.. py:class:: TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`

   On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

   Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics_list
      :value: [10, 50, 200]

   .. py:attribute:: weight_loss_TPD
      :value: 20.0

   .. py:attribute:: beta_temp
      :value: 0.1

   .. py:attribute:: num_layers

   .. py:attribute:: bottom_word_embeddings

   .. py:attribute:: topic_embeddings_list

   .. py:attribute:: TPD

   .. py:attribute:: CDDecoder

   .. py:attribute:: encoder

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(input_bow)

   .. py:method:: forward(input_bow)

   .. py:method:: compute_loss_KL(mu, logvar, mu_prior=None)
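
Several classes on this page expose a ``reparameterize(mu, logvar)`` method and a KL helper (for example ``get_kl``, ``get_KL`` and ``compute_loss_KL``). The sketch below illustrates the general technique those names refer to, Gaussian reparameterization and the KL divergence between diagonal Gaussians, in plain Python. It is not the library's implementation: the function names ``reparameterize`` and ``kl_diag_gaussians`` here are standalone, hypothetical helpers; it assumes every ``logvar`` argument holds a log-variance; and it works on lists of floats rather than ``torch`` tensors.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    Assumption: logvar is the log of the variance, so sigma = exp(0.5 * logvar).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(q_mu, q_logvar, p_mu, p_logvar):
    """KL( N(q_mu, diag(exp(q_logvar))) || N(p_mu, diag(exp(p_logvar))) ).

    Closed form for diagonal Gaussians, summed over dimensions.
    """
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
        var_ratio = math.exp(qlv - plv)            # sigma_q^2 / sigma_p^2
        mean_term = (qm - pm) ** 2 / math.exp(plv)  # squared mean gap, scaled
        total += 0.5 * (var_ratio + mean_term - 1.0 + plv - qlv)
    return total


# Hypothetical toy values: a 3-dimensional posterior and a standard normal prior.
mu = [0.5, -1.0, 0.0]
logvar = [0.0, -2.0, 0.3]
prior_mu = [0.0, 0.0, 0.0]
prior_logvar = [0.0, 0.0, 0.0]

z = reparameterize(mu, logvar, random.Random(0))
kl = kl_diag_gaussians(mu, logvar, prior_mu, prior_logvar)
print("sample:", z)
print("KL to prior:", kl)
```

Writing the sample as a deterministic function of ``mu``, ``logvar`` and independent noise is what lets gradients flow to the encoder outputs in a tensor-based implementation. The prior is passed explicitly because attributes such as ``mu2`` and ``var2`` suggest that some models on this page use a prior other than the standard normal.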