models
======

.. py:module:: topmost.models


.. toctree::
   :titlesonly:
   :maxdepth: 3


   basic/index.rst


   crosslingual/index.rst


   dynamic/index.rst


   hierarchical/index.rst


.. toctree::
   :titlesonly:
   :maxdepth: 1


   Encoder/index.rst


Package Contents
----------------


.. autoapisummary::


   topmost.models.ProdLDA


   topmost.models.CombinedTM


   topmost.models.DecTM


   topmost.models.ETM


   topmost.models.NSTM


   topmost.models.TSCTM


   topmost.models.ECRTM


   topmost.models.NMTM


   topmost.models.InfoCTM


   topmost.models.DETM


   topmost.models.CFDTM


   topmost.models.SawETM


   topmost.models.HyperMiner


   topmost.models.TraCo


.. py:class:: ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`


   Autoencoding Variational Inference For Topic Models. ICLR 2017

   Akash Srivastava, Charles Sutton.


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: fc11


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:attribute:: mean_bn


   .. py:attribute:: logvar_bn


   .. py:attribute:: decoder_bn


   .. py:attribute:: fc1_drop


   .. py:attribute:: theta_drop


   .. py:attribute:: fcd1


   .. py:method:: get_beta()


   .. py:method:: get_theta(x)


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(x)


   .. py:method:: decode(theta)


   .. py:method:: forward(x)


   .. py:method:: loss_function(x, recon_x, mu, logvar)


.. py:class:: CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`


   .. py:attribute:: vocab_size


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: fc_contextual


   .. py:attribute:: fc11


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:attribute:: mean_bn


   .. py:attribute:: logvar_bn


   .. py:attribute:: decoder_bn


   .. py:attribute:: fc1_drop


   .. py:attribute:: theta_drop


   .. py:attribute:: fcd1


   .. py:method:: get_beta()


   .. py:method:: get_theta(x)


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(x)


   .. py:method:: decode(theta)


   .. py:method:: forward(x)


   .. py:method:: loss_function(x, recon_x, mu, logvar)


.. py:class:: DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`


   Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.

   Xiaobao Wu, Chunping Li, Yishu Miao.


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: fc11


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:attribute:: mean_bn


   .. py:attribute:: logvar_bn


   .. py:attribute:: decoder_bn


   .. py:attribute:: fc1_drop


   .. py:attribute:: theta_drop


   .. py:attribute:: beta


   .. py:method:: get_beta()


   .. py:method:: get_theta(x)


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(x)


   .. py:method:: decode(theta)


   .. py:method:: forward(x)


   .. py:method:: loss_function(x, recon_x, mu, logvar)


.. py:class:: ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

   Bases: :py:obj:`torch.nn.Module`


   Topic Modeling in Embedding Spaces. TACL 2020

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.


   .. py:attribute:: topic_embeddings


   .. py:attribute:: encoder1


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(x)


   .. py:method:: get_theta(x)


   .. py:method:: get_beta()


   .. py:method:: forward(x, avg_loss=True)


   .. py:method:: loss_function(x, recon_x, mu, logvar, avg_loss=True)


.. py:class:: NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

   Bases: :py:obj:`torch.nn.Module`


   Neural Topic Model via Optimal Transport. ICLR 2021

   He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.


   .. py:attribute:: recon_loss_weight
      :value: 0.07


   .. py:attribute:: sinkhorn_alpha
      :value: 20


   .. py:attribute:: e1


   .. py:attribute:: e2


   .. py:attribute:: e_dropout


   .. py:attribute:: mean_bn


   .. py:attribute:: topic_embeddings


   .. py:method:: get_beta()


   .. py:method:: get_theta(input)


   .. py:method:: forward(input)


.. py:class:: TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

   Bases: :py:obj:`torch.nn.Module`


   Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

   Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

   Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.


   .. py:attribute:: fc11


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: mean_bn


   .. py:attribute:: decoder_bn


   .. py:attribute:: fcd1


   .. py:attribute:: topic_dist_quant


   .. py:attribute:: contrast_loss


   .. py:method:: get_beta()


   .. py:method:: encode(inputs)


   .. py:method:: decode(theta)


   .. py:method:: get_theta(inputs)


   .. py:method:: forward(inputs)


   .. py:method:: loss_function(recon_x, x)


.. py:class:: ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`


   Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

   Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: beta_temp
      :value: 0.2


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: fc11


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:attribute:: fc1_dropout


   .. py:attribute:: theta_dropout


   .. py:attribute:: mean_bn


   .. py:attribute:: logvar_bn


   .. py:attribute:: decoder_bn


   .. py:attribute:: word_embeddings


   .. py:attribute:: topic_embeddings


   .. py:attribute:: ECR


   .. py:method:: get_beta()


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(input)


   .. py:method:: get_theta(input)


   .. py:method:: compute_loss_KL(mu, logvar)


   .. py:method:: get_loss_ECR()


   .. py:method:: pairwise_euclidean_distance(x, y)


   .. py:method:: forward(input)


.. py:class:: NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

   Bases: :py:obj:`torch.nn.Module`


   Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

   Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: lam
      :value: 0.8


   .. py:attribute:: Map_en2cn


   .. py:attribute:: Map_cn2en


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: decoder_bn_en


   .. py:attribute:: decoder_bn_cn


   .. py:attribute:: fc11_en


   .. py:attribute:: fc11_cn


   .. py:attribute:: fc12


   .. py:attribute:: fc21


   .. py:attribute:: fc22


   .. py:attribute:: fc1_drop


   .. py:attribute:: z_drop


   .. py:attribute:: mean_bn


   .. py:attribute:: logvar_bn


   .. py:attribute:: phi_en


   .. py:attribute:: phi_cn


   .. py:method:: reparameterize(mu, logvar)


   .. py:method:: encode(x, lang)


   .. py:method:: get_theta(x, lang)


   .. py:method:: get_beta()


   .. py:method:: decode(theta, lang)


   .. py:method:: forward(x_en, x_cn)


   .. py:method:: loss_function(recon_x, x, mu, logvar)


.. py:class:: InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

   Bases: :py:obj:`torch.nn.Module`


   InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

   Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: encoder_en


   .. py:attribute:: encoder_cn


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: decoder_bn_en


   .. py:attribute:: decoder_bn_cn


   .. py:attribute:: phi_en


   .. py:attribute:: phi_cn


   .. py:attribute:: TAMI


   .. py:method:: get_beta()


   .. py:method:: get_theta(x, lang)


   .. py:method:: decode(theta, beta, lang)


   .. py:method:: forward(x_en, x_cn)


   .. py:method:: compute_loss_TM(recon_x, x, mu, logvar)


.. py:class:: DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

   Bases: :py:obj:`torch.nn.Module`


   The Dynamic Embedded Topic Model. 2019

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: num_times


   .. py:attribute:: vocab_size


   .. py:attribute:: eta_hidden_size
      :value: 200


   .. py:attribute:: rho_size
      :value: 300


   .. py:attribute:: enc_drop
      :value: 0.0


   .. py:attribute:: eta_nlayers
      :value: 3


   .. py:attribute:: t_drop


   .. py:attribute:: eta_dropout
      :value: 0.0


   .. py:attribute:: delta
      :value: 0.005


   .. py:attribute:: train_WE
      :value: True


   .. py:attribute:: train_size


   .. py:attribute:: rnn_inp


   .. py:attribute:: device
      :value: 'cpu'


   .. py:attribute:: theta_act
      :value: 'relu'


   .. py:attribute:: mu_q_alpha


   .. py:attribute:: logsigma_q_alpha


   .. py:attribute:: q_theta


   .. py:attribute:: mu_q_theta


   .. py:attribute:: logsigma_q_theta


   .. py:attribute:: q_eta_map


   .. py:attribute:: q_eta


   .. py:attribute:: mu_q_eta


   .. py:attribute:: logsigma_q_eta


   .. py:attribute:: decoder_bn


   .. py:method:: get_activation(act)


   .. py:method:: reparameterize(mu, logvar)

      Returns a sample from a Gaussian distribution via reparameterization.
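
      A minimal sketch of the reparameterization trick, written with the standard library instead of PyTorch tensors. It assumes ``logvar`` holds the log-variance, as the parameter name suggests. The helper name ``reparameterize_gaussian`` and the ``rng`` argument are hypothetical and not part of ``topmost``.

      .. code-block:: python

         import math
         import random

         def reparameterize_gaussian(mu, logvar, rng=random):
             """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
             samples = []
             for m, lv in zip(mu, logvar):
                 std = math.exp(0.5 * lv)   # sigma = exp(logvar / 2)
                 eps = rng.gauss(0.0, 1.0)  # noise drawn independently of mu and logvar
                 samples.append(m + std * eps)
             return samples

      Because the randomness enters only through ``eps``, the sample is a deterministic, differentiable function of ``mu`` and ``logvar``. That property is what allows gradients to flow through the sampling step in a tensor implementation.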


   .. py:method:: get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)

      Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
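
      The closed form for diagonal Gaussians can be sketched as follows. This illustration assumes the log-scale arguments are log-variances and that a missing prior means a standard normal. Both are assumptions not stated on this page, and the helper name ``gaussian_kl`` is hypothetical.

      .. code-block:: python

         import math

         def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
             """KL(N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar))), summed over dimensions."""
             if p_mu is None:
                 p_mu = [0.0] * len(q_mu)      # assumed default: zero-mean prior
             if p_logvar is None:
                 p_logvar = [0.0] * len(q_mu)  # assumed default: unit-variance prior
             total = 0.0
             for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
                 # (var_q + (mu_q - mu_p)^2) / var_p - 1 + log var_p - log var_q
                 ratio = (math.exp(qlv) + (qm - pm) ** 2) / math.exp(plv)
                 total += 0.5 * (ratio - 1.0 + plv - qlv)
             return total

      For example, ``gaussian_kl([1.0], [0.0])`` compares ``N(1, 1)`` with ``N(0, 1)`` and evaluates to ``0.5``.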


   .. py:method:: get_alpha()


   .. py:method:: get_eta(rnn_inp)


   .. py:method:: get_theta(bows, times, eta=None)

      Returns the topic proportions.


   .. py:property:: word_embeddings


   .. py:property:: topic_embeddings


   .. py:method:: get_beta(alpha=None)

      Returns the topic matrix eta of shape T x K x V


   .. py:method:: get_NLL(theta, beta, bows)


   .. py:method:: forward(bows, times)


   .. py:method:: init_hidden()

      Initializes the first hidden state of the RNN used as the inference network for eta.


.. py:class:: CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)

   Bases: :py:obj:`torch.nn.Module`


   Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings

   Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.


   .. py:attribute:: num_topics
      :value: 50


   .. py:attribute:: beta_temp
      :value: 1.0


   .. py:attribute:: train_time_wordfreq


   .. py:attribute:: encoder


   .. py:attribute:: a


   .. py:attribute:: mu2


   .. py:attribute:: var2


   .. py:attribute:: decoder_bn


   .. py:attribute:: topic_embeddings


   .. py:attribute:: ETC


   .. py:attribute:: UWE


   .. py:method:: get_beta()


   .. py:method:: pairwise_euclidean_dist(x, y)


   .. py:method:: get_theta(x, times=None)


   .. py:method:: get_KL(mu, logvar)


   .. py:method:: get_NLL(theta, beta, x, recon_x=None)


   .. py:method:: decode(theta, beta)


   .. py:method:: forward(x, times)


.. py:class:: SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

   Bases: :py:obj:`torch.nn.Module`


   Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

   Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

   https://github.com/ZhibinDuan/SawETM


   .. py:attribute:: device
      :value: 'cpu'


   .. py:attribute:: gam_prior


   .. py:attribute:: real_min


   .. py:attribute:: theta_max


   .. py:attribute:: wei_shape_min


   .. py:attribute:: wei_shape_max


   .. py:attribute:: num_topics_list


   .. py:attribute:: num_hiddens_list


   .. py:attribute:: num_layers


   .. py:attribute:: alpha


   .. py:attribute:: h_encoder


   .. py:attribute:: q_theta


   .. py:method:: log_max(x)


   .. py:method:: reparameterize(shape, scale, sample_num=50)

      Returns a sample from a Weibull distribution via reparameterization.
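
      A minimal sketch of Weibull reparameterization by inverse-CDF sampling, using only the standard library. The helper name ``sample_weibull`` and the ``rng`` argument are hypothetical. The sketch draws a single sample and does not model ``sample_num``.

      .. code-block:: python

         import math
         import random

         def sample_weibull(shape, scale, rng=random):
             """Draw x = scale * (-log(1 - u)) ** (1 / shape), with u ~ Uniform(0, 1)."""
             u = rng.random()  # u lies in [0, 1), so 1 - u lies in (0, 1]
             return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

      As in the Gaussian case, the noise ``u`` does not depend on ``shape`` or ``scale``, so the sample is a differentiable function of both parameters.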


   .. py:method:: kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)

      Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.


   .. py:method:: get_nll(x, x_reconstruct)

      Returns the negative Poisson log-likelihood of the observed count data.
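
      A sketch of a Poisson negative log-likelihood for count data, under the assumption that ``x_reconstruct`` holds the Poisson rates. The helper name ``poisson_nll`` and the ``eps`` stabilizer are hypothetical. Whether the library sums or averages over documents is not shown on this page, so this sketch simply sums.

      .. code-block:: python

         import math

         def poisson_nll(x, x_reconstruct, eps=1e-10):
             """Sum of -log Poisson(count | rate) over paired counts and rates."""
             total = 0.0
             for count, rate in zip(x, x_reconstruct):
                 # log Poisson(count | rate) = count * log(rate) - rate - log(count!)
                 log_lik = count * math.log(rate + eps) - rate - math.lgamma(count + 1.0)
                 total -= log_lik
             return total

      The ``lgamma`` term does not depend on the rates, so implementations that only need gradients sometimes drop it.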


   .. py:property:: bottom_word_embeddings


   .. py:property:: topic_embeddings_list


   .. py:method:: get_phis()

      Returns the factor loading matrix by utilizing sawtooth connection.


   .. py:method:: get_beta()


   .. py:method:: get_phi_list()


   .. py:method:: get_theta(x)


   .. py:method:: forward(x)

      Forward pass: computes the KL loss and the data likelihood.


.. py:class:: HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

   Bases: :py:obj:`topmost.models.hierarchical.SawETM.SawETM.SawETM`


   HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

   Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

   https://github.com/NoviceStone/HyperMiner


   .. py:attribute:: manifold


   .. py:attribute:: clip_r
      :value: None


   .. py:method:: feat_clip(x)


   .. py:property:: bottom_word_embeddings


   .. py:property:: topic_embeddings_list


   .. py:method:: get_phi()

      Returns the factor loading matrix by utilizing sawtooth connection.


   .. py:method:: get_beta()


   .. py:method:: get_phi_list()


   .. py:method:: get_theta(x)


   .. py:method:: forward(x)

      Forward pass: computes the KL loss and the data likelihood.


.. py:class:: TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`


   On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

   Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.


   .. py:attribute:: num_topics_list
      :value: [10, 50, 200]


   .. py:attribute:: weight_loss_TPD
      :value: 20.0


   .. py:attribute:: beta_temp
      :value: 0.1


   .. py:attribute:: num_layers


   .. py:attribute:: bottom_word_embeddings


   .. py:attribute:: topic_embeddings_list


   .. py:attribute:: TPD


   .. py:attribute:: CDDecoder


   .. py:attribute:: encoder


   .. py:method:: get_beta()


   .. py:method:: get_phi_list()


   .. py:method:: get_theta(input_bow)


   .. py:method:: forward(input_bow)


   .. py:method:: compute_loss_KL(mu, logvar, mu_prior=None)