models

basic
- ECRTM
  - ECR
  - ECRTM
- NSTM
  - NSTM
  - auto_diff_sinkhorn
- TSCTM
  - TSC
  - TSCTM
  - TopicDistQuant
- CombinedTM
- DecTM
- ETM
- ProdLDA
crosslingual
- InfoCTM
  - InfoCTM
  - TAMI
- NMTM
dynamic
- DETM
hierarchical
- HyperMiner
  - manifolds
  - HyperMiner
- SawETM
  - SawETM
  - block
- TraCo
  - CDDecoder
  - TPD
  - TraCo
  - utils
- ProGBN

Encoder

Package Contents

`ProdLDA`	Autoencoding Variational Inference For Topic Models. ICLR 2017
`CombinedTM`
`DecTM`	Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.
`ETM`	Topic Modeling in Embedding Spaces. TACL 2020
`NSTM`	Neural Topic Model via Optimal Transport. ICLR 2021
`TSCTM`	Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
`ECRTM`	Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
`NMTM`	Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
`InfoCTM`	InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
`DETM`	The Dynamic Embedded Topic Model. 2019
`SawETM`	Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
`HyperMiner`	HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
`TraCo`	On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Autoencoding Variational Inference For Topic Models. ICLR 2017

Akash Srivastava, Charles Sutton.

get_beta()

get_theta(x)

reparameterize(mu, logvar)

encode(x)

decode(theta)

forward(x)

loss_function(x, recon_x, mu, logvar)

class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

get_beta()

get_theta(x)

reparameterize(mu, logvar)

encode(x)

decode(theta)

forward(x)

loss_function(x, recon_x, mu, logvar)

class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.

Xiaobao Wu, Chunping Li, Yishu Miao.

get_beta()

get_theta(x)

reparameterize(mu, logvar)

encode(x)

decode(theta)

forward(x)

loss_function(x, recon_x, mu, logvar)

class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

Bases: torch.nn.Module

Topic Modeling in Embedding Spaces. TACL 2020

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

reparameterize(mu, logvar)

encode(x)

get_theta(x)

get_beta()

forward(x, avg_loss=True)

loss_function(x, recon_x, mu, logvar, avg_loss=True)

class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

Bases: torch.nn.Module

Neural Topic Model via Optimal Transport. ICLR 2021

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

get_beta()

get_theta(input)

forward(input)

class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

Bases: torch.nn.Module

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.

get_beta()

encode(inputs)

decode(theta)

get_theta(inputs)

forward(inputs)

loss_function(recon_x, x)

class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

get_beta()

reparameterize(mu, logvar)

encode(input)

get_theta(input)

compute_loss_KL(mu, logvar)

get_loss_ECR()

pairwise_euclidean_distance(x, y)

forward(input)

class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

Bases: torch.nn.Module

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

reparameterize(mu, logvar)

encode(x, lang)

get_theta(x, lang)

get_beta()

decode(theta, lang)

forward(x_en, x_cn)

loss_function(recon_x, x, mu, logvar)

class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

Bases: torch.nn.Module

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.

get_beta()

get_theta(x, lang)

decode(theta, beta, lang)

forward(x_en, x_cn)

compute_loss_TM(recon_x, x, mu, logvar)

class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

Bases: torch.nn.Module

The Dynamic Embedded Topic Model. 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

property word_embeddings

property topic_embeddings

get_activation(act)

reparameterize(mu, logvar): Returns a sample from a Gaussian distribution via reparameterization.
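
The reparameterization step can be illustrated with a small standalone sketch. It uses only the Python standard library rather than torch, and it assumes that `logvar` holds the log of the variance; it is an illustration of the general trick, not the library's implementation.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    Assumption: logvar is log(sigma^2), so sigma = exp(0.5 * logvar).
    """
    z = []
    for m, lv in zip(mu, logvar):
        sigma = math.exp(0.5 * lv)
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu or logvar
        z.append(m + sigma * eps)
    return z


# Example: one sample from a 3-dimensional diagonal Gaussian.
sample = reparameterize([0.0, 1.0, -1.0], [0.0, -2.0, 0.5])
```

Because the randomness enters only through `eps`, the sample is a deterministic function of `mu` and `logvar`, which is what lets gradients flow through the sampling step in a framework with automatic differentiation.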

get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None): Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
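
For reference, the closed-form KL divergence between two diagonal Gaussians can be sketched as follows. The function name `gaussian_kl` is hypothetical, and the sketch assumes the log-scale arguments are log-variances and that an omitted prior means a standard normal; the library method may define these differently.

```python
import math


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL( N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar)) ), summed over dimensions.

    Assumptions: arguments are log-variances; a missing prior is N(0, 1).
    """
    if p_mu is None:
        p_mu = [0.0] * len(q_mu)
    if p_logvar is None:
        p_logvar = [0.0] * len(q_mu)
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
        var_q = math.exp(qlv)
        var_p = math.exp(plv)
        total += 0.5 * ((var_q + (qm - pm) ** 2) / var_p - 1.0 + plv - qlv)
    return total


# Identical distributions have zero divergence.
kl_same = gaussian_kl([0.0, 0.0], [0.0, 0.0])
```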

get_alpha()

get_eta(rnn_inp)

get_theta(bows, times, eta=None): Returns the topic proportions.

get_beta(alpha=None): Returns the topic matrix beta of shape T x K x V.

get_NLL(theta, beta, bows)

forward(bows, times)

init_hidden(): Initializes the first hidden state of the RNN used as inference network for eta.

class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

Bases: torch.nn.Module

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

https://github.com/ZhibinDuan/SawETM

property bottom_word_embeddings

property topic_embeddings_list

log_max(x)

reparameterize(shape, scale, sample_num=50): Returns a sample from a Weibull distribution via reparameterization.
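
A Weibull variable can be reparameterized through its inverse CDF. The sketch below shows that idea for a single scalar draw using only the standard library; the name `sample_weibull` is hypothetical, and the sketch does not model the `sample_num` argument, whose exact role in the library is not described here.

```python
import math
import random


def sample_weibull(shape, scale, rng=random):
    """Draw x = scale * (-log(1 - u)) ** (1 / shape) with u ~ Uniform[0, 1).

    This is inverse-CDF sampling: x is a deterministic function of the
    parameters and of parameter-free uniform noise u.
    """
    u = rng.random()
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


# Example: one draw from Weibull(shape=2, scale=3).
x = sample_weibull(2.0, 3.0)
```

The mean of such draws should approach `scale * Gamma(1 + 1/shape)`, which gives a quick sanity check for any implementation of this step.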

kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale): Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
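
The Weibull-to-Gamma KL divergence has a closed form, sketched below with the standard library. Note the assumption: this sketch parameterizes the Gamma distribution by shape and rate (`gam_rate`), which may not match how the library interprets its `gam_scale` argument, and it is not taken from the library's code.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_rate):
    """KL( Weibull(k, lam) || Gamma(a, rate=b) ) in closed form.

    Assumption: the Gamma distribution is given by shape a and rate b.
    """
    k, lam, a, b = wei_shape, wei_scale, gam_shape, gam_rate
    return (
        EULER_GAMMA * a / k
        - a * math.log(lam)
        + math.log(k)
        + b * lam * math.gamma(1.0 + 1.0 / k)
        - EULER_GAMMA
        - 1.0
        - a * math.log(b)
        + math.lgamma(a)
    )


# Weibull(1, 1) and Gamma(1, rate=1) are both Exponential(1), so the KL is zero.
kl_zero = kl_weibull_gamma(1.0, 1.0, 1.0, 1.0)
```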

get_nll(x, x_reconstruct): Returns the negative Poisson log-likelihood of the observed count data.
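
As an illustration of a negative Poisson log-likelihood over count data, here is a minimal standard-library sketch. The name `poisson_nll` and the `eps` stabilizer are assumptions made for this example; the library method may reduce or normalize the result differently.

```python
import math


def poisson_nll(x, rate, eps=1e-10):
    """Sum over entries of -log Poisson(x_i | rate_i).

    -log p(x | r) = r - x * log(r) + log(x!), with log(x!) = lgamma(x + 1).
    eps guards against log(0) when a reconstructed rate is zero.
    """
    total = 0.0
    for count, r in zip(x, rate):
        total += r - count * math.log(r + eps) + math.lgamma(count + 1.0)
    return total


# Example: word counts for one document against reconstructed rates.
nll = poisson_nll([2, 0, 1], [1.5, 0.2, 0.9])
```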

get_phis(): Returns the factor loading matrix by utilizing sawtooth connection.

get_beta()

get_phi_list()

get_theta(x)

forward(x): Forward pass: computes the KL loss and the data likelihood.

class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

https://github.com/NoviceStone/HyperMiner

property bottom_word_embeddings

property topic_embeddings_list

feat_clip(x)

get_phi(): Returns the factor loading matrix by utilizing sawtooth connection.

get_beta()

get_phi_list()

get_theta(x)

forward(x): Forward pass: computes the KL loss and the data likelihood.

class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

get_beta()

get_phi_list()

get_theta(input_bow)

forward(input_bow)

compute_loss_KL(mu, logvar, mu_prior=None)