models
Package Contents
ProdLDA | Autoencoding Variational Inference For Topic Models. ICLR 2017
DecTM | Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.
ETM | Topic Modeling in Embedding Spaces. TACL 2020
NSTM | Neural Topic Model via Optimal Transport. ICLR 2021
TSCTM | Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
ECRTM | Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
NMTM | Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
InfoCTM | InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
DETM | The Dynamic Embedded Topic Model. 2019
SawETM | Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
HyperMiner | HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
TraCo | On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024
- class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
Autoencoding Variational Inference For Topic Models. ICLR 2017
Akash Srivastava, Charles Sutton.
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
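Several classes on this page expose reparameterize(mu, logvar). In variational autoencoders this signature usually denotes the Gaussian reparameterization trick, z = mu + sigma * eps with eps drawn from a standard normal. The following is a minimal plain-Python sketch of that idea, not the library's implementation; it assumes `logvar` holds log-variances, and the `rng` argument is a hypothetical addition used only to make the example reproducible.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    Assumption: `logvar` holds log-variances, so sigma = exp(0.5 * logvar).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    ]


rng = random.Random(0)          # fixed seed so the draw is reproducible
mu = [0.0, 1.0, -2.0]           # per-dimension means
logvar = [0.0, -1.0, 0.5]       # per-dimension log-variances
z = reparameterize(mu, logvar, rng)
print(z)                        # one sample per latent dimension
```

Because the randomness enters only through eps, the sample is a differentiable function of mu and logvar, which is what lets gradients flow through the sampling step during training.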
- class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
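Most classes here expose get_beta(). A common way to inspect a topic-word matrix is to list the highest-weight words of each topic. The sketch below shows that step on plain Python lists; it assumes beta has one row per topic and one column per vocabulary word, which this reference does not state explicitly, and `top_words` is a hypothetical helper rather than part of the package.

```python
def top_words(beta, vocab, k=3):
    """Return the k highest-weight words for each topic (row) of beta."""
    topics = []
    for row in beta:
        # Indices of the vocabulary sorted by descending weight in this topic.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        topics.append([vocab[j] for j in ranked[:k]])
    return topics


vocab = ["game", "team", "vote", "party", "score"]
beta = [
    [0.40, 0.30, 0.02, 0.03, 0.25],  # a sports-like topic
    [0.05, 0.05, 0.45, 0.40, 0.05],  # a politics-like topic
]
print(top_words(beta, vocab, k=2))
# [['game', 'team'], ['vote', 'party']]
```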
- class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.
Xiaobao Wu, Chunping Li, Yishu Miao.
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
- class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)
Bases: torch.nn.Module
Topic Modeling in Embedding Spaces. TACL 2020
Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.
- reparameterize(mu, logvar)
- encode(x)
- get_theta(x)
- get_beta()
- forward(x, avg_loss=True)
- loss_function(x, recon_x, mu, logvar, avg_loss=True)
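In the cited ETM paper, each topic's word distribution is a softmax over the inner products between that topic's embedding and every word embedding. The sketch below illustrates that definition in plain Python. It is not the code behind get_beta(); the function names and the toy embeddings are assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def etm_beta(topic_embeddings, word_embeddings):
    """Return one word distribution per topic.

    beta[k][v] is proportional to exp(<topic_k, word_v>).
    """
    beta = []
    for t in topic_embeddings:
        logits = [sum(a * b for a, b in zip(t, w)) for w in word_embeddings]
        beta.append(softmax(logits))
    return beta


word_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # V = 3 words
topic_embeddings = [[1.0, 0.0], [0.0, 2.0]]              # K = 2 topics
for row in etm_beta(topic_embeddings, word_embeddings):
    print([round(p, 3) for p in row])
```

Words whose embeddings point in the same direction as a topic embedding receive more probability mass in that topic.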
- class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)
Bases: torch.nn.Module
Neural Topic Model via Optimal Transport. ICLR 2021
He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.
- get_beta()
- get_theta(input)
- forward(input)
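The sinkhorn_alpha parameter suggests that the optimal transport problem is solved with Sinkhorn iterations. The following is a generic plain-Python sketch of entropic optimal transport by Sinkhorn scaling, under the assumption that alpha multiplies the cost inside the exponential kernel. It is not taken from the package, and the `sinkhorn` function, its arguments, and the toy inputs are hypothetical.

```python
import math


def sinkhorn(cost, a, b, alpha, max_iter=1000, tol=1e-9):
    """Return a transport plan with row sums a and (approximately) column sums b.

    Assumption: the kernel is K = exp(-alpha * cost), so a larger alpha
    gives a sharper, less entropic plan.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-alpha * cost[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(max_iter):
        # Rescale columns to match b, then rows to match a.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u_new = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        change = max(abs(x - y) for x, y in zip(u_new, u))
        u = u_new
        if change < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]   # toy cost matrix
a = [0.7, 0.3]                    # source marginal
b = [0.4, 0.6]                    # target marginal
plan = sinkhorn(cost, a, b, alpha=2.0)  # small alpha so the toy case converges fast
for row in plan:
    print([round(p, 3) for p in row])
```

Larger values of alpha approximate the unregularized transport plan more closely but typically need more iterations to converge.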
- class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)
Bases: torch.nn.Module
Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.
Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.
- get_beta()
- encode(inputs)
- decode(theta)
- get_theta(inputs)
- forward(inputs)
- loss_function(recon_x, x)
- class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)
Bases: torch.nn.Module
Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.
- get_beta()
- reparameterize(mu, logvar)
- encode(input)
- get_theta(input)
- compute_loss_KL(mu, logvar)
- get_loss_ECR()
- pairwise_euclidean_distance(x, y)
- forward(input)
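The method name pairwise_euclidean_distance(x, y) indicates a distance matrix between two sets of vectors, presumably topic and word embeddings as in the cited paper. The sketch below computes such a matrix in plain Python. Whether the library returns squared or unsquared distances is not stated in this reference; this illustration returns unsquared distances and is not the package's code.

```python
import math


def pairwise_euclidean_distance(x, y):
    """Return D with D[i][j] = ||x[i] - y[j]||_2 for lists of equal-length vectors."""
    return [[math.dist(xi, yj) for yj in y] for xi in x]


x = [[0.0, 0.0], [3.0, 4.0]]   # e.g. two topic embeddings (toy values)
y = [[0.0, 0.0], [3.0, 0.0]]   # e.g. two word embeddings (toy values)
print(pairwise_euclidean_distance(x, y))
# [[0.0, 3.0], [5.0, 4.0]]
```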
- class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)
Bases: torch.nn.Module
Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.
- reparameterize(mu, logvar)
- encode(x, lang)
- get_theta(x, lang)
- get_beta()
- decode(theta, lang)
- forward(x_en, x_cn)
- loss_function(recon_x, x, mu, logvar)
- class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)
Bases: torch.nn.Module
InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.
- get_beta()
- get_theta(x, lang)
- decode(theta, beta, lang)
- forward(x_en, x_cn)
- compute_loss_TM(recon_x, x, mu, logvar)
- class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')
Bases: torch.nn.Module
The Dynamic Embedded Topic Model. 2019
Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.
- property word_embeddings
- property topic_embeddings
- get_activation(act)
- reparameterize(mu, logvar)
Returns a sample from a Gaussian distribution via reparameterization.
- get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)
Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
- get_alpha()
- get_eta(rnn_inp)
- get_theta(bows, times, eta=None)
Returns the topic proportions.
- get_beta(alpha=None)
Returns the topic matrix beta of shape T x K x V.
- get_NLL(theta, beta, bows)
- forward(bows, times)
Initializes the first hidden state of the RNN used as inference network for eta.
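The get_kl entry above describes a KL divergence between two Gaussians. For diagonal Gaussians this has a closed form, sketched below in plain Python. Two assumptions are made here and are not confirmed by this reference: the log-scale arguments are treated as log-variances, and an omitted prior defaults to a standard normal. The function name `gaussian_kl` is hypothetical.

```python
import math


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL( N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar)) ), summed over dimensions.

    Assumption: a missing prior means N(0, 1) in every dimension.
    """
    if p_mu is None:
        p_mu = [0.0] * len(q_mu)
    if p_logvar is None:
        p_logvar = [0.0] * len(q_mu)
    total = 0.0
    for mq, lq, mp, lp in zip(q_mu, q_logvar, p_mu, p_logvar):
        var_q, var_p = math.exp(lq), math.exp(lp)
        # Per-dimension closed form for univariate Gaussians.
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total


print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))   # identical to the prior
print(gaussian_kl([1.0], [0.0]))             # mean shifted by one unit
```

The divergence is zero only when both distributions coincide, and it grows with the squared distance between the means scaled by the prior variance.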
- class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)
Bases: torch.nn.Module
Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.
https://github.com/ZhibinDuan/SawETM
- property bottom_word_embeddings
- property topic_embeddings_list
- log_max(x)
- reparameterize(shape, scale, sample_num=50)
Returns a sample from a Weibull distribution via reparameterization.
- kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)
Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
- get_nll(x, x_reconstruct)
Returns the negative Poisson log-likelihood of the observed count data.
- get_phis()
Returns the factor loading matrix, computed using the sawtooth connection.
- get_beta()
- get_phi_list()
- get_theta(x)
- forward(x)
Forward pass: computes the KL loss and the data likelihood.
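SawETM's reparameterize(shape, scale, sample_num=50) is documented as returning a Weibull sample via reparameterization. A Weibull variable can be written as a deterministic transform of uniform noise through its inverse CDF, which the sketch below illustrates in plain Python. Averaging over `sample_num` draws is an assumption about what that parameter does, and both function names and the `rng` argument are hypothetical.

```python
import math
import random


def weibull_from_uniform(shape, scale, u):
    """Inverse-CDF transform: maps u in [0, 1) to a Weibull(shape, scale) draw."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def reparameterize_weibull(shape, scale, sample_num=50, rng=random):
    """Average `sample_num` reparameterized draws (the averaging is an assumption)."""
    draws = [
        weibull_from_uniform(shape, scale, rng.random())
        for _ in range(sample_num)
    ]
    return sum(draws) / sample_num


rng = random.Random(0)  # fixed seed for reproducibility
print(reparameterize_weibull(shape=2.0, scale=1.0, sample_num=50, rng=rng))
```

Since the uniform noise does not depend on the parameters, the draw is differentiable with respect to shape and scale, which is why the Weibull is convenient as a variational posterior for non-negative latent variables.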
- class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)
Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM
HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.
https://github.com/NoviceStone/HyperMiner
- property bottom_word_embeddings
- property topic_embeddings_list
- feat_clip(x)
- get_phi()
Returns the factor loading matrix, computed using the sawtooth connection.
- get_beta()
- get_phi_list()
- get_theta(x)
- forward(x)
Forward pass: computes the KL loss and the data likelihood.
- class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)
Bases: torch.nn.Module
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024
Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.
- get_beta()
- get_phi_list()
- get_theta(input_bow)
- forward(input_bow)
- compute_loss_KL(mu, logvar, mu_prior=None)