models

Package Contents

ProdLDA

Autoencoding Variational Inference For Topic Models. ICLR 2017

CombinedTM

DecTM

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.

ETM

Topic Modeling in Embedding Spaces. TACL 2020

NSTM

Neural Topic Model via Optimal Transport. ICLR 2021

TSCTM

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

ECRTM

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

NMTM

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

InfoCTM

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

DETM

The Dynamic Embedded Topic Model. 2019

SawETM

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

HyperMiner

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

TraCo

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Autoencoding Variational Inference For Topic Models. ICLR 2017

Akash Srivastava, Charles Sutton.

get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)

class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)

class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 findings.

Xiaobao Wu, Chunping Li, Yishu Miao.

get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)

class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

Bases: torch.nn.Module

Topic Modeling in Embedding Spaces. TACL 2020

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

reparameterize(mu, logvar)
encode(x)
get_theta(x)
get_beta()
forward(x, avg_loss=True)
loss_function(x, recon_x, mu, logvar, avg_loss=True)

class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

Bases: torch.nn.Module

Neural Topic Model via Optimal Transport. ICLR 2021

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

get_beta()
get_theta(input)
forward(input)

class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

Bases: torch.nn.Module

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.

get_beta()
encode(inputs)
decode(theta)
get_theta(inputs)
forward(inputs)
loss_function(recon_x, x)

class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

get_beta()
reparameterize(mu, logvar)
encode(input)
get_theta(input)
compute_loss_KL(mu, logvar)
get_loss_ECR()
pairwise_euclidean_distance(x, y)
forward(input)

class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

Bases: torch.nn.Module

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

reparameterize(mu, logvar)
encode(x, lang)
get_theta(x, lang)
get_beta()
decode(theta, lang)
forward(x_en, x_cn)
loss_function(recon_x, x, mu, logvar)

class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

Bases: torch.nn.Module

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.

get_beta()
get_theta(x, lang)
decode(theta, beta, lang)
forward(x_en, x_cn)
compute_loss_TM(recon_x, x, mu, logvar)

class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

Bases: torch.nn.Module

The Dynamic Embedded Topic Model. 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

property word_embeddings
property topic_embeddings
get_activation(act)
reparameterize(mu, logvar)

Returns a sample from a Gaussian distribution via reparameterization.
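
The reparameterization named here can be illustrated without any framework. The following is a minimal sketch, not the library's code: it assumes `logvar` holds log-variances (as the parameter name suggests) and works on plain Python lists rather than torch tensors. The helper name `gaussian_reparameterize` and its `rng` argument are hypothetical.

```python
import math
import random


def gaussian_reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    Assumption: logvar[i] = log(sigma_i ** 2), so sigma_i = exp(0.5 * logvar[i]).
    """
    samples = []
    for m, lv in zip(mu, logvar):
        sigma = math.exp(0.5 * lv)
        eps = rng.gauss(0.0, 1.0)  # noise does not depend on mu or logvar
        samples.append(m + sigma * eps)
    return samples


# Usage: one draw for a two-dimensional latent variable.
rng = random.Random(0)
z = gaussian_reparameterize([0.0, 1.0], [0.0, -2.0], rng)
```

Because the randomness enters only through `eps`, the sample is a deterministic function of `mu` and `logvar`, which is what lets gradients flow through the sampling step in a tensor framework.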

get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)

Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
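
For reference, the closed-form KL divergence between two diagonal Gaussians can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual body: the log-scale arguments are treated as log-variances, an omitted prior is taken to be the standard normal, and the result is summed over dimensions. The name `gaussian_kl` is hypothetical.

```python
import math


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL( N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar)) ), summed over dimensions.

    Assumption: a missing prior mean or log-variance defaults to 0,
    i.e. the prior is the standard normal N(0, 1).
    """
    if p_mu is None:
        p_mu = [0.0] * len(q_mu)
    if p_logvar is None:
        p_logvar = [0.0] * len(q_mu)
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
        var_ratio = math.exp(qlv - plv)               # sigma_q^2 / sigma_p^2
        scaled_sq_diff = (qm - pm) ** 2 / math.exp(plv)
        total += 0.5 * (var_ratio + scaled_sq_diff - 1.0 + plv - qlv)
    return total


# Usage: divergence of N(1, 1) from the standard normal prior.
kl_value = gaussian_kl([1.0], [0.0])
```

With the default prior the per-dimension term reduces to the familiar `-0.5 * (1 + logvar - mu**2 - exp(logvar))`.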

get_alpha()
get_eta(rnn_inp)
get_theta(bows, times, eta=None)

Returns the topic proportions.

get_beta(alpha=None)

Returns the topic matrix beta of shape T x K x V.

get_NLL(theta, beta, bows)
forward(bows, times)
init_hidden()

Initializes the first hidden state of the RNN used as inference network for eta.

class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

Bases: torch.nn.Module

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

https://github.com/ZhibinDuan/SawETM

property bottom_word_embeddings
property topic_embeddings_list
log_max(x)
reparameterize(shape, scale, sample_num=50)

Returns a sample from a Weibull distribution via reparameterization.
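
A Weibull sample can be written as a deterministic transform of uniform noise through the inverse CDF. The sketch below shows that transform for scalars only. It is not the library's implementation: the helper name `weibull_reparameterize` and the `rng` argument are hypothetical, and the `sample_num` parameter of the documented method is not modelled here.

```python
import math
import random


def weibull_reparameterize(shape, scale, rng=random):
    """Draw x = scale * (-log(1 - u)) ** (1 / shape) with u ~ Uniform[0, 1)."""
    u = rng.random()
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


# Usage: the empirical mean should approach scale * Gamma(1 + 1 / shape).
rng = random.Random(0)
samples = [weibull_reparameterize(2.0, 1.0, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
expected_mean = 1.0 * math.gamma(1.0 + 1.0 / 2.0)
```

As with the Gaussian case, the noise `u` is independent of `shape` and `scale`, so the sample is differentiable with respect to both parameters.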

kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)

Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
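
This divergence has a closed form, sketched below for scalar parameters. Two assumptions are worth flagging: the Gamma distribution is parameterized here by shape and rate (the documented argument is named `gam_scale`, and this page does not say which convention the method uses), and the function is a standalone illustration rather than the class method.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_rate):
    """KL( Weibull(wei_shape, wei_scale) || Gamma(gam_shape, rate=gam_rate) )."""
    return (
        EULER_GAMMA * gam_shape / wei_shape
        - gam_shape * math.log(wei_scale)
        + math.log(wei_shape)
        + gam_rate * wei_scale * math.gamma(1.0 + 1.0 / wei_shape)
        - EULER_GAMMA
        - 1.0
        - gam_shape * math.log(gam_rate)
        + math.lgamma(gam_shape)
    )


# Sanity check: Weibull(shape=1, scale=2) and Gamma(shape=1, rate=0.5) are the
# same exponential distribution, so their divergence should vanish.
same_distribution_kl = kl_weibull_gamma(1.0, 2.0, 1.0, 0.5)
```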

get_nll(x, x_reconstruct)

Returns the negative Poisson log-likelihood of the observed count data.
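
As an illustration of what a Poisson negative log-likelihood computes, here is a small list-based sketch. It assumes `x_reconstruct` holds strictly positive Poisson rates and that the `log(x!)` term is included. The documented method may differ on either point (for example by dropping the constant term or averaging over a batch), and the name `poisson_nll` is hypothetical.

```python
import math


def poisson_nll(x, x_reconstruct):
    """Sum over entries of -log Poisson(count | rate)."""
    total = 0.0
    for count, rate in zip(x, x_reconstruct):
        # -log p(count | rate) = rate - count * log(rate) + log(count!)
        total += rate - count * math.log(rate) + math.lgamma(count + 1.0)
    return total


# Usage: a three-word vocabulary with observed counts and reconstructed rates.
nll = poisson_nll([0, 2, 1], [0.5, 2.0, 1.0])
```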

get_phis()

Returns the factor loading matrix, computed using the sawtooth connection.

get_beta()
get_phi_list()
get_theta(x)
forward(x)

Forward pass: computes the KL loss and the data likelihood.

class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

https://github.com/NoviceStone/HyperMiner

property bottom_word_embeddings
property topic_embeddings_list
feat_clip(x)
get_phi()

Returns the factor loading matrix, computed using the sawtooth connection.

get_beta()
get_phi_list()
get_theta(x)
forward(x)

Forward pass: computes the KL loss and the data likelihood.

class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

get_beta()
get_phi_list()
get_theta(input_bow)
forward(input_bow)
compute_loss_KL(mu, logvar, mu_prior=None)