models¶

basic
- ECRTM
  - ECR
  - ECRTM
- NSTM
  - NSTM
  - auto_diff_sinkhorn
- TSCTM
  - TSC
  - TSCTM
  - TopicDistQuant
- CombinedTM
- DecTM
- ETM
- ProdLDA
crosslingual
- InfoCTM
  - InfoCTM
  - TAMI
- NMTM
dynamic
- CFDTM
  - CFDTM
  - ETC
  - UWE
- DETM
hierarchical
- HyperMiner
  - manifolds
  - HyperMiner
- SawETM
  - SawETM
  - block
- TraCo
  - CDDecoder
  - TPD
  - TraCo
  - utils
- ProGBN

Encoder

Package Contents¶

`ProdLDA`	Autoencoding Variational Inference For Topic Models. ICLR 2017
`CombinedTM`
`DecTM`	Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.
`ETM`	Topic Modeling in Embedding Spaces. TACL 2020
`NSTM`	Neural Topic Model via Optimal Transport. ICLR 2021
`TSCTM`	Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
`ECRTM`	Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
`NMTM`	Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
`InfoCTM`	InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
`DETM`	The Dynamic Embedded Topic Model. 2019
`CFDTM`	Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings
`SawETM`	Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
`HyperMiner`	HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
`TraCo`	On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)¶

Bases: torch.nn.Module

Autoencoding Variational Inference For Topic Models. ICLR 2017

Akash Srivastava, Charles Sutton.

num_topics = 50¶

a¶

mu2¶

var2¶

fc11¶

fc12¶

fc21¶

fc22¶

mean_bn¶

logvar_bn¶

decoder_bn¶

fc1_drop¶

theta_drop¶

fcd1¶

get_beta()¶

get_theta(x)¶

reparameterize(mu, logvar)¶

encode(x)¶

decode(theta)¶

forward(x)¶

loss_function(x, recon_x, mu, logvar)¶

class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)¶

Bases: torch.nn.Module

vocab_size¶

num_topics = 50¶

a¶

mu2¶

var2¶

fc_contextual¶

fc11¶

fc12¶

fc21¶

fc22¶

mean_bn¶

logvar_bn¶

decoder_bn¶

fc1_drop¶

theta_drop¶

fcd1¶

get_beta()¶

get_theta(x)¶

reparameterize(mu, logvar)¶

encode(x)¶

decode(theta)¶

forward(x)¶

loss_function(x, recon_x, mu, logvar)¶

class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)¶

Bases: torch.nn.Module

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.

Xiaobao Wu, Chunping Li, Yishu Miao.

num_topics = 50¶

a¶

mu2¶

var2¶

fc11¶

fc12¶

fc21¶

fc22¶

mean_bn¶

logvar_bn¶

decoder_bn¶

fc1_drop¶

theta_drop¶

beta¶

get_beta()¶

get_theta(x)¶

reparameterize(mu, logvar)¶

encode(x)¶

decode(theta)¶

forward(x)¶

loss_function(x, recon_x, mu, logvar)¶

class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)¶

Bases: torch.nn.Module

Topic Modeling in Embedding Spaces. TACL 2020

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

topic_embeddings¶

encoder1¶

fc21¶

fc22¶

reparameterize(mu, logvar)¶

encode(x)¶

get_theta(x)¶

get_beta()¶

forward(x, avg_loss=True)¶

loss_function(x, recon_x, mu, logvar, avg_loss=True)¶

class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)¶

Bases: torch.nn.Module

Neural Topic Model via Optimal Transport. ICLR 2021

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

recon_loss_weight = 0.07¶

sinkhorn_alpha = 20¶

e1¶

e2¶

e_dropout¶

mean_bn¶

topic_embeddings¶

get_beta()¶

get_theta(input)¶

forward(input)¶

class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)¶

Bases: torch.nn.Module

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.

fc11¶

fc12¶

fc21¶

mean_bn¶

decoder_bn¶

fcd1¶

topic_dist_quant¶

contrast_loss¶

get_beta()¶

encode(inputs)¶

decode(theta)¶

get_theta(inputs)¶

forward(inputs)¶

loss_function(recon_x, x)¶

class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)¶

Bases: torch.nn.Module

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

num_topics = 50¶

beta_temp = 0.2¶

a¶

mu2¶

var2¶

fc11¶

fc12¶

fc21¶

fc22¶

fc1_dropout¶

theta_dropout¶

mean_bn¶

logvar_bn¶

decoder_bn¶

word_embeddings¶

topic_embeddings¶

ECR¶

get_beta()¶

reparameterize(mu, logvar)¶

encode(input)¶

get_theta(input)¶

compute_loss_KL(mu, logvar)¶

get_loss_ECR()¶

pairwise_euclidean_distance(x, y)¶

forward(input)¶

class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)¶

Bases: torch.nn.Module

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

num_topics = 50¶

lam = 0.8¶

Map_en2cn¶

Map_cn2en¶

a¶

mu2¶

var2¶

decoder_bn_en¶

decoder_bn_cn¶

fc11_en¶

fc11_cn¶

fc12¶

fc21¶

fc22¶

fc1_drop¶

z_drop¶

mean_bn¶

logvar_bn¶

phi_en¶

phi_cn¶

reparameterize(mu, logvar)¶

encode(x, lang)¶

get_theta(x, lang)¶

get_beta()¶

decode(theta, lang)¶

forward(x_en, x_cn)¶

loss_function(recon_x, x, mu, logvar)¶

class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)¶

Bases: torch.nn.Module

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.

num_topics = 50¶

encoder_en¶

encoder_cn¶

a¶

mu2¶

var2¶

decoder_bn_en¶

decoder_bn_cn¶

phi_en¶

phi_cn¶

TAMI¶

get_beta()¶

get_theta(x, lang)¶

decode(theta, beta, lang)¶

forward(x_en, x_cn)¶

compute_loss_TM(recon_x, x, mu, logvar)¶

class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')¶

Bases: torch.nn.Module

The Dynamic Embedded Topic Model. 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

num_topics = 50¶

num_times¶

vocab_size¶

eta_hidden_size = 200¶

rho_size = 300¶

enc_drop = 0.0¶

eta_nlayers = 3¶

t_drop¶

eta_dropout = 0.0¶

delta = 0.005¶

train_WE = True¶

train_size¶

rnn_inp¶

device = 'cpu'¶

theta_act = 'relu'¶

mu_q_alpha¶

logsigma_q_alpha¶

q_theta¶

mu_q_theta¶

logsigma_q_theta¶

q_eta_map¶

q_eta¶

mu_q_eta¶

logsigma_q_eta¶

decoder_bn¶

get_activation(act)¶

reparameterize(mu, logvar)¶: Returns a sample from a Gaussian distribution via reparameterization.
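
The reparameterization described above can be illustrated without any tensor library. The following is a minimal sketch, not the library's code: it assumes `logvar` holds log-variances (so the standard deviation is `exp(0.5 * logvar)`), and the optional `rng` argument is a hypothetical addition for reproducibility.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw one sample from N(mu, exp(logvar)), coordinate by coordinate.

    Assumption: `logvar` is the log-variance, so std = exp(0.5 * logvar).
    """
    sample = []
    for m, lv in zip(mu, logvar):
        std = math.exp(0.5 * lv)
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu or logvar
        sample.append(m + eps * std)
    return sample


# Toy usage with a seeded generator (values are illustrative only).
rng = random.Random(0)
z = reparameterize([0.0, 1.0, -2.0], [0.0, -1.0, 0.5], rng)
```

Because the randomness enters only through `eps`, the sample is a deterministic, differentiable function of `mu` and `logvar`, which is what allows gradients to flow through the sampling step in a tensor framework.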

get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)¶: Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
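
As a worked illustration of the divergence named above, here is a hedged sketch of the closed-form KL between two diagonal Gaussians. It assumes the log-scale arguments are log-variances and that an omitted prior means the standard normal; both are assumptions made for the example, and the function name `gaussian_kl` is hypothetical.

```python
import math


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL(q || p) for diagonal Gaussians, summed over coordinates.

    Assumption: the log-scale inputs are log-variances. When the prior
    is omitted, p is taken to be the standard normal N(0, I).
    """
    if p_mu is None or p_logvar is None:
        p_mu = [0.0] * len(q_mu)
        p_logvar = [0.0] * len(q_mu)
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
        var_ratio = math.exp(qlv - plv)            # sigma_q^2 / sigma_p^2
        mean_term = (qm - pm) ** 2 / math.exp(plv)  # (mu_q - mu_p)^2 / sigma_p^2
        total += 0.5 * (var_ratio + mean_term - 1.0 + plv - qlv)
    return total


kl_to_standard_normal = gaussian_kl([1.0], [0.0])
```

The divergence is zero when both distributions coincide and grows with the distance between the means, scaled by the prior variance.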

get_alpha()¶

get_eta(rnn_inp)¶

get_theta(bows, times, eta=None)¶: Returns the topic proportions.

property word_embeddings¶

property topic_embeddings¶

get_beta(alpha=None)¶: Returns the topic matrix beta of shape T x K x V.
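
To make the T x K x V shape concrete (T time slices, K topics, V vocabulary words), the sketch below builds such a matrix from toy embeddings. It assumes the usual embedded-topic-model parameterization, a softmax over inner products between per-time topic embeddings and word embeddings; the helper names and toy values are hypothetical and not taken from the library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_matrix(topic_emb, word_emb):
    """Build beta[t][k][v] from topic_emb[t][k][d] and word_emb[v][d]."""
    beta = []
    for topics_at_t in topic_emb:
        rows = []
        for topic_vec in topics_at_t:
            # One logit per vocabulary word: <topic embedding, word embedding>.
            logits = [sum(a * b for a, b in zip(topic_vec, word_vec))
                      for word_vec in word_emb]
            rows.append(softmax(logits))
        beta.append(rows)
    return beta


# Hypothetical toy sizes: T=2 time slices, K=2 topics, V=3 words, d=2 dims.
topic_emb = [[[1.0, 0.0], [0.0, 1.0]],
             [[0.5, 0.5], [-1.0, 1.0]]]
word_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta = topic_matrix(topic_emb, word_emb)
```

Each `beta[t][k]` is a probability distribution over the vocabulary, so every row sums to one.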

get_NLL(theta, beta, bows)¶

forward(bows, times)¶

init_hidden()¶: Initializes the first hidden state of the RNN used as inference network for eta.

class CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)¶

Bases: torch.nn.Module

Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings

Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.

num_topics = 50¶

beta_temp = 1.0¶

train_time_wordfreq¶

encoder¶

a¶

mu2¶

var2¶

decoder_bn¶

topic_embeddings¶

ETC¶

UWE¶

get_beta()¶

pairwise_euclidean_dist(x, y)¶

get_theta(x, times=None)¶

get_KL(mu, logvar)¶

get_NLL(theta, beta, x, recon_x=None)¶

decode(theta, beta)¶

forward(x, times)¶

class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)¶

Bases: torch.nn.Module

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

https://github.com/ZhibinDuan/SawETM

device = 'cpu'¶

gam_prior¶

real_min¶

theta_max¶

wei_shape_min¶

wei_shape_max¶

num_topics_list¶

num_hiddens_list¶

num_layers¶

alpha¶

h_encoder¶

q_theta¶

log_max(x)¶

reparameterize(shape, scale, sample_num=50)¶: Returns a sample from a Weibull distribution via reparameterization.
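
A Weibull sample can be reparameterized through the inverse CDF: with u uniform on [0, 1), `scale * (-log(1 - u)) ** (1 / shape)` follows a Weibull distribution with the given shape and scale. The sketch below shows that identity only. It draws a single sample, does not model the `sample_num` argument, and uses the hypothetical name `sample_weibull` and a hypothetical `rng` parameter.

```python
import math
import random


def sample_weibull(shape, scale, rng=random):
    """Draw one Weibull(shape, scale) sample via the inverse CDF."""
    u = rng.random()  # uniform noise in [0, 1), independent of shape and scale
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


# Toy check: the empirical mean should approach scale * Gamma(1 + 1/shape).
rng = random.Random(0)
draws = [sample_weibull(2.0, 3.0, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
theoretical_mean = 3.0 * math.gamma(1.0 + 1.0 / 2.0)
```

As with the Gaussian case, all randomness sits in `u`, so the sample is a smooth function of the shape and scale parameters.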

kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)¶: Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
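
The Weibull-to-Gamma divergence has a closed form, which the following sketch evaluates. It assumes the Gamma distribution is parameterized by shape and rate; the signature above names its last argument `gam_scale`, so how the library interprets that argument is not confirmed here. The constant name `EULER_GAMMA` is introduced for the example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_rate):
    """KL( Weibull(wei_shape, wei_scale) || Gamma(gam_shape, rate=gam_rate) ).

    Assumption: the Gamma distribution uses a rate parameter.
    """
    k, lam, a, b = wei_shape, wei_scale, gam_shape, gam_rate
    return (EULER_GAMMA * a / k
            - a * math.log(lam)
            + math.log(k)
            + b * lam * math.gamma(1.0 + 1.0 / k)
            - EULER_GAMMA - 1.0
            - a * math.log(b)
            + math.lgamma(a))


# Sanity case: Weibull(1, 2) and Gamma(1, rate=0.5) are the same exponential.
kl_same = kl_weibull_gamma(1.0, 2.0, 1.0, 0.5)
```

With shape 1 on both sides, the two distributions reduce to exponentials, and the expression collapses to the familiar exponential-to-exponential KL, which gives a quick way to check the formula.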

get_nll(x, x_reconstruct)¶: Returns the negative Poisson log-likelihood of the observed count data.
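
For counts `x` and Poisson rates `x_reconstruct`, the negative log-likelihood sums `rate - count * log(rate) + log(count!)` over all entries. The sketch below is a plain-Python illustration under that definition; the name `poisson_nll` and the `eps` guard against `log(0)` are assumptions, and any normalization the library applies (for example averaging over a batch) is not reproduced.

```python
import math


def poisson_nll(x, x_reconstruct, eps=1e-10):
    """Negative Poisson log-likelihood, summed over all entries."""
    total = 0.0
    for count, rate in zip(x, x_reconstruct):
        # log P(count | rate) = count * log(rate) - rate - log(count!)
        log_lik = count * math.log(rate + eps) - rate - math.lgamma(count + 1.0)
        total -= log_lik
    return total


# Toy bag-of-words counts and reconstructed rates (illustrative values).
nll = poisson_nll([3, 0, 1], [2.5, 0.2, 1.0])
```

For a fixed count, the value is smallest when the rate equals the count, which is why minimizing it pulls the reconstruction toward the observed data.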

property bottom_word_embeddings¶

property topic_embeddings_list¶

get_phis()¶: Returns the factor loading matrix by utilizing sawtooth connection.

get_beta()¶

get_phi_list()¶

get_theta(x)¶

forward(x)¶: Forward pass: computes the KL loss and the data likelihood.

class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)¶

Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

https://github.com/NoviceStone/HyperMiner

manifold¶

clip_r = None¶

feat_clip(x)¶

property bottom_word_embeddings¶

property topic_embeddings_list¶

get_phi()¶: Returns the factor loading matrix by utilizing sawtooth connection.

get_beta()¶

get_phi_list()¶

get_theta(x)¶

forward(x)¶: Forward pass: computes the KL loss and the data likelihood.

class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)¶

Bases: torch.nn.Module

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

num_topics_list = [10, 50, 200]¶

weight_loss_TPD = 20.0¶

beta_temp = 0.1¶

num_layers¶

bottom_word_embeddings¶

topic_embeddings_list¶

TPD¶

CDDecoder¶

encoder¶

get_beta()¶

get_phi_list()¶

get_theta(input_bow)¶

forward(input_bow)¶

compute_loss_KL(mu, logvar, mu_prior=None)¶