models

Package Contents

ProdLDA

Autoencoding Variational Inference For Topic Models. ICLR 2017

CombinedTM

DecTM

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.

ETM

Topic Modeling in Embedding Spaces. TACL 2020

NSTM

Neural Topic Model via Optimal Transport. ICLR 2021

TSCTM

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

ECRTM

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

NMTM

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

InfoCTM

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

DETM

The Dynamic Embedded Topic Model. 2019

CFDTM

Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings

SawETM

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

HyperMiner

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

TraCo

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Autoencoding Variational Inference For Topic Models. ICLR 2017

Akash Srivastava, Charles Sutton.

num_topics = 50
a
mu2
var2
fc11
fc12
fc21
fc22
mean_bn
logvar_bn
decoder_bn
fc1_drop
theta_drop
fcd1
get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)
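
The attributes `a`, `mu2`, and `var2` (which also appear in CombinedTM, DecTM, ECRTM, NMTM, InfoCTM, and CFDTM) match the Laplace approximation of a Dirichlet prior described in the ProdLDA paper. The NumPy sketch below shows that approximation and the Gaussian KL term a `loss_function(x, recon_x, mu, logvar)` of this kind would need. Treating `a` as the Dirichlet concentration vector is an assumption, and the helper names `laplace_dirichlet_prior` and `kl_to_prior` are hypothetical, not part of this module.

```python
import numpy as np


def laplace_dirichlet_prior(alpha):
    """Laplace approximation of Dirichlet(alpha) in the softmax basis.

    Returns the mean and diagonal variance of the approximating Gaussian,
    using the closed-form expressions from the ProdLDA paper.
    (Assumption: alpha plays the role of the `a` attribute.)
    """
    alpha = np.asarray(alpha, dtype=float)
    k = alpha.size
    log_alpha = np.log(alpha)
    mu2 = log_alpha - log_alpha.mean()
    var2 = (1.0 / alpha) * (1.0 - 2.0 / k) + np.sum(1.0 / alpha) / k**2
    return mu2, var2


def kl_to_prior(mu, logvar, mu2, var2):
    """KL(N(mu, exp(logvar)) || N(mu2, var2)), summed over the topic axis."""
    var = np.exp(logvar)
    kl = var / var2 + (mu - mu2) ** 2 / var2 + np.log(var2) - logvar - 1.0
    return 0.5 * np.sum(kl, axis=-1)


# Symmetric Dirichlet(1) prior over 50 topics (the default num_topics).
mu2, var2 = laplace_dirichlet_prior(np.ones(50))
print(mu2[:3], var2[:3])
```

With a symmetric prior the approximate mean is zero in every coordinate, so only the variance depends on the number of topics.
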
class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

vocab_size
num_topics = 50
a
mu2
var2
fc_contextual
fc11
fc12
fc21
fc22
mean_bn
logvar_bn
decoder_bn
fc1_drop
theta_drop
fcd1
get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)
class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

Bases: torch.nn.Module

Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.

Xiaobao Wu, Chunping Li, Yishu Miao.

num_topics = 50
a
mu2
var2
fc11
fc12
fc21
fc22
mean_bn
logvar_bn
decoder_bn
fc1_drop
theta_drop
beta
get_beta()
get_theta(x)
reparameterize(mu, logvar)
encode(x)
decode(theta)
forward(x)
loss_function(x, recon_x, mu, logvar)
class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

Bases: torch.nn.Module

Topic Modeling in Embedding Spaces. TACL 2020

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

topic_embeddings
encoder1
fc21
fc22
reparameterize(mu, logvar)
encode(x)
get_theta(x)
get_beta()
forward(x, avg_loss=True)
loss_function(x, recon_x, mu, logvar, avg_loss=True)
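
In the ETM formulation, each topic is a point in the word-embedding space, and its word distribution is a softmax over inner products with every word embedding. The NumPy sketch below illustrates what `get_beta()` and the reconstruction part of `loss_function` compute under that formulation. It assumes topic embeddings of shape K x L and word embeddings of shape V x L; the function names are illustrative and do not describe this module's internals.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def etm_beta(topic_embeddings, word_embeddings):
    """Topic-word matrix (K x V): row k is softmax over words of rho_w . alpha_k."""
    return softmax(topic_embeddings @ word_embeddings.T, axis=1)


def reconstruction_nll(bow, theta, beta, eps=1e-10):
    """Multinomial negative log-likelihood of bag-of-words counts under theta @ beta."""
    recon = theta @ beta  # (batch, V): each document's word distribution
    return -(bow * np.log(recon + eps)).sum(axis=1)


rng = np.random.default_rng(0)
K, V, L = 5, 30, 8
beta = etm_beta(rng.normal(size=(K, L)), rng.normal(size=(V, L)))
theta = softmax(rng.normal(size=(2, K)))  # topic proportions for 2 toy documents
bow = rng.integers(0, 3, size=(2, V)).astype(float)
print(beta.shape, reconstruction_nll(bow, theta, beta))
```
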
class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

Bases: torch.nn.Module

Neural Topic Model via Optimal Transport. ICLR 2021

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

recon_loss_weight = 0.07
sinkhorn_alpha = 20
e1
e2
e_dropout
mean_bn
topic_embeddings
get_beta()
get_theta(input)
forward(input)
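
NSTM replaces the usual reconstruction objective with an optimal-transport distance between a document's word distribution and its topic proportions, computed with Sinkhorn iterations. The generic NumPy sketch below shows the Sinkhorn algorithm for one pair of histograms. Reading `sinkhorn_alpha` as the factor inside the Gibbs kernel `exp(-alpha * cost)` is an assumption. The cost matrix here is a toy example, not the embedding-based cost the model builds.

```python
import numpy as np


def sinkhorn(a, b, cost, alpha=20.0, max_iter=1000, tol=1e-9):
    """Entropy-regularized optimal transport between histograms a (n,) and b (m,).

    The Gibbs kernel is exp(-alpha * cost), so a larger alpha means weaker
    entropic smoothing. Returns the transport plan P (n x m) and <P, cost>.
    """
    K = np.exp(-alpha * cost)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(max_iter):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns; column sums now equal b exactly
        if np.max(np.abs(u * (K @ v) - a)) < tol:  # row sums close enough to a
            break
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))


a = np.array([0.5, 0.3, 0.2])  # toy source histogram
b = np.full(4, 0.25)           # toy target histogram
cost = np.abs(np.linspace(0, 1, 3)[:, None] - np.linspace(0, 1, 4)[None, :])
plan, dist = sinkhorn(a, b, cost)
print(plan.round(3), dist)
```

Alternating row and column rescalings keeps the plan of the form diag(u) K diag(v), which is the exact form of the entropic OT solution. This is why only the two scaling vectors need to be iterated.
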
class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

Bases: torch.nn.Module

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

Note: This implementation does not include the augmentation-based variant of TSCTM. For that variant, see https://github.com/BobXWu/TSCTM.

fc11
fc12
fc21
mean_bn
decoder_bn
fcd1
topic_dist_quant
contrast_loss
get_beta()
encode(inputs)
decode(theta)
get_theta(inputs)
forward(inputs)
loss_function(recon_x, x)
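
TSCTM's signature exposes a `temperature` and a `weight_contrast`, which suggests a temperature-scaled contrastive term added to the topic-model loss. The sketch below shows a generic InfoNCE loss for a single anchor in NumPy, as an illustration of that family of objectives. It is not TSCTM's exact loss, and it does not reproduce how TSCTM selects positive and negative samples.

```python
import numpy as np


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor.

    anchor, positive: (d,) vectors; negatives: (n, d).
    Uses cosine similarity divided by `temperature`; the loss is the negative
    log-probability of the positive among all candidates.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    anchor, positive, negatives = normalize(anchor), normalize(positive), normalize(negatives)
    logits = np.concatenate(([anchor @ positive], negatives @ anchor)) / temperature
    logits = logits - logits.max()  # shift for stability; cancels in the softmax
    return float(-logits[0] + np.log(np.exp(logits).sum()))


rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
negatives = rng.normal(size=(5, 8))
print(info_nce(anchor, anchor, negatives), info_nce(anchor, -anchor, negatives))
```

A lower temperature sharpens the softmax over candidates, so hard negatives (those most similar to the anchor) dominate the gradient.
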
class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

num_topics = 50
beta_temp = 0.2
a
mu2
var2
fc11
fc12
fc21
fc22
fc1_dropout
theta_dropout
mean_bn
logvar_bn
decoder_bn
word_embeddings
topic_embeddings
ECR
get_beta()
reparameterize(mu, logvar)
encode(input)
get_theta(input)
compute_loss_KL(mu, logvar)
get_loss_ECR()
pairwise_euclidean_distance(x, y)
forward(input)
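
ECRTM treats topic embeddings as cluster centers of word embeddings. It regularizes them with an optimal-transport assignment whose cost is the pairwise distance between the two sets of embeddings. The `sinkhorn_alpha` and `sinkhorn_max_iter` parameters suggest Sinkhorn iterations are used for this. The NumPy sketch below shows a vectorized `pairwise_euclidean_distance` (squared distances via the expansion of the norm) and one way to turn those distances into a topic-word matrix with `beta_temp`. The axis over which this page's `get_beta()` normalizes is not stated, so normalizing each topic over the vocabulary is an assumption.

```python
import numpy as np


def pairwise_sq_euclidean(x, y):
    """Squared Euclidean distances between rows of x (n, d) and rows of y (m, d),
    via ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 to absorb rounding error."""
    d = (x**2).sum(axis=1)[:, None] + (y**2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.maximum(d, 0.0)


def beta_from_distances(topic_emb, word_emb, beta_temp=0.2):
    """Topic-word matrix (K x V): closer words get more weight; rows sum to 1.
    (Assumption: normalization over the vocabulary axis.)"""
    logits = -pairwise_sq_euclidean(topic_emb, word_emb) / beta_temp
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
topics, words = rng.normal(size=(4, 3)), rng.normal(size=(6, 3))
print(beta_from_distances(topics, words).round(3))
```

The expansion avoids materializing an n x m x d difference tensor, which matters when m is the vocabulary size.
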
class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

Bases: torch.nn.Module

Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

num_topics = 50
lam = 0.8
Map_en2cn
Map_cn2en
a
mu2
var2
decoder_bn_en
decoder_bn_cn
fc11_en
fc11_cn
fc12
fc21
fc22
fc1_drop
z_drop
mean_bn
logvar_bn
phi_en
phi_cn
reparameterize(mu, logvar)
encode(x, lang)
get_theta(x, lang)
get_beta()
decode(theta, lang)
forward(x_en, x_cn)
loss_function(recon_x, x, mu, logvar)
class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

Bases: torch.nn.Module

InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.

num_topics = 50
encoder_en
encoder_cn
a
mu2
var2
decoder_bn_en
decoder_bn_cn
phi_en
phi_cn
TAMI
get_beta()
get_theta(x, lang)
decode(theta, beta, lang)
forward(x_en, x_cn)
compute_loss_TM(recon_x, x, mu, logvar)
class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

Bases: torch.nn.Module

The Dynamic Embedded Topic Model. 2019

Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

num_topics = 50
num_times
vocab_size
eta_hidden_size = 200
rho_size = 300
enc_drop = 0.0
eta_nlayers = 3
t_drop
eta_dropout = 0.0
delta = 0.005
train_WE = True
train_size
rnn_inp
device = 'cpu'
theta_act = 'relu'
mu_q_alpha
logsigma_q_alpha
q_theta
mu_q_theta
logsigma_q_theta
q_eta_map
q_eta
mu_q_eta
logsigma_q_eta
decoder_bn
get_activation(act)
reparameterize(mu, logvar)

Returns a sample from a Gaussian distribution via reparameterization.
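
The reparameterization trick writes a Gaussian sample as a deterministic function of the distribution parameters plus parameter-free noise, z = mu + eps * sigma with eps ~ N(0, I). In an autograd framework such as PyTorch, this lets gradients reach `mu` and `logvar`. The NumPy sketch below assumes `logvar` holds the log-variance, as the argument name suggests, and only demonstrates the sampling step, not the gradients.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, exp(logvar)) as mu + eps * std, with eps ~ N(0, I)."""
    std = np.exp(0.5 * logvar)               # sigma = exp(log(sigma^2) / 2)
    eps = rng.standard_normal(np.shape(mu))  # noise independent of the parameters
    return mu + eps * std


rng = np.random.default_rng(0)
print(reparameterize(np.zeros(3), np.zeros(3), rng))
```
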

get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)

Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
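
For diagonal Gaussians the KL divergence has a closed form, summed over dimensions: 0.5 * sum(((sigma_q^2 + (mu_q - mu_p)^2) / sigma_p^2) - 1 + log sigma_p^2 - log sigma_q^2). The NumPy sketch below assumes the `*_logsigma` arguments hold log-variances. It also assumes that omitting the prior means a standard normal prior, which the signature's `None` defaults suggest but this page does not state.

```python
import numpy as np


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL(N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar))), summed over the last axis.

    If no prior is given, N(0, I) is used (an assumption for this sketch).
    """
    if p_mu is None or p_logvar is None:
        p_mu = np.zeros_like(q_mu)
        p_logvar = np.zeros_like(q_logvar)
    kl = (np.exp(q_logvar) + (q_mu - p_mu) ** 2) / np.exp(p_logvar) - 1.0 + p_logvar - q_logvar
    return 0.5 * kl.sum(axis=-1)


# N(1, 1) against the standard normal in one dimension.
print(gaussian_kl(np.array([1.0]), np.array([0.0])))
```
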

get_alpha()
get_eta(rnn_inp)
get_theta(bows, times, eta=None)

Returns the topic proportions.

property word_embeddings
property topic_embeddings
get_beta(alpha=None)

Returns the topic matrix beta of shape T x K x V.
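
In the D-ETM formulation, each topic has one embedding per time slice, and the topic's word distribution at time t is a softmax over the vocabulary of inner products with the word embeddings. Stacking these over T time slices and K topics gives the T x K x V tensor. The NumPy sketch below takes T = `num_times`, K = `num_topics`, V = `vocab_size`, and an embedding width L (plausibly `rho_size`, an assumption). The name `dynamic_beta` is hypothetical.

```python
import numpy as np


def dynamic_beta(alpha, rho):
    """alpha: per-time topic embeddings (T, K, L); rho: word embeddings (V, L).

    Returns beta of shape (T, K, V); beta[t, k] is a distribution over words.
    """
    logits = alpha @ rho.T                      # (T, K, L) @ (L, V) -> (T, K, V)
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
beta = dynamic_beta(rng.normal(size=(4, 3, 5)), rng.normal(size=(20, 5)))
print(beta.shape)
```
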

get_NLL(theta, beta, bows)
forward(bows, times)
init_hidden()

Initializes the first hidden state of the RNN used as the inference network for eta.

class CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)

Bases: torch.nn.Module

Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings

Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.

num_topics = 50
beta_temp = 1.0
train_time_wordfreq
encoder
a
mu2
var2
decoder_bn
topic_embeddings
ETC
UWE
get_beta()
pairwise_euclidean_dist(x, y)
get_theta(x, times=None)
get_KL(mu, logvar)
get_NLL(theta, beta, x, recon_x=None)
decode(theta, beta)
forward(x, times)
class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

Bases: torch.nn.Module

Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

https://github.com/ZhibinDuan/SawETM

device = 'cpu'
gam_prior
real_min
theta_max
wei_shape_min
wei_shape_max
num_topics_list
num_hiddens_list
num_layers
alpha
h_encoder
q_theta
log_max(x)
reparameterize(shape, scale, sample_num=50)

Returns a sample from a Weibull distribution via reparameterization.
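
A Weibull variable can be reparameterized through its inverse CDF: if u ~ Uniform(0, 1), then x = scale * (-log(1 - u))^(1/shape) follows Weibull(shape, scale), and x is differentiable in both parameters. The NumPy sketch below draws `sample_num` such samples. Whether the method on this page returns the raw samples or aggregates them (for example, by averaging) is not stated here, so the sketch returns them all.

```python
import numpy as np


def weibull_reparameterize(shape, scale, rng, sample_num=50):
    """Draw `sample_num` Weibull(shape, scale) samples by the inverse-CDF transform
    x = scale * (-log(1 - u)) ** (1 / shape), with u ~ Uniform(0, 1)."""
    shape = np.asarray(shape, dtype=float)
    scale = np.asarray(scale, dtype=float)
    u = rng.uniform(size=(sample_num,) + np.broadcast(shape, scale).shape)
    return scale * (-np.log1p(-u)) ** (1.0 / shape)  # log1p(-u) = log(1 - u), accurately


rng = np.random.default_rng(0)
samples = weibull_reparameterize(np.array([1.0, 2.0]), np.array([1.0, 0.5]), rng)
print(samples.shape)
```
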

kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)

Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
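
The KL divergence from Weibull(k, lambda) to Gamma(alpha, beta) has a closed form involving the Euler-Mascheroni constant gamma_e: gamma_e * alpha / k - alpha * log(lambda) + log(k) + beta * lambda * Gamma(1 + 1/k) - gamma_e - 1 - alpha * log(beta) + log Gamma(alpha). The sketch below uses only the standard library. It takes the Gamma in the shape-rate parameterization. Whether this method's `gam_scale` is a rate or a scale is not stated on this page, so treat that as an assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def kl_weibull_gamma(k, lam, alpha, beta):
    """KL(Weibull(shape=k, scale=lam) || Gamma(shape=alpha, rate=beta))."""
    return (EULER_GAMMA * alpha / k
            - alpha * math.log(lam)
            + math.log(k)
            + beta * lam * math.gamma(1.0 + 1.0 / k)
            - EULER_GAMMA
            - 1.0
            - alpha * math.log(beta)
            + math.lgamma(alpha))


print(kl_weibull_gamma(2.0, 1.0, 1.0, 1.0))
```

A quick sanity check: with k = 1 and alpha = 1 both distributions are exponential, so Weibull(1, lambda) and Gamma(1, 1/lambda) coincide and the KL is zero.
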

get_nll(x, x_reconstruct)

Returns the negative Poisson log-likelihood of observed count data.
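
For counts x with Poisson rates r, the negative log-likelihood is -sum(x * log(r) - r - log(x!)), where log(x!) = lgamma(x + 1). The standard-library sketch below computes it for flat lists. It assumes strictly positive rates. A library implementation would typically guard the logarithm, which the `log_max(x)` helper listed above may be for (an assumption).

```python
import math


def poisson_nll(counts, rates):
    """Negative Poisson log-likelihood: -sum(x log r - r - log x!) over all entries."""
    return -sum(x * math.log(r) - r - math.lgamma(x + 1.0)
                for x, r in zip(counts, rates))


print(poisson_nll([0, 2, 5], [1.0, 2.0, 4.0]))
```
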

property bottom_word_embeddings
property topic_embeddings_list
get_phis()

Returns the factor loading matrices, computed using the sawtooth connection.
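
In SawETM, adjacent layers of the topic hierarchy share embeddings. The loading matrix between layer l-1 and layer l is built from the inner products of their embeddings, so each layer's embeddings are reused on both sides of the neighboring loadings (the "sawtooth" pattern). The NumPy sketch below assumes each column of a loading matrix is a softmax over the units of the layer below. The function name is hypothetical, and the exact normalization in this module may differ.

```python
import numpy as np


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sawtooth_phis(word_emb, topic_emb_list):
    """Loading matrices for a topic hierarchy with shared embeddings.

    word_emb: (V, L); topic_emb_list[l]: (K_l, L), ordered from the bottom layer up.
    Returns phis[0] of shape (V, K_0) and phis[l] of shape (K_{l-1}, K_l);
    every column is a distribution over the layer below.
    """
    lower = word_emb
    phis = []
    for upper in topic_emb_list:
        phis.append(softmax(lower @ upper.T, axis=0))  # normalize over lower-layer units
        lower = upper  # this layer's embeddings become the next loading's lower side
    return phis


rng = np.random.default_rng(0)
phis = sawtooth_phis(rng.normal(size=(30, 6)), [rng.normal(size=(k, 6)) for k in (8, 4, 2)])
print([p.shape for p in phis])
```
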

get_beta()
get_phi_list()
get_theta(x)
forward(x)

Forward pass: compute the KL loss and data likelihood.

class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

https://github.com/NoviceStone/HyperMiner

manifold
clip_r = None
feat_clip(x)
property bottom_word_embeddings
property topic_embeddings_list
get_phi()

Returns the factor loading matrices, computed using the sawtooth connection.

get_beta()
get_phi_list()
get_theta(x)
forward(x)

Forward pass: compute the KL loss and data likelihood.
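
HyperMiner places words and topics in hyperbolic space, with `manifold='PoincareBall'` and a negative `curvature` by default. The NumPy sketch below shows the geodesic distance in a Poincaré ball of curvature -c (so c = -curvature > 0). It also shows a norm-clipping rule of the kind `clip_r` and `feat_clip(x)` suggest. That `feat_clip` rescales features to norm at most `clip_r` is an assumption, and both helpers are illustrative rather than this class's methods.

```python
import numpy as np


def poincare_distance(x, y, c=0.01):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0):
    d(x, y) = arccosh(1 + 2c||x - y||^2 / ((1 - c||x||^2)(1 - c||y||^2))) / sqrt(c).
    Points must satisfy c * ||x||^2 < 1."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x**2, axis=-1)) * (1.0 - c * np.sum(y**2, axis=-1))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)


def clip_features(x, r):
    """Rescale each row of x so its Euclidean norm is at most r (assumed clipping rule)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * np.minimum(1.0, r / np.maximum(norms, 1e-15))


print(poincare_distance(np.zeros(2), np.array([3.0, 4.0])))
```

Distances grow without bound near the ball's boundary. This is what lets a small-dimensional hyperbolic space embed tree-like topic taxonomies, and also why clipping feature norms before mapping into the ball can help numerical stability.
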

class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

Bases: torch.nn.Module

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

num_topics_list = [10, 50, 200]
weight_loss_TPD = 20.0
beta_temp = 0.1
num_layers
bottom_word_embeddings
topic_embeddings_list
TPD
CDDecoder
encoder
get_beta()
get_phi_list()
get_theta(input_bow)
forward(input_bow)
compute_loss_KL(mu, logvar, mu_prior=None)