models
Package Contents
- ProdLDA: Autoencoding Variational Inference For Topic Models. ICLR 2017
- DecTM: Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.
- ETM: Topic Modeling in Embedding Spaces. TACL 2020
- NSTM: Neural Topic Model via Optimal Transport. ICLR 2021
- TSCTM: Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
- ECRTM: Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
- NMTM: Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
- InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
- DETM: The Dynamic Embedded Topic Model. 2019
- CFDTM: Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings
- SawETM: Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
- TraCo: On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024
- class ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
Autoencoding Variational Inference For Topic Models. ICLR 2017
Akash Srivastava, Charles Sutton.
- num_topics = 50
- a
- mu2
- var2
- fc11
- fc12
- fc21
- fc22
- mean_bn
- logvar_bn
- decoder_bn
- fc1_drop
- theta_drop
- fcd1
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
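The attributes `a`, `mu2`, and `var2` are listed without descriptions. Assuming they hold a Dirichlet concentration and the mean and variance of its Laplace (logistic-normal) approximation, as described in the ProdLDA paper, the prior and the `reparameterize(mu, logvar)` step might look roughly like the sketch below. It is plain Python rather than the library's tensor code, and `laplace_prior` and `softmax` are hypothetical helper names.

```python
import math
import random


def laplace_prior(alpha):
    """Mean and variance of the logistic-normal approximation to Dirichlet(alpha).

    Assumption: the class attributes a, mu2 and var2 store these quantities.
    """
    k = len(alpha)
    log_alpha = [math.log(a) for a in alpha]
    mean_log = sum(log_alpha) / k
    inv_sum = sum(1.0 / a for a in alpha)
    mu = [la - mean_log for la in log_alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / k) + inv_sum / (k * k) for a in alpha]
    return mu, var


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def softmax(z):
    """Map an unconstrained vector to topic proportions on the simplex."""
    shift = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Sample topic proportions from the prior of a 50-topic model.
prior_mu, prior_var = laplace_prior([1.0] * 50)
z = reparameterize(prior_mu, [math.log(v) for v in prior_var], random.Random(0))
theta = softmax(z)
```

Sampling through `mu + sigma * eps` keeps the draw differentiable with respect to `mu` and `logvar`, which is why variational autoencoder style models expose a method of this shape.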
- class CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
- vocab_size
- num_topics = 50
- a
- mu2
- var2
- fc_contextual
- fc11
- fc12
- fc21
- fc22
- mean_bn
- logvar_bn
- decoder_bn
- fc1_drop
- theta_drop
- fcd1
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
- class DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)
Bases: torch.nn.Module
Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.
Xiaobao Wu, Chunping Li, Yishu Miao.
- num_topics = 50
- a
- mu2
- var2
- fc11
- fc12
- fc21
- fc22
- mean_bn
- logvar_bn
- decoder_bn
- fc1_drop
- theta_drop
- beta
- get_beta()
- get_theta(x)
- reparameterize(mu, logvar)
- encode(x)
- decode(theta)
- forward(x)
- loss_function(x, recon_x, mu, logvar)
- class ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)
Bases: torch.nn.Module
Topic Modeling in Embedding Spaces. TACL 2020
Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.
- topic_embeddings
- encoder1
- fc21
- fc22
- reparameterize(mu, logvar)
- encode(x)
- get_theta(x)
- get_beta()
- forward(x, avg_loss=True)
- loss_function(x, recon_x, mu, logvar, avg_loss=True)
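In the ETM paper, each topic's word distribution is a softmax over the inner products between that topic's embedding and every word embedding. The listing does not describe `get_beta()`, so the sketch below only assumes it follows that definition. `etm_beta` is a hypothetical name and the code is plain Python, not the library's tensor code.

```python
import math


def etm_beta(topic_embeddings, word_embeddings):
    """Return a K x V matrix; row k is softmax over words of <word, topic_k>."""
    beta = []
    for topic in topic_embeddings:
        # One logit per vocabulary word: inner product with the topic embedding.
        logits = [sum(t * w for t, w in zip(topic, word)) for word in word_embeddings]
        shift = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(value - shift) for value in logits]
        total = sum(exps)
        beta.append([e / total for e in exps])
    return beta


# Two topics and three words in a 2-dimensional embedding space.
topics = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
beta = etm_beta(topics, words)
```

Each row of `beta` sums to one, and a word whose embedding lies close to a topic embedding receives more of that topic's probability mass.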
- class NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)
Bases: torch.nn.Module
Neural Topic Model via Optimal Transport. ICLR 2021
He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.
- recon_loss_weight = 0.07
- sinkhorn_alpha = 20
- e1
- e2
- e_dropout
- mean_bn
- topic_embeddings
- get_beta()
- get_theta(input)
- forward(input)
- class TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)
Bases: torch.nn.Module
Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022
Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.
Note: This implementation does not include TSCTM with augmentations. For augmentations, see https://github.com/BobXWu/TSCTM.
- fc11
- fc12
- fc21
- mean_bn
- decoder_bn
- fcd1
- topic_dist_quant
- contrast_loss
- get_beta()
- encode(inputs)
- decode(theta)
- get_theta(inputs)
- forward(inputs)
- loss_function(recon_x, x)
- class ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)
Bases: torch.nn.Module
Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023
Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.
- num_topics = 50
- beta_temp = 0.2
- a
- mu2
- var2
- fc11
- fc12
- fc21
- fc22
- fc1_dropout
- theta_dropout
- mean_bn
- logvar_bn
- decoder_bn
- word_embeddings
- topic_embeddings
- ECR
- get_beta()
- reparameterize(mu, logvar)
- encode(input)
- get_theta(input)
- compute_loss_KL(mu, logvar)
- get_loss_ECR()
- pairwise_euclidean_distance(x, y)
- forward(input)
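`pairwise_euclidean_distance(x, y)` is listed without a description. A common way to build such a cost matrix between two sets of embeddings is the expansion ||x||^2 + ||y||^2 - 2 x.y. The sketch below assumes squared distances are wanted, which the listing does not confirm, and uses a hypothetical function name in plain Python.

```python
def pairwise_squared_euclidean(xs, ys):
    """Return the len(xs) x len(ys) matrix of squared Euclidean distances."""
    x_sq = [sum(v * v for v in x) for x in xs]
    y_sq = [sum(v * v for v in y) for y in ys]
    return [
        [
            # Clamp at zero to absorb tiny negative values caused by rounding.
            max(x_sq[i] + y_sq[j] - 2.0 * sum(a * b for a, b in zip(x, y)), 0.0)
            for j, y in enumerate(ys)
        ]
        for i, x in enumerate(xs)
    ]


# Cost between two "topic" points and two "word" points in 2-D.
cost = pairwise_squared_euclidean([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [1.0, 0.0]])
```

The expansion avoids materializing every pairwise difference vector, which matters when one side is a full vocabulary of word embeddings.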
- class NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)
Bases: torch.nn.Module
Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.
Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.
- num_topics = 50
- lam = 0.8
- Map_en2cn
- Map_cn2en
- a
- mu2
- var2
- decoder_bn_en
- decoder_bn_cn
- fc11_en
- fc11_cn
- fc12
- fc21
- fc22
- fc1_drop
- z_drop
- mean_bn
- logvar_bn
- phi_en
- phi_cn
- reparameterize(mu, logvar)
- encode(x, lang)
- get_theta(x, lang)
- get_beta()
- decode(theta, lang)
- forward(x_en, x_cn)
- loss_function(recon_x, x, mu, logvar)
- class InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)
Bases: torch.nn.Module
InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023
Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.
- num_topics = 50
- encoder_en
- encoder_cn
- a
- mu2
- var2
- decoder_bn_en
- decoder_bn_cn
- phi_en
- phi_cn
- TAMI
- get_beta()
- get_theta(x, lang)
- decode(theta, beta, lang)
- forward(x_en, x_cn)
- compute_loss_TM(recon_x, x, mu, logvar)
- class DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')
Bases: torch.nn.Module
The Dynamic Embedded Topic Model. 2019
Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.
- num_topics = 50
- num_times
- vocab_size
- rho_size = 300
- enc_drop = 0.0
- eta_nlayers = 3
- t_drop
- eta_dropout = 0.0
- delta = 0.005
- train_WE = True
- train_size
- rnn_inp
- device = 'cpu'
- theta_act = 'relu'
- mu_q_alpha
- logsigma_q_alpha
- q_theta
- mu_q_theta
- logsigma_q_theta
- q_eta_map
- q_eta
- mu_q_eta
- logsigma_q_eta
- decoder_bn
- get_activation(act)
- reparameterize(mu, logvar)
Returns a sample from a Gaussian distribution via reparameterization.
- get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)
Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).
- get_alpha()
- get_eta(rnn_inp)
- get_theta(bows, times, eta=None)
Returns the topic proportions.
- property word_embeddings
- property topic_embeddings
- get_beta(alpha=None)
Returns the topic matrix beta of shape T x K x V.
- get_NLL(theta, beta, bows)
- forward(bows, times)
Initializes the first hidden state of the RNN used as inference network for eta.
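The `get_kl` description above names a KL divergence between two diagonal Gaussians, with the prior defaulting to something when `p_mu` and `p_logsigma` are omitted. A minimal closed-form sketch follows. It assumes the "logsigma" arguments are log-variances and that the default prior is a standard normal; neither is stated in the listing, and `gaussian_kl` is a hypothetical name.

```python
import math


def gaussian_kl(q_mu, q_logvar, p_mu=None, p_logvar=None):
    """KL(N(q_mu, exp(q_logvar)) || N(p_mu, exp(p_logvar))), summed over dimensions.

    Assumptions of this sketch: inputs are log-variances, and a missing
    prior means N(0, I).
    """
    if p_mu is None:
        p_mu = [0.0] * len(q_mu)
    if p_logvar is None:
        p_logvar = [0.0] * len(q_mu)
    total = 0.0
    for qm, qlv, pm, plv in zip(q_mu, q_logvar, p_mu, p_logvar):
        # Per-dimension closed form for two univariate Gaussians.
        total += 0.5 * (plv - qlv + (math.exp(qlv) + (qm - pm) ** 2) / math.exp(plv) - 1.0)
    return total


# Divergence of a shifted unit Gaussian from the standard normal prior.
kl = gaussian_kl([1.0, 0.0], [0.0, 0.0])
```

In a dynamic model the explicit prior arguments are what allow the prior at time t to be centred on the value from time t - 1 instead of on zero.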
- class CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)
Bases: torch.nn.Module
Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion. ACL 2024 Findings
Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.
- num_topics = 50
- beta_temp = 1.0
- train_time_wordfreq
- encoder
- a
- mu2
- var2
- decoder_bn
- topic_embeddings
- ETC
- UWE
- get_beta()
- pairwise_euclidean_dist(x, y)
- get_theta(x, times=None)
- get_KL(mu, logvar)
- get_NLL(theta, beta, x, recon_x=None)
- decode(theta, beta)
- forward(x, times)
- class SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)
Bases: torch.nn.Module
Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.
Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.
https://github.com/ZhibinDuan/SawETM
- device = 'cpu'
- gam_prior
- real_min
- theta_max
- wei_shape_min
- wei_shape_max
- num_topics_list
- num_layers
- alpha
- h_encoder
- q_theta
- log_max(x)
- reparameterize(shape, scale, sample_num=50)
Returns a sample from a Weibull distribution via reparameterization.
- kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)
Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.
- get_nll(x, x_reconstruct)
Returns the negative Poisson likelihood of observational count data.
- property bottom_word_embeddings
- property topic_embeddings_list
- get_phis()
Returns the factor loading matrix by utilizing sawtooth connection.
- get_beta()
- get_phi_list()
- get_theta(x)
- forward(x)
Forward pass: compute the kl loss and data likelihood.
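The Weibull reparameterization mentioned for `reparameterize(shape, scale, sample_num=50)` can be illustrated with the inverse-CDF transform, which turns uniform noise into a Weibull draw. The sketch below is plain Python with hypothetical helper names. Averaging `sample_num` draws is an assumption about what that parameter does; the listing does not say.

```python
import math
import random


def weibull_sample(shape, scale, u):
    """Inverse-CDF transform: scale * (-log(1 - u)) ** (1 / shape), for u in [0, 1)."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def reparameterize(shape, scale, sample_num=50, rng=random):
    """Average sample_num Weibull draws (the averaging is an assumption here)."""
    draws = [weibull_sample(shape, scale, rng.random()) for _ in range(sample_num)]
    return sum(draws) / sample_num


# One smoothed draw with shape 2 and scale 1, using a seeded generator.
theta = reparameterize(2.0, 1.0, sample_num=50, rng=random.Random(0))
```

Because the randomness enters only through `u`, the draw is a deterministic and differentiable function of `shape` and `scale`, which is the property a gradient-based inference network needs.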
- class HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)
Bases: topmost.models.hierarchical.SawETM.SawETM.SawETM
HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.
Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.
https://github.com/NoviceStone/HyperMiner
- manifold
- clip_r = None
- feat_clip(x)
- property bottom_word_embeddings
- property topic_embeddings_list
- get_phi()
Returns the factor loading matrix by utilizing sawtooth connection.
- get_beta()
- get_phi_list()
- get_theta(x)
- forward(x)
Forward pass: compute the kl loss and data likelihood.
- class TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)
Bases: torch.nn.Module
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024
Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.
- num_topics_list = [10, 50, 200]
- weight_loss_TPD = 20.0
- beta_temp = 0.1
- num_layers
- bottom_word_embeddings
- topic_embeddings_list
- TPD
- CDDecoder
- encoder
- get_beta()
- get_phi_list()
- get_theta(input_bow)
- forward(input_bow)
- compute_loss_KL(mu, logvar, mu_prior=None)