topmost
=======

.. py:module:: topmost

.. toctree::
   :titlesonly:
   :maxdepth: 3

   data/index.rst
   eva/index.rst
   models/index.rst
   preprocess/index.rst
   trainers/index.rst
   utils/index.rst

Package Contents
----------------

.. autoapisummary::

   topmost.Preprocess
   topmost.BasicDataset
   topmost.RawDataset
   topmost.CrosslingualDataset
   topmost.DynamicDataset
   topmost.BasicTrainer
   topmost.BERTopicTrainer
   topmost.FASTopicTrainer
   topmost.LDAGensimTrainer
   topmost.LDASklearnTrainer
   topmost.NMFGensimTrainer
   topmost.NMFSklearnTrainer
   topmost.CrosslingualTrainer
   topmost.DynamicTrainer
   topmost.DTMTrainer
   topmost.HierarchicalTrainer
   topmost.HDPGensimTrainer
   topmost.ProdLDA
   topmost.CombinedTM
   topmost.DecTM
   topmost.ETM
   topmost.NSTM
   topmost.TSCTM
   topmost.ECRTM
   topmost.NMTM
   topmost.InfoCTM
   topmost.DETM
   topmost.CFDTM
   topmost.SawETM
   topmost.HyperMiner
   topmost.TraCo

.. autoapisummary::

   topmost.download_dataset

.. py:class:: Preprocess(tokenizer=None, test_sample_size=None, test_p=0.2, stopwords='English', min_doc_count=0, max_doc_freq=1.0, keep_num=False, keep_alphanum=False, strip_html=False, no_lower=False, min_length=3, min_term=0, vocab_size=None, seed=42, verbose=True)

   :param test_sample_size: Size of the test set.
   :param test_p: Proportion of the test set. This helps sample the train set based on the size of the test set.
   :param stopwords: List of stopwords to exclude.
   :param min_doc_count: Exclude words that occur in fewer than this number of documents.
   :param max_doc_freq: Exclude words that occur in more than this proportion of documents.
   :param keep_num: Keep tokens made of only numbers.
   :param keep_alphanum: Keep tokens made of a mixture of letters and numbers.
   :param strip_html: Strip HTML tags.
   :param no_lower: Do not lowercase text.
   :param min_length: Minimum token length.
   :param min_term: Minimum term number.
   :param vocab_size: Size of the vocabulary (by most common in the union of train and test sets, following the above exclusions).
   :param seed: Random integer seed (only relevant for choosing the test set).

   .. py:attribute:: test_sample_size
      :value: None

   .. py:attribute:: min_doc_count
      :value: 0

   .. py:attribute:: max_doc_freq
      :value: 1.0

   .. py:attribute:: min_term
      :value: 0

   .. py:attribute:: test_p
      :value: 0.2

   .. py:attribute:: vocab_size
      :value: None

   .. py:attribute:: seed
      :value: 42

   .. py:method:: parse(texts, vocab)

   .. py:method:: preprocess_jsonlist(dataset_dir, label_name=None, pretrained_WE=False)

   .. py:method:: convert_labels(train_labels, test_labels)

   .. py:method:: preprocess(raw_train_texts, train_labels=None, raw_test_texts=None, test_labels=None, pretrained_WE=False)

   .. py:method:: save(output_dir, vocab, train_texts, train_bow, word_embeddings=None, train_labels=None, test_texts=None, test_bow=None, test_labels=None)

.. py:class:: BasicDataset(dataset_dir, batch_size=200, read_labels=False, as_tensor=True, contextual_embed=False, doc_embed_model='all-MiniLM-L6-v2', device='cpu')

   .. py:attribute:: vocab_size
      :value: 0

   .. py:method:: load_data(path, read_labels)

.. py:class:: RawDataset(docs, preprocess=None, batch_size=200, device='cpu', as_tensor=True, contextual_embed=False, pretrained_WE=False, doc_embed_model='all-MiniLM-L6-v2', embed_model_device=None, verbose=False)

   .. py:attribute:: train_data

   .. py:attribute:: train_texts

   .. py:attribute:: vocab

   .. py:attribute:: vocab_size

.. py:class:: CrosslingualDataset(dataset_dir, lang1, lang2, dict_path, device='cpu', batch_size=200, as_tensor=True)

   .. py:attribute:: batch_size
      :value: 200

   .. py:attribute:: train_size_en
      :value: 0

   .. py:attribute:: train_size_cn
      :value: 0

   .. py:attribute:: vocab_size_en
      :value: 0

   .. py:attribute:: vocab_size_cn
      :value: 0

   .. py:attribute:: pretrained_WE_en

   .. py:attribute:: pretrained_WE_cn

   .. py:attribute:: Map_en2cn

   .. py:attribute:: Map_cn2en

   .. py:method:: move_to_device(bow, device)

   .. py:method:: read_data(dataset_dir, lang)

   .. py:method:: parse_dictionary(dict_path)

   .. py:method:: get_Map(trans_matrix, bow)

.. py:class:: DynamicDataset(dataset_dir, batch_size=200, read_labels=False, device='cpu', as_tensor=True)

   .. py:attribute:: vocab_size
      :value: 0

   .. py:attribute:: train_size

   .. py:attribute:: num_times

   .. py:attribute:: train_time_wordfreq

   .. py:method:: load_data(path, read_labels)

   .. py:method:: get_time_wordfreq(bow, times)

.. py:function:: download_dataset(dataset_name, cache_path='~/.topmost')

.. py:class:: BasicTrainer(model, dataset, num_top_words=15, epochs=200, learning_rate=0.002, batch_size=200, lr_scheduler=None, lr_step_size=125, log_interval=5, verbose=False)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: epochs
      :value: 200

   .. py:attribute:: learning_rate
      :value: 0.002

   .. py:attribute:: batch_size
      :value: 200

   .. py:attribute:: lr_scheduler
      :value: None

   .. py:attribute:: lr_step_size
      :value: 125

   .. py:attribute:: log_interval
      :value: 5

   .. py:attribute:: verbose
      :value: False

   .. py:method:: make_optimizer()

   .. py:method:: make_lr_scheduler(optimizer)

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: BERTopicTrainer(dataset, num_topics=50, num_top_words=15)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:method:: train()

   .. py:method:: test(texts)

   .. py:method:: get_beta()

   .. py:method:: get_top_words()

   .. py:method:: export_theta()

.. py:class:: FASTopicTrainer(dataset, num_topics=50, num_top_words=15, preprocess=None, epochs=200, DT_alpha=3.0, TW_alpha=2.0, theta_temp=1.0, verbose=False)

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: model

   .. py:attribute:: epochs
      :value: 200

   .. py:method:: train()

   .. py:method:: test(texts)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: LDAGensimTrainer(dataset, num_topics=50, num_top_words=15, max_iter=1, alpha='symmetric', eta=None, verbose=False)

   .. py:attribute:: dataset

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: vocab_size

   .. py:attribute:: max_iter
      :value: 1

   .. py:attribute:: alpha
      :value: 'symmetric'

   .. py:attribute:: eta
      :value: None

   .. py:attribute:: verbose
      :value: False

   .. py:attribute:: num_top_words
      :value: 15

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: LDASklearnTrainer(model, dataset, num_top_words=15, verbose=False)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: verbose
      :value: False

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: NMFGensimTrainer(dataset, num_topics=50, num_top_words=15, max_iter=1)

   .. py:attribute:: dataset

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: vocab_size

   .. py:attribute:: max_iter
      :value: 1

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: NMFSklearnTrainer(model, dataset, num_top_words=15)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: CrosslingualTrainer(model, dataset, num_top_words=15, epochs=500, learning_rate=0.002, batch_size=200, lr_scheduler=None, lr_step_size=125, log_interval=5, verbose=False)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: epochs
      :value: 500

   .. py:attribute:: learning_rate
      :value: 0.002

   .. py:attribute:: batch_size
      :value: 200

   .. py:attribute:: lr_scheduler
      :value: None

   .. py:attribute:: lr_step_size
      :value: 125

   .. py:attribute:: log_interval
      :value: 5

   .. py:method:: make_optimizer()

   .. py:method:: make_lr_scheduler(optimizer)

   .. py:method:: train()

   .. py:method:: test(bow_en, bow_cn)

   .. py:method:: infer_theta(bow, lang)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: DynamicTrainer(model, dataset, num_top_words=15, epochs=200, learning_rate=0.002, batch_size=200, lr_scheduler=None, lr_step_size=125, log_interval=5, verbose=False)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: epochs
      :value: 200

   .. py:attribute:: learning_rate
      :value: 0.002

   .. py:attribute:: batch_size
      :value: 200

   .. py:attribute:: lr_scheduler
      :value: None

   .. py:attribute:: lr_step_size
      :value: 125

   .. py:attribute:: log_interval
      :value: 5

   .. py:attribute:: verbose
      :value: False

   .. py:method:: make_optimizer()

   .. py:method:: make_lr_scheduler(optimizer)

   .. py:method:: train()

   .. py:method:: test(bow, times)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: DTMTrainer(dataset, num_topics=50, num_top_words=15, alphas=0.01, chain_variance=0.005, passes=10, lda_inference_max_iter=25, em_min_iter=6, em_max_iter=20, verbose=False)

   .. py:attribute:: dataset

   .. py:attribute:: vocab_size

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: alphas
      :value: 0.01

   .. py:attribute:: chain_variance
      :value: 0.005

   .. py:attribute:: passes
      :value: 10

   .. py:attribute:: lda_inference_max_iter
      :value: 25

   .. py:attribute:: em_min_iter
      :value: 6

   .. py:attribute:: em_max_iter
      :value: 20

   .. py:attribute:: verbose
      :value: False

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_theta()

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: HierarchicalTrainer(model, dataset, num_top_words=15, epochs=200, learning_rate=0.002, batch_size=200, lr_scheduler=None, lr_step_size=125, log_interval=5, verbose=False)

   .. py:attribute:: model

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: epochs
      :value: 200

   .. py:attribute:: learning_rate
      :value: 0.002

   .. py:attribute:: batch_size
      :value: 200

   .. py:attribute:: lr_scheduler
      :value: None

   .. py:attribute:: lr_step_size
      :value: 125

   .. py:attribute:: log_interval
      :value: 5

   .. py:attribute:: verbose
      :value: False

   .. py:method:: make_optimizer()

   .. py:method:: make_lr_scheduler(optimizer)

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_phi()

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None, annotation=False)

   .. py:method:: export_theta()

.. py:class:: HDPGensimTrainer(dataset, num_top_words=15, max_chunks=None, max_time=None, chunksize=256, kappa=1.0, tau=64.0, K=15, T=150, alpha=1, gamma=1, eta=0.01, scale=1.0, var_converge=0.0001, verbose=False)

   .. py:attribute:: dataset

   .. py:attribute:: num_top_words
      :value: 15

   .. py:attribute:: vocab_size

   .. py:attribute:: max_chunks
      :value: None

   .. py:attribute:: max_time
      :value: None

   .. py:attribute:: chunksize
      :value: 256

   .. py:attribute:: kappa
      :value: 1.0

   .. py:attribute:: tau
      :value: 64.0

   .. py:attribute:: K
      :value: 15

   .. py:attribute:: T
      :value: 150

   .. py:attribute:: alpha
      :value: 1

   .. py:attribute:: gamma
      :value: 1

   .. py:attribute:: eta
      :value: 0.01

   .. py:attribute:: scale
      :value: 1.0

   .. py:attribute:: var_converge
      :value: 0.0001

   .. py:attribute:: verbose
      :value: False

   .. py:method:: train()

   .. py:method:: test(bow)

   .. py:method:: get_beta()

   .. py:method:: get_top_words(num_top_words=None)

   .. py:method:: export_theta()

.. py:class:: ProdLDA(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   Autoencoding Variational Inference For Topic Models. ICLR 2017

   Akash Srivastava, Charles Sutton.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: fcd1

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: CombinedTM(vocab_size, contextual_embed_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   .. py:attribute:: vocab_size

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc_contextual

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: fcd1

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: DecTM(vocab_size, num_topics=50, en_units=200, dropout=0.4)

   Bases: :py:obj:`torch.nn.Module`

   Discovering Topics in Long-tailed Corpora with Causal Intervention. ACL 2021 Findings.

   Xiaobao Wu, Chunping Li, Yishu Miao.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fc1_drop

   .. py:attribute:: theta_drop

   .. py:attribute:: beta

   .. py:method:: get_beta()

   .. py:method:: get_theta(x)

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: decode(theta)

   .. py:method:: forward(x)

   .. py:method:: loss_function(x, recon_x, mu, logvar)

.. py:class:: ETM(vocab_size, embed_size=200, num_topics=50, en_units=800, dropout=0.0, pretrained_WE=None, train_WE=False)

   Bases: :py:obj:`torch.nn.Module`

   Topic Modeling in Embedding Spaces. TACL 2020

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

   .. py:attribute:: topic_embeddings

   .. py:attribute:: encoder1

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x)

   .. py:method:: get_theta(x)

   .. py:method:: get_beta()

   .. py:method:: forward(x, avg_loss=True)

   .. py:method:: loss_function(x, recon_x, mu, logvar, avg_loss=True)

.. py:class:: NSTM(vocab_size, num_topics=50, en_units=200, dropout=0.25, pretrained_WE=None, train_WE=True, embed_size=200, recon_loss_weight=0.07, sinkhorn_alpha=20)

   Bases: :py:obj:`torch.nn.Module`

   Neural Topic Model via Optimal Transport. ICLR 2021

   He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine.

   .. py:attribute:: recon_loss_weight
      :value: 0.07

   .. py:attribute:: sinkhorn_alpha
      :value: 20

   .. py:attribute:: e1

   .. py:attribute:: e2

   .. py:attribute:: e_dropout

   .. py:attribute:: mean_bn

   .. py:attribute:: topic_embeddings

   .. py:method:: get_beta()

   .. py:method:: get_theta(input)

   .. py:method:: forward(input)

.. py:class:: TSCTM(vocab_size, num_topics=50, en_units=200, temperature=0.5, weight_contrast=1.0)

   Bases: :py:obj:`torch.nn.Module`

   Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. EMNLP 2022

   Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong.

   Note: This implementation does not include TSCTM with augmentations.
   For augmentations, see https://github.com/BobXWu/TSCTM.

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: mean_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: fcd1

   .. py:attribute:: topic_dist_quant

   .. py:attribute:: contrast_loss

   .. py:method:: get_beta()

   .. py:method:: encode(inputs)

   .. py:method:: decode(theta)

   .. py:method:: get_theta(inputs)

   .. py:method:: forward(inputs)

   .. py:method:: loss_function(recon_x, x)

.. py:class:: ECRTM(vocab_size, num_topics=50, en_units=200, dropout=0.0, pretrained_WE=None, embed_size=200, beta_temp=0.2, weight_loss_ECR=100.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`

   Effective Neural Topic Modeling with Embedding Clustering Regularization. ICML 2023

   Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: beta_temp
      :value: 0.2

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: fc11

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: fc1_dropout

   .. py:attribute:: theta_dropout

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: decoder_bn

   .. py:attribute:: word_embeddings

   .. py:attribute:: topic_embeddings

   .. py:attribute:: ECR

   .. py:method:: get_beta()

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(input)

   .. py:method:: get_theta(input)

   .. py:method:: compute_loss_KL(mu, logvar)

   .. py:method:: get_loss_ECR()

   .. py:method:: pairwise_euclidean_distance(x, y)

   .. py:method:: forward(input)

.. py:class:: NMTM(Map_en2cn, Map_cn2en, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, lam=0.8)

   Bases: :py:obj:`torch.nn.Module`

   Learning Multilingual Topics with Neural Variational Inference. NLPCC 2020.

   Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: lam
      :value: 0.8

   .. py:attribute:: Map_en2cn

   .. py:attribute:: Map_cn2en

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn_en

   .. py:attribute:: decoder_bn_cn

   .. py:attribute:: fc11_en

   .. py:attribute:: fc11_cn

   .. py:attribute:: fc12

   .. py:attribute:: fc21

   .. py:attribute:: fc22

   .. py:attribute:: fc1_drop

   .. py:attribute:: z_drop

   .. py:attribute:: mean_bn

   .. py:attribute:: logvar_bn

   .. py:attribute:: phi_en

   .. py:attribute:: phi_cn

   .. py:method:: reparameterize(mu, logvar)

   .. py:method:: encode(x, lang)

   .. py:method:: get_theta(x, lang)

   .. py:method:: get_beta()

   .. py:method:: decode(theta, lang)

   .. py:method:: forward(x_en, x_cn)

   .. py:method:: loss_function(recon_x, x, mu, logvar)

.. py:class:: InfoCTM(trans_e2c, pretrain_word_embeddings_en, pretrain_word_embeddings_cn, vocab_size_en, vocab_size_cn, num_topics=50, en_units=200, dropout=0.0, temperature=0.2, pos_threshold=0.4, weight_MI=30.0)

   Bases: :py:obj:`torch.nn.Module`

   InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling. AAAI 2023

   Xiaobao Wu, Xinshuai Dong, Thong Nguyen, Chaoqun Liu, Liangming Pan, Anh Tuan Luu.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: encoder_en

   .. py:attribute:: encoder_cn

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn_en

   .. py:attribute:: decoder_bn_cn

   .. py:attribute:: phi_en

   .. py:attribute:: phi_cn

   .. py:attribute:: TAMI

   .. py:method:: get_beta()

   .. py:method:: get_theta(x, lang)

   .. py:method:: decode(theta, beta, lang)

   .. py:method:: forward(x_en, x_cn)

   .. py:method:: compute_loss_TM(recon_x, x, mu, logvar)

.. py:class:: DETM(vocab_size, num_times, train_size, train_time_wordfreq, num_topics=50, train_WE=True, pretrained_WE=None, en_units=800, eta_hidden_size=200, rho_size=300, enc_drop=0.0, eta_nlayers=3, eta_dropout=0.0, delta=0.005, theta_act='relu', device='cpu')

   Bases: :py:obj:`torch.nn.Module`

   The Dynamic Embedded Topic Model. 2019

   Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: num_times

   .. py:attribute:: vocab_size

   .. py:attribute:: eta_hidden_size
      :value: 200

   .. py:attribute:: rho_size
      :value: 300

   .. py:attribute:: enc_drop
      :value: 0.0

   .. py:attribute:: eta_nlayers
      :value: 3

   .. py:attribute:: t_drop

   .. py:attribute:: eta_dropout
      :value: 0.0

   .. py:attribute:: delta
      :value: 0.005

   .. py:attribute:: train_WE
      :value: True

   .. py:attribute:: train_size

   .. py:attribute:: rnn_inp

   .. py:attribute:: device
      :value: 'cpu'

   .. py:attribute:: theta_act
      :value: 'relu'

   .. py:attribute:: mu_q_alpha

   .. py:attribute:: logsigma_q_alpha

   .. py:attribute:: q_theta

   .. py:attribute:: mu_q_theta

   .. py:attribute:: logsigma_q_theta

   .. py:attribute:: q_eta_map

   .. py:attribute:: q_eta

   .. py:attribute:: mu_q_eta

   .. py:attribute:: logsigma_q_eta

   .. py:attribute:: decoder_bn

   .. py:method:: get_activation(act)

   .. py:method:: reparameterize(mu, logvar)

      Returns a sample from a Gaussian distribution via reparameterization.

   .. py:method:: get_kl(q_mu, q_logsigma, p_mu=None, p_logsigma=None)

      Returns KL( N(q_mu, q_logsigma) || N(p_mu, p_logsigma) ).

   .. py:method:: get_alpha()

   .. py:method:: get_eta(rnn_inp)

   .. py:method:: get_theta(bows, times, eta=None)

      Returns the topic proportions.

   .. py:property:: word_embeddings

   .. py:property:: topic_embeddings

   .. py:method:: get_beta(alpha=None)

      Returns the topic matrix beta of shape T x K x V.

   .. py:method:: get_NLL(theta, beta, bows)

   .. py:method:: forward(bows, times)

   .. py:method:: init_hidden()

      Initializes the first hidden state of the RNN used as inference network for \eta.

.. py:class:: CFDTM(vocab_size, train_time_wordfreq, num_times, pretrained_WE=None, num_topics=50, en_units=100, temperature=0.1, beta_temp=1.0, weight_neg=10000000.0, weight_pos=10.0, weight_UWE=1000.0, neg_topk=15, dropout=0.0, embed_size=200)

   Bases: :py:obj:`torch.nn.Module`

   Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion.
   ACL 2024 Findings

   Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics
      :value: 50

   .. py:attribute:: beta_temp
      :value: 1.0

   .. py:attribute:: train_time_wordfreq

   .. py:attribute:: encoder

   .. py:attribute:: a

   .. py:attribute:: mu2

   .. py:attribute:: var2

   .. py:attribute:: decoder_bn

   .. py:attribute:: topic_embeddings

   .. py:attribute:: ETC

   .. py:attribute:: UWE

   .. py:method:: get_beta()

   .. py:method:: pairwise_euclidean_dist(x, y)

   .. py:method:: get_theta(x, times=None)

   .. py:method:: get_KL(mu, logvar)

   .. py:method:: get_NLL(theta, beta, x, recon_x=None)

   .. py:method:: decode(theta, beta)

   .. py:method:: forward(x, times)

.. py:class:: SawETM(vocab_size, num_topics_list, device='cpu', embed_size=100, hidden_size=256, pretrained_WE=None)

   Bases: :py:obj:`torch.nn.Module`

   Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network. ICML 2021.

   Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou.

   https://github.com/ZhibinDuan/SawETM

   .. py:attribute:: device
      :value: 'cpu'

   .. py:attribute:: gam_prior

   .. py:attribute:: real_min

   .. py:attribute:: theta_max

   .. py:attribute:: wei_shape_min

   .. py:attribute:: wei_shape_max

   .. py:attribute:: num_topics_list

   .. py:attribute:: num_hiddens_list

   .. py:attribute:: num_layers

   .. py:attribute:: alpha

   .. py:attribute:: h_encoder

   .. py:attribute:: q_theta

   .. py:method:: log_max(x)

   .. py:method:: reparameterize(shape, scale, sample_num=50)

      Returns a sample from a Weibull distribution via reparameterization.

   .. py:method:: kl_weibull_gamma(wei_shape, wei_scale, gam_shape, gam_scale)

      Returns the Kullback-Leibler divergence between a Weibull distribution and a Gamma distribution.

   .. py:method:: get_nll(x, x_reconstruct)

      Returns the negative Poisson likelihood of observational count data.

   .. py:property:: bottom_word_embeddings

   .. py:property:: topic_embeddings_list

   .. py:method:: get_phis()

      Returns the factor loading matrix by utilizing sawtooth connection.

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(x)

   .. py:method:: forward(x)

      Forward pass: compute the kl loss and data likelihood.

.. py:class:: HyperMiner(vocab_size, num_topics_list, device='cpu', manifold='PoincareBall', clip_r=None, curvature=-0.01, embed_size=50, hidden_size=300, pretrained_WE=None)

   Bases: :py:obj:`topmost.models.hierarchical.SawETM.SawETM.SawETM`

   HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding. NeurIPS 2022.

   Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou.

   https://github.com/NoviceStone/HyperMiner

   .. py:attribute:: manifold

   .. py:attribute:: clip_r
      :value: None

   .. py:method:: feat_clip(x)

   .. py:property:: bottom_word_embeddings

   .. py:property:: topic_embeddings_list

   .. py:method:: get_phi()

      Returns the factor loading matrix by utilizing sawtooth connection.

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(x)

   .. py:method:: forward(x)

      Forward pass: compute the kl loss and data likelihood.

.. py:class:: TraCo(vocab_size, num_topics_list=[10, 50, 200], en_units=300, dropout=0.0, embed_size=200, bias_topk=20, bias_p=5.0, beta_temp=0.1, weight_loss_TPD=20.0, sinkhorn_alpha=20.0, sinkhorn_max_iter=1000)

   Bases: :py:obj:`torch.nn.Module`

   On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling. AAAI 2024

   Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu.

   .. py:attribute:: num_topics_list
      :value: [10, 50, 200]

   .. py:attribute:: weight_loss_TPD
      :value: 20.0

   .. py:attribute:: beta_temp
      :value: 0.1

   .. py:attribute:: num_layers

   .. py:attribute:: bottom_word_embeddings

   .. py:attribute:: topic_embeddings_list

   .. py:attribute:: TPD

   .. py:attribute:: CDDecoder

   .. py:attribute:: encoder

   .. py:method:: get_beta()

   .. py:method:: get_phi_list()

   .. py:method:: get_theta(input_bow)

   .. py:method:: forward(input_bow)

   .. py:method:: compute_loss_KL(mu, logvar, mu_prior=None)
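
As an illustration of the vocabulary filters documented for ``Preprocess`` (``min_doc_count``, ``max_doc_freq``, ``vocab_size``), the following is a minimal pure-Python sketch of the described behaviour. It is not the library's implementation: ``filter_vocab`` is a hypothetical helper, and ranking "most common" words by document count (with alphabetical tie-breaking) is an assumption made here.

```python
from collections import Counter


def filter_vocab(tokenized_docs, min_doc_count=0, max_doc_freq=1.0, vocab_size=None):
    """Hypothetical helper: return the kept vocabulary as a sorted list."""
    num_docs = len(tokenized_docs)

    # Document frequency: the number of documents that contain each word.
    doc_counts = Counter()
    for tokens in tokenized_docs:
        doc_counts.update(set(tokens))

    kept = [
        word
        for word, count in doc_counts.items()
        # Drop words found in fewer than min_doc_count documents ...
        if count >= min_doc_count
        # ... or in more than max_doc_freq (a proportion) of all documents.
        and count / num_docs <= max_doc_freq
    ]

    if vocab_size is not None:
        # Assumption: "most common" is ranked by document count,
        # with ties broken alphabetically.
        kept.sort(key=lambda word: (-doc_counts[word], word))
        kept = kept[:vocab_size]
    return sorted(kept)


docs = [
    ["topic", "model", "text"],
    ["topic", "model", "word"],
    ["topic", "neural", "word"],
    ["topic", "rare"],
]
print(filter_vocab(docs, min_doc_count=2, max_doc_freq=0.75))
```

In this toy corpus, ``topic`` appears in every document and is removed by ``max_doc_freq=0.75``, while ``text``, ``neural`` and ``rare`` each appear in only one document and are removed by ``min_doc_count=2``.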