senderle / Topic Modeling Tool

Licence: apache-2.0
A point-and-click tool for creating and analyzing topic models produced by MALLET.

Topic Modeling Tool

An updated GUI for MALLET's implementation of LDA.

New features:

  • Metadata integration
  • Automatic file segmentation
  • Custom CSV delimiters
  • Alpha/Beta optimization
  • Custom regex tokenization
  • Multicore processor support

Getting Started:

To start using some of these new features right away, consult the quickstart guide. For tinkerers, there's a guide to the tool's optional settings. You may also find useful information in the discussion threads under documentation issues.

Requirements:

The Topic Modeling Tool now has native Windows and Mac apps, and because of Unicode issues, these are currently the best options for installation. Just follow the instructions for your operating system. Do not try to install by clicking on [Clone or download] > [Download ZIP]. It won't work.

For Macs:

  • Download TopicModelingTool.dmg.
  • Open it by double-clicking.
  • Drag the app into your Applications folder -- or into any folder at all.
  • Run the app by double-clicking.

For Windows PCs:

  • Download TopicModelingTool.zip.
    • NOTE: The native PC build is out of date. Help wanted.
  • Extract the files into any folder and open it.
  • Double-click on the file called TopicModelingTool.exe to run it.

If you want to run the plain .jar file, you'll need to have a fairly recent version of Java; the version that came with your computer may not work, especially if your computer is a Mac. Whatever your operating system, you can install an updated version of Java by following the instructions for your operating system here.

Windows Unicode Support:

Windows and Java don't play very well together when it comes to Unicode text. If you are using the .jar build and non-ASCII characters are getting garbled on a Windows machine, there's a quick fix involving environment variables that may make things work.

Again, the best answer may just be to use the native app. It should now work correctly at every stage with UTF-8-encoded text. (If it doesn't, let us know and we will moan and gnash our teeth some more.)
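To see why text gets garbled in the first place, here is a minimal sketch (not part of the tool) of what happens when UTF-8 bytes are decoded with a different charset. The class and method names are hypothetical, and ISO-8859-1 is used only as a stand-in for a legacy Windows code page; which default charset your own JVM picks depends on your Java version and system settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the tool itself.
public class EncodingDemo {

    // Encode text as UTF-8 bytes, then decode those bytes with another charset.
    // This imitates a program reading a UTF-8 file with the wrong encoding.
    static String misread(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // Encode and decode with UTF-8 on both sides, so the text round-trips intact.
    static String readAsUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // The charset this JVM uses when none is specified explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // The two-byte UTF-8 sequence for the accented e becomes two separate characters.
        String garbled = misread(original, StandardCharsets.ISO_8859_1);
        System.out.println("Misread text equals original: " + garbled.equals(original));
        System.out.println("Misread text length: " + garbled.length()
                + " (original length: " + original.length() + ")");

        // Decoding with the matching charset preserves the text.
        System.out.println("UTF-8 round trip equals original: "
                + readAsUtf8(original).equals(original));
    }
}
```

If the first line printed on your Windows machine is not UTF-8, that mismatch is a likely reason the .jar build garbles UTF-8 input.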

Reporting and Replicating Bugs and Other Issues:

If you hadn't already guessed, most testing for this tool happens on a Mac. There are bound to be errors happening on other platforms that have slipped through the cracks. We need you to report them so we can keep improving the tool! But we cannot fix a problem that we don't fully understand, so...

When posting a bug report, please include vast amounts of detail.

Copy and paste everything from the tool's console output if you can, tell us your operating system and version, and let us know the other tools you're using to create and view input and output. It also helps if you verify that the bug still exists in the most recent build of the tool (i.e. the one contained in the .jar, .dmg, or .zip files in the root directory).
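If you are unsure how to find some of these details, the following sketch prints the basics using standard Java system properties. It is not part of the tool, and the class name is hypothetical; run it with the same Java installation you use for the .jar build, then paste the output into your report alongside the console output.

```java
import java.nio.charset.Charset;

// Hypothetical helper for gathering bug-report details; not shipped with the tool.
public class EnvironmentReport {

    // Build a short plain-text summary of the OS and Java environment.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("OS name: ").append(System.getProperty("os.name")).append('\n');
        sb.append("OS version: ").append(System.getProperty("os.version")).append('\n');
        sb.append("OS architecture: ").append(System.getProperty("os.arch")).append('\n');
        sb.append("Java version: ").append(System.getProperty("java.version")).append('\n');
        sb.append("Java vendor: ").append(System.getProperty("java.vendor")).append('\n');
        // The default charset matters for the Unicode problems described above.
        sb.append("Default charset: ").append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```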

We know that there are substantial problems with Windows support for Unicode text; if you see problems, please post detailed information under the main issue so that we can start isolating and fixing these bugs.

We love getting new issues because it means the tool is improving! But again, when posting a bug report, please include vast amounts of detail.

Building the Development Version:

If you feel adventurous, you might want to modify the code and compile your own version. To do so, you'll need to install Apache Maven as well as the Java Development Kit. On Macs, Homebrew is the best way to install Maven; simply install Homebrew as described on the Homebrew site, and then type brew install maven at the command line. On Windows PCs -- you're on your own! But we did it and it wasn't terribly hard. You just need an up-to-date JDK and Maven package, with their bin folders in your PATH.

With Maven installed, simply use the terminal to navigate to the TopicModelingTool folder:

$ cd topic-modeling-tool/TopicModelingTool

Then use Maven's package command:

$ mvn package

We now have experimental support for compiling the tool as a native app using the JavaFX plugin for Maven. The following command builds a native package able to run on your operating system. It has been tested on both Macs and Windows PCs.

$ mvn jfx:native

Acknowledgements:

This version of the tool was forked from the original version by David Newman and Arun Balagopalan.

Previous work on the GUI for MALLET has been supported by a National Leadership Grant (LG-06-08-0057-08) from the Institute of Museum and Library Services to Yale University, the University of Michigan, and the University of California, Irvine. The Institute of Museum and Library Services is the primary source of federal support for the nation's 123,000 libraries and 17,500 museums. The Institute's mission is to create strong libraries and museums that connect people to information and ideas.

Work on this version of the tool has benefited from the support of Penn Libraries and the University of Pennsylvania's Price Lab for Digital Humanities.
