microsoft / Browsecloud

Licence: mit
A web app to create and browse text visualizations for automated customer listening.

Programming Languages

typescript

Projects that are alternatives of or similar to Browsecloud

Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.
Stars: ✭ 45 (-68.53%)
Mutual labels:  text-classification, text-processing
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+1606.99%)
Mutual labels:  text-classification, text-processing
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (+16.78%)
Mutual labels:  text-classification, text-processing
Applied Text Mining In Python
Repo for Applied Text Mining in Python (coursera) by University of Michigan
Stars: ✭ 59 (-58.74%)
Mutual labels:  text-classification, text-processing
Text Mining
Text Mining in Python
Stars: ✭ 18 (-87.41%)
Mutual labels:  text-classification, text-processing
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (-0.7%)
Mutual labels:  text-classification, text-processing
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-13.29%)
Mutual labels:  text-classification, text-processing
Artificial Adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+143.36%)
Mutual labels:  text-classification, text-processing
Concise Ipython Notebooks For Deep Learning
Ipython Notebooks for solving problems like classification, segmentation, generation using latest Deep learning algorithms on different publicly available text and image data-sets.
Stars: ✭ 23 (-83.92%)
Mutual labels:  text-classification, text-processing
Text classification
Text Classification Algorithms: A Survey
Stars: ✭ 1,276 (+792.31%)
Mutual labels:  text-classification, text-processing
Rcnn Text Classification
Tensorflow Implementation of "Recurrent Convolutional Neural Network for Text Classification" (AAAI 2015)
Stars: ✭ 127 (-11.19%)
Mutual labels:  text-classification
Ml Projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (-11.19%)
Mutual labels:  text-classification
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (-5.59%)
Mutual labels:  text-processing
Stanza Old
Stanford NLP group's shared Python tools.
Stars: ✭ 142 (-0.7%)
Mutual labels:  text-processing
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+1376.92%)
Mutual labels:  text-classification
Hierarchical Multi Label Text Classification
The code of CIKM'19 paper《Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach》
Stars: ✭ 133 (-6.99%)
Mutual labels:  text-classification
Padatious
A neural network intent parser
Stars: ✭ 124 (-13.29%)
Mutual labels:  text-processing
Python Stop Words
Get list of common stop words in various languages in Python
Stars: ✭ 122 (-14.69%)
Mutual labels:  text-classification
Monkeylearn Python
Official Python client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Python apps.
Stars: ✭ 143 (+0%)
Mutual labels:  text-classification
Parselawdocuments
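Performs a series of analyses on collected legal documents, including automatic segmentation according to standards, case similarity computation, case clustering, and recommendation of legal provisions (experiments are currently based on marriage-related cases and can be extended to other domains).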
Stars: ✭ 138 (-3.5%)
Mutual labels:  text-classification

BrowseCloud - Public Demo

Try out BrowseCloud with a demonstration model trained on the English dictionary here.

BrowseCloud - Microsoft Internal

If you're a Microsoft full-time employee, try out our full site.

It supports creating custom visualizations with your own data set and correlating metadata with topics. This site also has a Gallery of models and visualizations with data such as the Microsoft employee engagement survey, called MSPoll, and feedback on the Windows Engineering System.


It's a laborious task to collect and synthesize the perspectives of customers. There's an immense amount of customer data from a variety of digital channels: survey data, StackOverflow, Reddit, email, etc. Even for internal tools teams at Microsoft, there are at least 10,000 user feedback documents generated per quarter.

To help solve this problem, BrowseCloud summarizes feedback data via smart word clouds, called counting grids. On an ordinary word cloud, the size of the text simply scales with the frequency of the word, and text is scattered randomly. In BrowseCloud, the position of the word matters: as the user scans along the visualization, themes transition smoothly into one another.
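For comparison, the ordinary word cloud sizing described above can be sketched in a few lines of Python. This is only an illustration of frequency-based sizing, not BrowseCloud's counting grid code; the function name word_sizes, the font size range, and the sample feedback strings are all made up for this example.

```python
from collections import Counter
import re


def word_sizes(documents, min_size=10, max_size=48):
    """Map each word to a font size that scales linearly with its frequency."""
    words = []
    for doc in documents:
        # Lowercase and split on anything that is not a letter or apostrophe.
        words.extend(re.findall(r"[a-z']+", doc.lower()))
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; rarer words shrink toward min_size.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


# Hypothetical feedback documents, used only to exercise the function.
feedback = [
    "The build is slow",
    "Slow build and slow tests",
    "Tests are flaky",
]
sizes = word_sizes(feedback)
```

Here "slow" appears most often and gets the largest size, while "build" and "tests" tie. Nothing in this sketch decides where a word is placed, which is the part a counting grid adds.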

Introduction to BrowseCloud

BrowseCloud Tutorial

Features

  • Add your custom text data set to the site. *
  • Visualize the text data by inspecting the largest words in clusters around the screen.
  • Drop a pin by clicking on the visualization to view a ranked list of verbatims (shown on the far right-hand side of the screen) related to the micro-topic you pinned!
  • Search for a word to narrow down the visualization and ranked list further.
  • Correlate topics with positive or negative sentiment on the screen by looking at the color of the words in a region, after applying the sentiment analysis job. *
  • Correlate your own custom metadata with topics. We support numeric data, nominal data with two categories, and ordinal data. *
  • Download the relevant verbatims into Excel!

* These features are not supported in the demo application. They are in the full version.

Getting Started

Our documentation is available on this repository's wiki.

Build and Test

We have Azure Pipelines set up on the pull request workflow for pre-check-in validation. The pipeline will also deploy the demo site on merge with master.

Note that it is not required that you use the service to get up and running with the app. You can quickly visualize your data by using the Python command line application to train your data, and copying the resulting model files to the /browsecloud-client/src/assets/demo folder. You can then run the demo client app by following the client setup steps and running npm run start:demo.
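The copy step can be scripted. The sketch below is a minimal helper, assuming the trained model files sit flat in a single output directory; the function name copy_model_files and both arguments are hypothetical, and only the destination path under the repository comes from the text above.

```python
import shutil
from pathlib import Path


def copy_model_files(model_dir, repo_root):
    """Copy every file in model_dir into the demo assets folder; return the copied paths."""
    # Destination path taken from the instructions above.
    demo_dir = Path(repo_root) / "browsecloud-client" / "src" / "assets" / "demo"
    demo_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(model_dir).iterdir()):
        # Subdirectories are skipped; this sketch assumes a flat output folder.
        if source.is_file():
            copied.append(Path(shutil.copy2(source, demo_dir / source.name)))
    return copied
```

Calling copy_model_files with your trainer's output directory and the path to your clone of this repository copies the files into place, after which npm run start:demo can be run from the client folder.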

Client

The client is a simple Angular CLI generated application.

At this point the client should load in your browser for local development. You will need to adjust some of the values in src/environments/environment.ts in order to log in with your AAD app and point the app to the correct service URL. For more information on how to create an AAD app, visit the Azure docs.

If instead you want to build to host on your own webserver, you can run npm run build or npm run build:prod. You can then host these files in a simple Azure App Service or elsewhere.

There are currently no tests, but we would love it if someone would contribute some 😉

Service

The service is an ASP.NET Core application that has many Azure dependencies. We will first get these dependencies set up.

  • Visit the Azure Portal and create a new resource of type "Template Deployment". On the next page, select "Build your own template in the editor", and upload the template file /deployment/az-service-template.json. On the next page, fill in the resource and resource group names. Purchase this resource group.
  • Create an AAD app for the service. For more information on how to create an AAD app, visit the Azure docs.
  • Perform some setup tasks on these resources.
    • Visit the newly created Azure KeyVault, and add yourself to access the secrets in the "Access policies" pane.
    • In the KeyVault's "Secrets" pane, you will find some secret names have been generated. Populate these secrets with a secret for your AAD app, your Document DB secret, and your Redis connection string (all generated by the template file). After setting up the Azure Batch infrastructure for training the models, you can populate the rest of the secrets.
    • On the newly created Cosmos Document DB account, create two new containers named "BatchJob" and "Document".
  • Download and install Visual Studio 2019 with the "ASP.NET and web development" workload.
  • In /BrowseCloud.Service/BrowseCloud.Service/appsettings.json, configure your development environment using the information from the services you just created.
  • You can then build and run using Visual Studio's built-in build and run feature.

This can be built and deployed to the Azure App Service generated in the steps above for everyday use. The easiest method is to right click on the BrowseCloud.Service project and "Publish", but we recommend a CI/CD pipeline of some type. We have our Azure DevOps build pipelines checked in as YAML files, which you are welcome to use.

There are currently no tests on the Service, but we welcome contributions on this front.

Trainer Jobs

This is the machine learning backend that powers BrowseCloud. It has many Azure dependencies.

  • Visit the Azure Portal and create a new resource of type "Template Deployment". On the next page, select "Build your own template in the editor", and upload the template file /deployment/az-ml-backend-template.json. On the next page, fill in the resource and resource group names. Purchase this resource group.

Next, we will set up our VM. The work to set up dependencies on a machine in the cloud like this is automatable, but it hasn't been done.

  • Visit the Azure Portal and choose to create a new resource of type "Windows Server 2016 Datacenter". In this initial setup, make sure you have RDP enabled to set up the VM.

  • RDP into the non-production VM and follow the setup instructions to get the CountingGridsPy library running on the VM. In your production instance of the VM, we recommend that you have RDP turned off.

  • Save your VM as an image within the new virtual machine resource on the Azure Portal. This will destabilize the VM, so you should delete the VM.

  • Next, we'll take a look at the Batch resource you generated from the template. The purpose of Batch is to manage and scale computational power to match the machine learning work to be done.

Create two jobs and two pools within this Batch resource, one for your dev environment and another for your production environment. You can do this by using the Azure portal or by using \Batch\Batch\src\deployBrowseCloudBatchPool.py. In our design, jobs are permanent, and each training request is a task underneath each job.

We recommend that you scale the number of VMs elastically with the number of tasks running on your queue, so work can be done in parallel. You can even have multiple tasks running on the same machine using Batch. Lastly, we recommend that you always have one Windows VM running and ready to go, which can be arranged in the autoScale formula.

An example scaling configuration could be:

"scaleSettings": {
    "autoScale": {
        "formula": "maxNumberofVMs = 5;sample =$PendingTasks.GetSample(10);pendingTaskSamplePercent = avg(sample);startingNumberOfVMs = 1; pendingTaskSamples = pendingTaskSamplePercent < 2 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));$TargetDedicatedNodes=min(maxNumberofVMs, pendingTaskSamples);",
        "evaluationInterval": "PT5M"
    }
}
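To make the formula easier to read, here is a small Python sketch that mirrors its arithmetic on plain lists of numbers. It is not Azure Batch code: the function name target_dedicated_nodes and its two arguments are hypothetical stand-ins for the two $PendingTasks.GetSample calls, and it does not model how Batch collects samples or turns the result into a node count.

```python
from statistics import mean

MAX_VMS = 5        # maxNumberofVMs in the formula
STARTING_VMS = 1   # startingNumberOfVMs in the formula


def target_dedicated_nodes(recent_pending, window_pending):
    """Mirror the formula's arithmetic on two lists of pending-task samples.

    recent_pending stands in for $PendingTasks.GetSample(10) and
    window_pending for $PendingTasks.GetSample(180 * TimeInterval_Second).
    """
    recent_avg = mean(recent_pending)
    if recent_avg < 2:
        # Few pending tasks: fall back to the starting number of VMs.
        pending = STARTING_VMS
    else:
        # Otherwise follow the average number of pending tasks over the window.
        pending = mean(window_pending)
    # Never ask for more than the configured maximum.
    return min(MAX_VMS, pending)
```

With an almost empty queue the sketch returns 1, with a steady three pending tasks it returns 3, and with twelve pending tasks it is capped at 5.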

We also recommend that you use a more powerful VM in your production instance than in your development instance. We use "vmSize" of "STANDARD_D16_V3" on our production site for training new models. We use a "vmSize" of "STANDARD_A1" in our development instance.

  • In /Batch/Batch/src/metadata.json and /Batch/Batch/src/keys.json (which are not checked into this repo), configure your development environment using the information from the services you just created.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

Feedback

Your pull request will now go through extensive checks by the subject matter experts on our team. Please be patient; we have hundreds of pull requests across all of our repositories. Update your pull request according to feedback until it is approved by one of the team members.

Code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Privacy Notice

There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.

Reporting Security Issues

Security issues and bugs should be reported privately, via email, to the Microsoft Security Response Center (MSRC) at secure@microsoft.com. You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Further information, including the MSRC PGP key, can be found in the Security TechCenter.