pravj / ospi

Licence: other
Open Source Presence Infographic of Indian Startups

ospi

Open Source Presence Infographic of Indian Startups


Report

Technically speaking, the organizations used in this report are no longer just startups, but I hope you won't mind this and aren't gonna launch a drone at me.

##Abstract

I think something is clear from the name itself, isn't it? Well, it should be.

This report tries to plot all the involved organizations on the Open-Source portal. It tries to show what the different organizations, in the race to achieve their goals, are doing there for and in the community.

It's pretty biased though, because this report uses only one platform of the Open-Source community, GitHub.

This report doesn't measure the success of the involved organizations; it simply can't. They are all doing well in their fields, and that's why they are here. :tophat:

##Motive

I think it was around mid-December last year when I saw an interview with Flipkart's CTO, Amod Malviya, in a YourStory article. I started reading it and kept reading till the end. At the end my reaction was: wow, this man is awesome! And he is indeed. I have watched many of his talks since reading that interview.

That interview left a different impression on me. I liked the part where he talked about building top-class internet infrastructure in India. I don't know what you think of Flipkart, Myntra, etc., but I think they are evolving continuously, at least on the technical side. That's why they are in the marathon and Amazon itself is in the race with them.

So, after a while I found myself on Flipkart's GitHub organization, scrolling through their projects. Then the idea of this report popped up in my mind, and here I am, struggling with it.

##For What Joy?

Is there a need? The earth will keep rotating without this report, but it's kinda necessary for technical organizations to be a part of the current Open-Source era. I mean, as they say in Group Dynamics, if you're part of a group, then you learn from the other members and they learn from you.

Do you remember something named Facebook? Let's take an example from them.

Maybe you take PHP as a language for kids, but keep in mind that The Social Network was initially developed in that same PHP. But as they started growing and running into glitches with it, and seeing that santa was not coming to help them, they attempted to build something on their own. Today, we know those inventions as HHVM and the Hack language.

So, the thing is: don't wait for santa, and build cool things that matter. Big organizations are already doing it, be it hhvm and react by Facebook, typeahead.js by Twitter, web-starter-kit by Google and many more by others.

##Involved Organizations

I do believe that the organization selection was a bit biased, as I wanted to have my favorite organizations first on the list, like HackerEarth, HasGeek, Housing, Flipkart, Wingify and Zomato.

It was disappointing to see that Housing was not on GitHub at that time and that Zomato's organization had zero public activity.

Finally, I selected 15 startups, giving priority to my favorite ones.

  • Cucumbertown - Follow great cooks, showcase your cooking, build a following
  • Exotel - Reliable Cloud Telephony System for your business
  • Flipkart - Online Shopping India
  • Freshdesk - Online customer support software and helpdesk solution
  • HackerEarth - Programming challenges and Developer jobs
  • HasGeek - HasGeek organises events for geeks
  • Instamojo - Easiest Way to Collect Payments Online
  • Myntra - Online Shopping India
  • MySmartPrice - Compare the best prices from online retailers
  • Practo - Find Best Doctors and Book Appointments Online
  • ShepHertz - Complete Cloud Ecosystem for App/Game Developers
  • Urban Ladder - Furniture Online Shopping Store
  • WebEngage - On-Site Customer Engagement Suite
  • Wingify - Website Optimization tools that simply work
  • Zomato - Discover great places to eat around you

One section of this report uses last year's GitHub activity of the organizations, so I killed my idea of replacing Zomato with someone else: the year was gone, and it was kinda tough to jump over the usual API speed bumps and collect the data.

As I said, Zomato had zero public activity last year, but that doesn't mean they are not good. They are doing pretty well: acquiring it all at hurricane wind speed and serving more cities than you've ever been to in your life. Maybe they are using some other platform, a local Git hosting or something.


You'd better zoom in on the images or open them in a different tab.


##1. Appearance Timeline of Organizations

Do you know when all of these organizations were founded? Not sure?

Apperance-Timeline

This plot shows when the selected organizations appeared, both in the public world and in the open-source world.

Add legend text in the image.

  • I didn't know that Myntra was founded a bit earlier than Flipkart, which recently acquired the older player.
  • Myntra and Flipkart came into existence before GitHub itself.
  • We can see a large gap between the appearances on these two portals for Flipkart, Myntra and Zomato, with Myntra being the slowest to join.
  • Some organizations like Instamojo, HackerEarth and HasGeek felt the need of the time and joined without any significant delay.

Well! In case you're thinking that this information is all chatter, let me present something interesting.

Go back and look at the image carefully, and you'll notice that Cucumbertown and HasGeek are different from the others.

Yes! The GitHub organizations for these two were created before their public launch itself. Sounds interesting, right?

I can't speak for Cucumbertown right now, but I can present a supporting theory for HasGeek.

Do you guys remember the first event that HasGeek organised? It was DocType HTML5, you silly. The event was held in October 2010 and HasGeek was publicly launched in December 2010. You can fly to their GitHub account and check that they have been developing hasgeek/doctypehtml5 since then.

Maybe organising this event was the inspiration behind launching HasGeek. I need to hear HasGeek founder Kiran's words on it, though.

##2. Repository Status

As we all know, the repository is an important component of GitHub's ecosystem.

###2.1. Public Repository Status

This section deals with the number of public repositories for each involved organization.

Public-Repository-Count

The cloud services provider ShepHertz has the maximum number of public repositories, mainly based on their App42 service stack. Flipkart and HasGeek also have a significant number of repositories; the rest of the organizations are building their store gradually.

The number of repositories on GitHub is not the right thing to measure, though.

###2.2. Stars Distribution

As I said, having more repositories doesn't by itself show your popularity. This isn't one of those old wars between states, where the king with more elephants was supposed to be the winner.

But the number of stars on a GitHub repository can represent its vogue, leaving aside the cases where they're fake.

Stars-Distribution

This graph represents the stars distribution on all the repositories of involved organizations.

Top 10 repositories according to no. of stars

You can see Wingify, Flipkart and HasGeek are ruling the leader-board here.
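
As a rough illustration of such a ranking (not the code used for this report), here is a minimal Python sketch that sorts repositories by star count. It assumes each repository is a dict shaped like the objects returned by GitHub's REST API, where `stargazers_count` holds the number of stars. The helper name `top_starred` and the sample repositories are made up.

```python
def top_starred(repos, n=10):
    """Return the n repositories with the most stars, highest first."""
    # Sort by star count (descending); ties are broken by full name
    # so that the output order is deterministic.
    ranked = sorted(
        repos,
        key=lambda repo: (-repo["stargazers_count"], repo["full_name"]),
    )
    return ranked[:n]


# Made-up sample data, NOT real numbers from the report.
sample_repos = [
    {"full_name": "org-a/widget", "stargazers_count": 120},
    {"full_name": "org-b/engine", "stargazers_count": 450},
    {"full_name": "org-c/toolkit", "stargazers_count": 75},
]

for repo in top_starred(sample_repos, n=2):
    print(repo["full_name"], repo["stargazers_count"])
```

Merging the repository lists of all organizations into one list before calling the helper would give a combined leader-board.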

###2.3. Relative Repository Attributes

GitHub provides a feature named fork; using it, you can contribute to the awesome projects of others as if they were your own.

This section deals with the attributes of repositories, counting which of them are forked repositories and which are source repositories.

Repository-Attributes

This plot shows which organizations have only their own source repositories and which ones have forked repositories.

During the development, I also calculated the active and inactive percentages of the forked repositories. You can have a look here at how this was calculated.

We can see that HasGeek is doing fairly well here, having a larger share of source repositories than forked ones. A large portion of Flipkart's and Freshdesk's repositories are inactive-forked.
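
To make the three categories concrete, here is a small Python sketch of one way to split repositories into source, active-forked and inactive-forked. It is not the calculation used for this report. It assumes repository dicts with the `fork`, `created_at` and `pushed_at` fields of GitHub's REST API, and it assumes (my own rule, purely for illustration) that a fork is "active" when something was pushed to it after it was forked. The sample data is made up.

```python
from datetime import datetime

GITHUB_TIME = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the GitHub REST API


def classify_repositories(repos):
    """Count source, active-forked and inactive-forked repositories.

    ASSUMPTION: a fork counts as "active" if it was pushed to after the
    fork itself was created. The report may have used a different rule.
    """
    counts = {"source": 0, "active-forked": 0, "inactive-forked": 0}
    for repo in repos:
        if not repo["fork"]:
            counts["source"] += 1
            continue
        created = datetime.strptime(repo["created_at"], GITHUB_TIME)
        pushed = datetime.strptime(repo["pushed_at"], GITHUB_TIME)
        key = "active-forked" if pushed > created else "inactive-forked"
        counts[key] += 1
    return counts


def as_percentages(counts):
    """Convert raw counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {
        key: (round(100.0 * value / total, 1) if total else 0.0)
        for key, value in counts.items()
    }


# Made-up sample data, NOT taken from any real organization.
sample_repos = [
    {"name": "own-project", "fork": False,
     "created_at": "2013-01-10T08:00:00Z", "pushed_at": "2014-03-01T10:00:00Z"},
    {"name": "patched-fork", "fork": True,
     "created_at": "2013-05-01T08:00:00Z", "pushed_at": "2013-09-15T12:00:00Z"},
    {"name": "untouched-fork", "fork": True,
     "created_at": "2013-06-01T08:00:00Z", "pushed_at": "2013-04-20T09:00:00Z"},
    {"name": "another-untouched-fork", "fork": True,
     "created_at": "2014-02-01T08:00:00Z", "pushed_at": "2014-01-05T09:00:00Z"},
]

counts = classify_repositories(sample_repos)
print(counts)
print(as_percentages(counts))
```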

##3. Development Activity

All the involved organizations have something for the community: projects born as solutions to some problem, projects born in hackathons and so on. They're gradually building things to enhance their infrastructure and market position.

###3.1. Repository Creation

This section deals with the creation of repositories across all the organizations.

Repository-Creation

  • You can see that Urban Ladder, HasGeek and Exotel created their first repository at almost the same time as their GitHub organization.
  • ShepHertz, HasGeek and Flipkart have kinda continuous repository creation events throughout the timeline.

Again, if you think that it's general knowledge, then let me show you the magic.

Go back and look at the image carefully, and you'll notice something weird for HackerEarth, won't you?

Yes! You see there, HackerEarth's first repository was created before their GitHub organization itself. How is this even possible?

Well! Ladies and gentlemen, this is possible. Let me introduce a new theory in support of this.

HackerEarth's oldest repository in the time series is django-storages, and it's this repository that creates the confusion. The fact is that this repository was initially forked by HackerEarth's co-founder Vivek on his own GitHub account. After a separate organization was created for HackerEarth, he moved that repository to the organization.

That's why this repository's creation date is earlier than the creation of their organization. Well! Again, I need Vivek's approval on this.
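
Spotting this kind of oddity only needs a comparison of creation dates. The Python sketch below is an illustration, not the report's code. It assumes repository dicts with the `created_at` field of GitHub's REST API, and the organization and repository names and dates are made up.

```python
from datetime import datetime

GITHUB_TIME = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the GitHub REST API


def parse_time(value):
    """Parse a GitHub API timestamp string into a datetime."""
    return datetime.strptime(value, GITHUB_TIME)


def repos_older_than_org(org_created_at, repos):
    """Return the names of repositories created before their organization."""
    org_created = parse_time(org_created_at)
    return [
        repo["name"]
        for repo in repos
        if parse_time(repo["created_at"]) < org_created
    ]


# Hypothetical organization and repositories, NOT real dates.
org_created_at = "2012-06-01T00:00:00Z"
sample_repos = [
    {"name": "moved-in-later", "created_at": "2011-11-20T14:30:00Z"},
    {"name": "born-here", "created_at": "2012-08-05T09:00:00Z"},
]

print(repos_older_than_org(org_created_at, sample_repos))
```

A repository that was created on a personal account and later moved to the organization would keep its earlier creation date, so it would show up in this list.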

###3.2. Commit Activity

This section deals with the commit activity of all the organizations.

Commit-Activity

This plot shows the weekly commit activity of all the organizations. It's pretty mixed-up, but this was the only plot type in my mind at the time I was developing this.

You can see relatively more development activity at the start of the year.

The Flipkart development team keeps a fork of linux; it's not a forked repo on GitHub, though. I removed its activity because it was making the plot even more cluttered. You can check that plot too, though.
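
For anyone who wants to rebuild a similar plot, here is a hedged Python sketch of the aggregation step. It assumes the per-repository data is already downloaded and shaped like the response of GitHub's commit activity statistics endpoint (`/repos/{owner}/{repo}/stats/commit_activity`), i.e. a list of weeks with a `week` Unix timestamp and a `total` commit count. The helper name, the `exclude` parameter and the numbers are made up for illustration.

```python
from collections import defaultdict


def weekly_totals(activity_by_repo, exclude=()):
    """Sum weekly commit counts over all repositories of an organization.

    activity_by_repo maps a repository name to its list of weekly entries.
    Repositories named in `exclude` are skipped, e.g. a huge mirrored
    project that would otherwise dominate the plot.
    """
    totals = defaultdict(int)
    for repo_name, weeks in activity_by_repo.items():
        if repo_name in exclude:
            continue
        for week in weeks:
            totals[week["week"]] += week["total"]
    # Return a plain dict ordered by week timestamp.
    return dict(sorted(totals.items()))


# Made-up sample data, NOT real commit counts.
sample_activity = {
    "web-app": [{"week": 1388880000, "total": 12}, {"week": 1389484800, "total": 5}],
    "linux": [{"week": 1388880000, "total": 900}, {"week": 1389484800, "total": 850}],
    "api-client": [{"week": 1388880000, "total": 3}, {"week": 1389484800, "total": 0}],
}

print(weekly_totals(sample_activity))
print(weekly_totals(sample_activity, exclude={"linux"}))
```

Comparing the two printed results shows how a single very busy repository can swamp everything else, which is the reason for leaving it out of the cluttered plot.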

##4. Technology Stack

Different organizations work in different fields of technology, be it medical services, developer events, online shopping, food, cloud services or online payments, so they encounter different problems along the way and handle them accordingly.

###4.1. Programming Languages in Production

This section deals with the use of different programming languages in the involved organizations' infrastructure.

Language-Use

This plot uses colors from GitHub's linguist for the different programming languages.

This helps us understand the tech stack of all the organizations.

  • Flipkart uses Java, HasGeek uses Python, Practo uses PHP and Freshdesk uses Ruby as their major programming language.
  • Organizations have started using non-traditional languages like Lua, Erlang and Scala.
  • ShepHertz uses the maximum number of programming languages (14), in their quest to support every in-demand programming language in their service.
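
A minimal Python sketch of how such observations could be derived is given here. It is an illustration under assumptions, not the report's code: it counts repositories by the `language` field of GitHub's REST API (the primary language of a repository, which can be empty), whereas the report may have weighted languages differently. The sample data is made up.

```python
from collections import Counter


def language_summary(repos):
    """Return (major language, number of distinct languages) for an organization.

    ASSUMPTION: the "major" language is the one that is primary in the
    largest number of repositories.
    """
    # Skip repositories without a detected language (None or empty string).
    languages = Counter(repo["language"] for repo in repos if repo["language"])
    major = languages.most_common(1)[0][0] if languages else None
    return major, len(languages)


# Made-up sample data, NOT real repositories.
sample_repos = [
    {"name": "service", "language": "Java"},
    {"name": "client", "language": "Java"},
    {"name": "scripts", "language": "Python"},
    {"name": "docs", "language": None},
]

print(language_summary(sample_repos))
```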

###4.2. Field of work

This section deals with the fields the different organizations are working in.

To calculate the results, I used repository names and their descriptions. What I actually wanted was the relative share of each field of work across all the organizations.

So, initially, my plan was to use Latent Dirichlet Allocation on the repository-description-text corpus for Topic Modeling.

I used the concatenated repository descriptions of each organization as a document, but then I dropped this idea because of the asymmetrical repository distribution. It resulted in a corpus of only 14 documents (Zomato excluded).

You can get a brief introduction to LDA here.

Then I changed the plan, moved to a Naive Bayes Classifier and used word frequencies only.

You can check the classifier's topic results, based on probability or frequency, here.
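
To give a feel for the word-frequency idea, here is a small Python sketch. It is not the classifier used for this report; it only shows the word-likelihood part of a naive Bayes model, P(word | organization) with add-one smoothing, computed from repository names and descriptions. The stop-word list, the function names and the sample texts are all my own assumptions.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def word_likelihoods(docs_by_org):
    """Estimate P(word | organization) with add-one (Laplace) smoothing."""
    counts = {
        org: Counter(tokenize(" ".join(texts)))
        for org, texts in docs_by_org.items()
    }
    vocabulary = set().union(*counts.values())
    likelihoods = {}
    for org, counter in counts.items():
        total = sum(counter.values()) + len(vocabulary)
        likelihoods[org] = {w: (counter[w] + 1) / total for w in vocabulary}
    return likelihoods


def top_words(likelihoods, org, n=3):
    """Return the n most likely words for an organization."""
    # Highest probability first; ties are broken alphabetically.
    ranked = sorted(likelihoods[org].items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Made-up repository descriptions, NOT taken from any real organization.
sample_docs = {
    "org-a": ["Django storage backend", "Django email notifier", "email commit notifier"],
    "org-b": ["Redis client", "HTTP proxy for Redis", "Redis load balancer"],
}

likelihoods = word_likelihoods(sample_docs)
print(top_words(likelihoods, "org-a"))
print(top_words(likelihoods, "org-b", n=1))
```

Ranking by raw frequency and ranking by smoothed probability give the same order within one organization; the probabilities only start to matter when words are compared across organizations of very different sizes.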

So, some of the topic results from the classifier for the organizations are:

  • Cucumbertown : Django, Gearman, Email, Commit, Notifier
  • Exotel : Audio, IVR, Music, SMS
  • Flipkart : REST APIs, MySQL, Lucene, Redis, HTTP proxy, load balancer
  • Freshdesk : Databases, Rails, API, Websockets, Socket.io, YUI, Resque
  • HasGeek : Workshop, Lastuser, App management, TV, Job, GitHub
  • HackerEarth : Django, API documentation and clients, extensions and editors
  • Instamojo : API clients, Wordpress, Frameworks, Huxley
  • Myntra : iOS, Cocoa, Android, ElasticSearch, Docker, Librato
  • MySmartPrice : Technology Blog, Gearman workers, Cookbook
  • Practo : OpenID, Flask, Sentry, Symfony, Raven, Mail clients, Messages
  • ShepHertz : App42, PaaS, SDK, API clients, MongoDB, MySQL, Redis
  • WebEngage : Message, API, Website, Speech
  • Wingify : Angular.js, DOM, RabbitMQ, iOS, Data, Bootstrap, VWO

Here we can see that Flipkart's stack includes things related to distributed computing, networking and databases; on the other hand, Wingify's stack includes things related to frontend, data and networking.


So, this is it. Open Source Presence Infographic of Indian Startups. :octocat:

If you're thinking that santa helped me with all this, then you are wrong, my friend. I was all alone every time: thinking about it, collecting the data, managing R source files in RStudio, writing Python for it and all that.

If you feel that you can do something much more awesome than this, you can do whatever you want; it's hosted on GitHub, pravj/ospi.
