Lei Feng Net (search for "Lei Feng Net" to follow the public account): This article was translated by data scientist Dong Fei from Erik Duindam's article "How I built an app with 500,000 users in 5 days on a $100 server". Public account: Teacher Dong is in Silicon Valley.
Pokémon GO has recently swept the world, and Apple and Nintendo have made money along the way. Many developers are still on the sidelines, racking their brains over how to build something related and ride the trend. In this article, a veteran developer explains how he built an app that reached 500,000 Pokémon GO fans in 5 days while running on a $100 server. It touches on MVPs (minimum viable products), agile development, popular frameworks, technology selection, and scaling bottlenecks. I found it very hands-on, so I translated it to recommend to everyone.
Startups today tend to build an MVP without caring much about technical scalability. One line of argument insists that the product must ship first: as long as the business works at scale, you should not waste time and effort on technology. What you have to do is test your assumptions, validate the market, and gain traction, and only then think about scaling. But this blind mentality has led to some terrible failures. The following story is a reminder.
Jonathan Zarra recently built GoChat, a chat app for Pokémon GO players. It reached 1 million users in 5 days, and he was soon talking to VCs about funding and growth. But then GoChat went down. Regrettably, it lost many users, and the money already spent could not be recovered.
According to one report, Zarra struggled to cope with 1 million users. He never expected that many. He built the app as an MVP and planned to think about scale later. He hired a contractor to fix a number of performance problems, and the contractor said the servers would cost $4,000. I assume this $4,000 covers hardware costs, including virtual machines and network traffic.
I have previously designed and built web platforms serving hundreds of millions of users, and I think $4,000 is unnecessary for a chat app, even an MVP. It can only mean that the app's back-end technology was very poor. Building an efficient, scalable system for millions of users is not easy, but it is also not that hard to build a system that serves a large number of users on a few cloud servers. The key is to think a little more when setting up the MVP and to make the right choices.
GoSnaps: 500,000 Users in 5 Days on $100 a Month in Server Costs
Like GoChat, GoSnaps is a Pokémon GO fan app, which I launched last week. It lets users share Pokémon GO screenshots on a map. Think of it as Instagram/Snapchat for Pokémon GO. GoSnaps gained 60,000 users on its first day, 160,000 by the second day, and 500,000 unique users after five days. It now holds 150,000 to 200,000 uploaded snaps. There are around a thousand concurrent users at any given moment. I also built image recognition software that automatically detects whether an uploaded picture is related to Pokémon GO, as well as image resizing tools. The whole setup runs on a Google Cloud server costing $100 a month, plus cheap Google Cloud Storage for the pictures, and so far it is holding up well.
GoChat and GoSnaps Technology Comparison
Let's compare GoChat and GoSnaps. Both apps generate a lot of requests per second to fetch chats or pictures within a certain area of the map. This is a geospatial lookup in a database or search engine, either within a polygon of latitude/longitude points or within a radius around a specific point. We use polygons and make a request every time someone moves the map. These queries are very heavy operations for a database, especially when combined with sorting and filtering. We get hundreds of such requests every second, and GoChat probably did too.
The difference is that GoChat has to fetch and post a lot of chat messages every second. The reported figure is about 600 requests per second, covering both map requests and chat messages. These chat messages are small and can be handled over socket connections, but they arrive often and have to be distributed to other chat users. This is manageable with the right setup; with the wrong one, it is a disaster.
GoSnaps, on the other hand, has a lot of pictures being fetched and liked every second. These snaps are kept on the server because old snaps stay relevant, whereas old chat messages do not. The actual image files live in Google Cloud Storage, so the number of requests for image files is not my concern. Google Cloud is very reliable. What I do care about is the snaps requested on the map. GoSnaps has image recognition software that inspects every upload and checks whether the picture is related to Pokémon GO. It then resizes the picture and sends it to cloud storage. In terms of CPU and bandwidth, these are very heavy operations. They are much heavier than distributing small chat messages, but they happen less often.
The conclusion is that the two apps are similar in architectural complexity. GoChat handles many more small messages, while GoSnaps handles large pictures and heavier server operations. Designing the architecture for each app requires a different approach, but the complexity is comparable.
How do I build a scalable MVP in 24 hours?
GoSnaps was developed as an MVP, not as a commercially mature product. It was built within 24 hours. I started from a NodeJS hackathon project and used a MongoDB database without any caching. There is no Redis, no Varnish, and no complicated Nginx setup. The iOS app is written in Objective-C and borrows some Apple Maps related code.
Suppose I treated an MVP purely as a race to get a functional app out as soon as possible, regardless of the quality of the back end. Where would the pictures go? Into the MongoDB database. That requires hardly any configuration or code. How would I query the snaps with the most likes within an area? By running one MongoDB query over the entire list of uploads, a single query against a single collection. But all of this would have wrecked my app and its features.
Look at the query I would have needed to fetch snaps: "find all snaps within the polygon [A, B, C, D], exclude snaps that have been flagged, exclude snaps that are still being processed, then order by number of likes, by whether it is a genuine Pokémon GO snap, and by freshness." This works fine at small scale, but it would be a disaster under heavy load. Even if I reduced the query to only three conditions, it would still be frightening, because this is not how a database should be used. A database should query on only one index at a time, and that is not possible with this kind of geo query. You get away with it while you have few users, but once you succeed you go down, just like GoChat.
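To make the cost of that query concrete, here is a minimal in-memory sketch of its shape. It is not the author's code: the `Snap` fields, the `Bounds` type, and the `naiveQuery` function are hypothetical names, and the polygon is simplified to an axis-aligned rectangle. Every request scans the whole list, applies several filters, and sorts on three keys.

```typescript
// Hypothetical snap record; field names are illustrative, not from GoSnaps.
interface Snap {
  id: string;
  lat: number;
  lng: number;
  likes: number;
  flagged: boolean;     // flagged for removal
  processing: boolean;  // still being processed
  isPokemonGo: boolean; // passed image recognition
  createdAt: number;    // Unix timestamp in milliseconds
}

// Visible map area, simplified here to an axis-aligned rectangle.
interface Bounds {
  south: number;
  west: number;
  north: number;
  east: number;
}

// The "do everything in one query" approach: scan all snaps on every request.
function naiveQuery(all: Snap[], b: Bounds, limit: number): Snap[] {
  return all
    // Geo condition
    .filter(s => s.lat >= b.south && s.lat <= b.north &&
                 s.lng >= b.west && s.lng <= b.east)
    // Two exclusion conditions
    .filter(s => !s.flagged && !s.processing)
    // Three sort keys: likes, genuine Pokémon GO snaps first, newest first
    .sort((x, y) =>
      (y.likes - x.likes) ||
      (Number(y.isPokemonGo) - Number(x.isPokemonGo)) ||
      (y.createdAt - x.createdAt))
    .slice(0, limit);
}
```

With a few hundred snaps this is harmless. Repeated hundreds of times per second over hundreds of thousands of snaps, the same work is done again for every map movement.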
What did I do instead? After the CPU-heavy image recognition and resizing, the processed photos are uploaded to Google Cloud Storage. This way, neither the server nor the database is hit when a picture is requested. A database should worry about data, not pictures. This alone saves servers. On the database side, I split the snaps into several collections: all snaps, most popular snaps, latest snaps, latest valid snaps, and so on. Whenever a snap is added, liked, or flagged, the code checks whether it still belongs to each of these collections and acts accordingly. The code can then query a prepared collection instead of running a complex query over one huge list. It simply divides the data logically into buckets. As a result, a query needs only the geographic condition plus a single sort, which makes selecting data straightforward.
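The bucket idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the author's MongoDB code: the `SnapBuckets` class, its method names, the `Snap` fields, and the `POPULAR_MIN_LIKES` cutoff are all hypothetical. The point it shows is that membership is recomputed on each write, so each read is a geo filter plus one sort over a prepared bucket.

```typescript
// Hypothetical snap record; field names are illustrative, not from GoSnaps.
interface Snap {
  id: string;
  lat: number;
  lng: number;
  likes: number;
  flagged: boolean;     // flagged for removal
  isPokemonGo: boolean; // passed image recognition
  createdAt: number;    // Unix timestamp in milliseconds
}

interface Bounds { south: number; west: number; north: number; east: number; }

type Bucket = { [id: string]: Snap };

// Assumed cutoff for the "most popular" bucket; the article gives no number.
const POPULAR_MIN_LIKES = 10;

class SnapBuckets {
  private all: Bucket = {};
  private popular: Bucket = {};
  private latest: Bucket = {};
  private latestValid: Bucket = {};

  add(snap: Snap): void {
    this.all[snap.id] = snap;
    this.reclassify(snap);
  }

  like(id: string): void {
    const snap = this.all[id];
    if (!snap) return;
    snap.likes += 1;
    this.reclassify(snap);
  }

  flag(id: string): void {
    const snap = this.all[id];
    if (!snap) return;
    snap.flagged = true;
    this.reclassify(snap);
  }

  // Reads touch one prepared bucket: a geo filter plus a single sort.
  popularIn(b: Bounds, limit: number): Snap[] {
    return SnapBuckets.inBounds(this.popular, b)
      .sort((x, y) => y.likes - x.likes)
      .slice(0, limit);
  }

  latestValidIn(b: Bounds, limit: number): Snap[] {
    return SnapBuckets.inBounds(this.latestValid, b)
      .sort((x, y) => y.createdAt - x.createdAt)
      .slice(0, limit);
  }

  // On every write, re-check which buckets the snap (still) belongs to.
  private reclassify(s: Snap): void {
    SnapBuckets.setMember(this.latest, s, !s.flagged);
    SnapBuckets.setMember(this.latestValid, s, !s.flagged && s.isPokemonGo);
    SnapBuckets.setMember(this.popular, s, !s.flagged && s.likes >= POPULAR_MIN_LIKES);
  }

  private static setMember(bucket: Bucket, s: Snap, belongs: boolean): void {
    if (belongs) bucket[s.id] = s;
    else delete bucket[s.id];
  }

  private static inBounds(bucket: Bucket, b: Bounds): Snap[] {
    return Object.keys(bucket)
      .map(id => bucket[id])
      .filter(s => s.lat >= b.south && s.lat <= b.north &&
                   s.lng >= b.west && s.lng <= b.east);
  }
}
```

The trade-off is a little extra work on each write (add, like, flag) in exchange for much cheaper reads, which suits an app where map reads far outnumber uploads.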
How much extra time did all of this take? About 2 or 3 hours. Why did I do it up front? Because this is how I set up projects. I assume my app will succeed, and there is no point in building an app on the assumption that it will fail. I could not bear it if my app gained traction and then went down because of poor technology. I bake the principle of minimum viable scalability into the app. It is the difference between happiness and pain, and I think it should be part of an MVP.
Choose the right tool for your MVP
If I had used a slower programming language or a big framework, I would have needed more servers. If I had used PHP with Symfony, Python with Django, or Ruby on Rails, I might be spending my days fixing the slowest parts of the app or adding servers. These languages and frameworks are great in many scenarios, but they do not suit MVPs with a very low server budget. This is mainly because of the many layers of code involved in mapping database records to objects in the framework, which puts a heavy load on the CPU. Here is an example.
As mentioned before, GoSnaps uses NodeJS as its back-end language and platform, which is fast and efficient. I use Mongoose as the ORM (object mapping layer) to make working with MongoDB easier for me as a programmer.
I am no Mongoose expert, but I knew the library has a large codebase, so it was a red flag for me. By the end of the week, each of our server's 4 NodeJS processes was running at 90% CPU, which is unacceptable for 800-1000 concurrent users. I realized that Mongoose was doing extra work on the data it fetched. I switched to Mongoose's lightweight option (its lean() query option), which returns plain JSON objects instead of full Mongoose objects.
After the change, the NodeJS processes dropped to 5-10% CPU usage. So it is important to know what your code actually does: this one change reduced the load by about 90%. Now imagine a truly large library stack, such as Symfony with Doctrine. Just executing that code would take many CPU cores, before the database even becomes the bottleneck.
Choosing a simple and fast language is important for scalability, unless you can spend a lot on servers. Choosing a language with many useful libraries is even more important, because you want to build the MVP quickly. NodeJS, Scala, and Go are good choices that meet both criteria. They offer good tooling and good performance. Languages like PHP and Java are not slow in themselves, but they are usually used with large frameworks and libraries that make the application heavy. They are good for object-oriented development and well-tested code, but not for fast, cheap scaling. I am not trying to start a debate over programming languages here. Personally I like Erlang, but I have not used it for an MVP.
My past Cloud Games venture experience
A few years ago I founded Cloud Games, an HTML5 game publisher. At the beginning it was a B2C website. We put a lot of effort into acquiring users and reached 1 million monthly active users a few months later. At that time I used PHP, Symfony2, Doctrine, and MongoDB in a simple, lean setup. I had previously worked at Spil Games, which had 200 million monthly active users and used PHP before later moving to Erlang. When Cloud Games grew to 100,000 monthly active users, we started to see server bottlenecks around Doctrine and MongoDB, caused by the heavy load of the PHP libraries. I had set up MongoDB correctly, with proper indexes and queries, but the servers still struggled to keep up, even though I also used PHP's APC cache.
Because the Cloud Games site was still fairly simple at that point, I spent a few days migrating the MVP to NodeJS and Redis. Same setup, different language. This reduced the load by 95%. Admittedly, much of that came from dropping the PHP libraries rather than PHP itself, but a minimal NodeJS setup is much easier to achieve than a minimal PHP one, especially since MongoDB and the front-end code are 100% JavaScript, just like NodeJS. PHP without its frameworks and libraries is simply another language.
We needed this lean setup because we were an early-stage, self-funded startup team. Cloud Games went on to do well and still runs on an efficient NodeJS architecture. We might not have made it with a more expensive technology stack, given the bumpy stretches we went through as a startup. Designing a low-cost, highly scalable architecture was critical to our success.
MVP and scalability can coexist
If your app has a chance of explosive growth due to hype or media exposure, be sure to consider scalability as part of your MVP. A minimum viable product and scalable technology can coexist. There is nothing worse than building a successful app and then watching it fail for technical reasons. Pokémon GO has had plenty of problems of its own, but it is unique and hyped enough to get away with them. Small startups do not have that luxury. Timing is everything, and the different outcomes for GoChat's 1 million users and GoSnaps' 500,000 users illustrate the point.
For reprints, please contact us for authorization, credit the source and the author, and do not delete any content.