In the past year, we have worked hard on the underlying structure and functioning of Gaia, in particular to improve stability. The reason? Gaia manages millions of photos and needed a major upgrade to keep up. In this blog we explain everything about the problems we encountered, how we solved them and what we continue to do to keep your content safe.
First, a little context about how your media is processed within the platform. When users upload media (images or video), the files first pass through our Media Processor. This step ensures that images are stored in several compressed sizes, and the same goes for videos. This is necessary to display media quickly in the overview and to allow fast navigation between photos. The Media Processor handles each file once and then forwards it to storage.
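To make the idea of storing an image in several sizes concrete, here is a minimal sketch of how target dimensions for each size could be planned. The variant names and edge lengths are hypothetical and chosen only for illustration; they are not Gaia's actual settings, and the real Media Processor may work differently.

```python
from typing import Dict, Tuple

# Hypothetical variant names and maximum edge lengths in pixels (not Gaia's real values).
VARIANTS: Dict[str, int] = {"thumbnail": 256, "preview": 1024, "large": 2048}


def scaled_size(width: int, height: int, max_edge: int) -> Tuple[int, int]:
    """Shrink (width, height) so the longest edge is at most max_edge, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # never upscale a small image
    # Integer arithmetic with rounding to the nearest pixel, at least 1 pixel per edge.
    new_width = max(1, (width * max_edge + longest // 2) // longest)
    new_height = max(1, (height * max_edge + longest // 2) // longest)
    return new_width, new_height


def plan_variants(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return the target dimensions for every configured variant."""
    return {name: scaled_size(width, height, edge) for name, edge in VARIANTS.items()}
```

For a 4000 x 3000 photo, plan_variants(4000, 3000) yields 256 x 192, 1024 x 768 and 2048 x 1536, so the overview can load the smallest version while the full view loads a larger one.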
Storage at a data center located in the Netherlands was a must during the development of the platform. In addition, we wanted to keep everything in-house so that we could gradually adjust the storage to our wishes and switch quickly in case of problems. We opted for our own solution: storage servers running Linux with a self-developed storage driver. This driver ensures that incoming files are stored securely and are only accessible through requests from Gaia itself.
The storage servers had two major flaws. First, the bandwidth was not only limited but also shared. This meant that we could not always use the full gigabit connection, and as usage grew this became an increasingly serious bottleneck. Second, we had improved Gaia's infrastructure so that it could process more images at once. As a result, all of this media arrived at the storage servers at almost the same time, which put additional stress on the network. When we also started offering video via Gaia, the effect was greatly amplified and the first problems started to occur.
Because the Media Processor processes files immediately during an upload, the limited bandwidth (or other issues) meant that uploads sometimes failed or were only partially completed. Downloading media relies on the same systems, so we continued to experience glitches, and these became harder to resolve as Gaia grew.
At the beginning of this year, we overhauled the entire infrastructure to make Gaia future-proof and to give users a more stable and faster environment. The platform itself is now scalable, which makes it easier for us to add capacity during busy periods. Using connections of up to 20 gigabits per second, users can now upload and download media at 2 to 4 gigabytes per second.
We have replaced the storage servers with Azure block storage in the Netherlands. Besides good service agreements, this means media is now safely backed up in three different places (including outside Azure, to prevent dependency), and Gaia can display media even faster. The Media Processor has also been updated: it now permanently saves the original file and uses that file to compress images or transcode videos. This ensures that we can always accept uploads and, if something goes wrong during processing, handle them later.
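The "save the original first, process afterwards" approach can be sketched roughly as follows. This is an illustrative example under stated assumptions, not Gaia's implementation: the class UploadStore, its methods and the local directory layout are all hypothetical, and a local folder stands in for the real storage backend.

```python
from pathlib import Path
from typing import Callable, List


class UploadStore:
    """Hypothetical store that keeps originals and tracks which ones still need processing."""

    def __init__(self, root: Path) -> None:
        self.originals = root / "originals"
        self.originals.mkdir(parents=True, exist_ok=True)
        self.pending: List[str] = []

    def accept(self, name: str, data: bytes) -> Path:
        """Persist the original right away, so the upload succeeds even if processing fails later."""
        path = self.originals / name
        path.write_bytes(data)
        self.pending.append(name)
        return path

    def process_pending(self, process: Callable[[Path], None]) -> List[str]:
        """Run `process` (e.g. compress or transcode) on each pending original.

        Files whose processing raises stay pending, so they can be retried later.
        Returns the names that still need processing.
        """
        failed: List[str] = []
        for name in self.pending:
            try:
                process(self.originals / name)
            except Exception:
                failed.append(name)
        self.pending = failed
        return failed
```

Because the original is written before any compression or transcoding starts, a failure in that later step never costs the user their upload; the file simply remains in the pending list until a later run succeeds.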
Finally, the platform has been extensively tested and we continue to monitor its performance closely. That means it is now time for new features within the platform itself, and fortunately we have plenty of room to build on.