A Gym in the Metaverse

A proof-of-concept exploration of Metaverse technologies


The metaverse is a concept that has attracted a lot of attention in recent years, especially after Mark Zuckerberg announced Facebook's name change to Meta in 2021. In this article we provide an overview of the theme and share our experience with a proof-of-concept (PoC) gym app built with Metaverse-related technologies such as Unity and MediaPipe.


What is the Metaverse?

There is a lot of confusion regarding the metaverse, and many competing definitions. According to Gartner, “The Metaverse is a collective virtual open space, created by the convergence of virtually enhanced physical and digital reality. It is physically persistent and provides enhanced immersive experiences.” [1]. With this definition in mind, there are some key concepts we need to explore.

A common mistake people make when approaching this theme is to call any immersive 3D environment “the metaverse”. Although these environments could, in the future, be part of the metaverse, they are not the metaverse themselves. The metaverse will indeed allow people to immerse themselves in such environments using a variety of devices, but it will also allow interoperability between them. A simple example that might make this tangible: imagine if you could bring your “Counter-Strike” skin into “Fortnite” (both online FPS games).

Another core concept of the metaverse is decentralization: the idea that no single company will own it, just as no company owns the internet. However, the huge investments made by big tech companies [2] such as Meta, Microsoft, Google and Nvidia point in the opposite direction of complete decentralization.

Think of the metaverse as a 3D layer of the internet that will provide customers with a set of new and disruptive experiences. It is still in very early stages, with companies experimenting with the concept in proprietary, siloed environments. Imagine finishing work and then meeting up with your family, who live in another country, for a spinning class at the gym: this is something that could happen in the metaverse.

What technologies and skills can help shape the Metaverse?

Many technologies and skills are consistently associated with the metaverse concept. In this section we share some of them, including those we worked with hands-on during our PoC.

  • 3D Modelling: 3D modelling will be a key factor in creating immersive environments, as it already is for industries such as gaming and film. Tools like Unity and Unreal Engine, which facilitate this task, are already used by companies developing metaverse games (Decentraland and The Sandbox, for example). For the PoC in this article we worked with the Unity engine.
  • XR: XR (Extended Reality) will definitely play a big role in the metaverse, as it allows deep immersion for the user. Still, XR will not be a prerequisite: the metaverse is expected to be highly accessible, and XR limitations, such as access to the required hardware or constraints affecting people with disabilities, could limit that.
  • Pose Detection: Pose detection is a common theme in computer vision. It is not usually associated with the metaverse, but it is a technology that can enhance and facilitate interactions in a virtual world. It is a good alternative for tracking users' movements and expressions without the need for expensive XR equipment, since ordinary computer or cellphone cameras can be used. This is one of the technologies we explored in our PoC, and we provide more detail later in this article.
  • Blockchain: Blockchain is the technology that enables cryptocurrencies, NFTs and smart contracts. Because of its distributed and decentralized characteristics, blockchain is commonly named as a foundation for the metaverse, possibly enabling the concept of digital asset ownership. Although it contributes to one of the metaverse's key aspects, blockchain may not be required to deliver it. Big tech companies are already investing in technology enablers for the metaverse, such as Microsoft Mesh and NVIDIA Omniverse, but they currently neither mention nor promote blockchain capabilities.
  • UI/UX Design: Designing for virtual reality follows the same user experience (UX) and user interface (UI) principles as designing for desktop, mobile, or any other commonly used platform. This means that to design for VR, AR, or XR, you must follow the usual design process, which includes gathering business requirements, defining the scope, prototyping, and testing with users. When creating for VR, for instance, spatial awareness is helpful, and it is a good idea to observe the real world, which is the basis for 3D design. Even though methods and best practices for XR are still being discovered, it is important to highlight that when designing for the metaverse the user interface should be as natural and fluid as possible, and the immersive experience should be intuitive, delightful and safe.

How ready are we for the Metaverse?

Even with great advances in recent years, many of these technologies have not yet reached a mature state, and commercial adoption is still primarily by early adopters. XR, for example, raises concerns about the collection and use of users' data [3], in addition to the accessibility challenges already mentioned. Another example is concern over blockchain's performance in large-scale, complex processing, with transactions in some cases (depending on the chain size) taking days to complete [4].

Multiple evolving technologies are not the only issue requiring attention. As the metaverse is meant to be unique and to interconnect different environments and experiences, standardization also needs to be addressed. One interesting initiative that aims to help in this process is the Metaverse Standards Forum. Formed recently (its first meetings took place in July 2022), it has already gathered hundreds of large global companies as members, including Meta and Google, across a variety of industries.

Governance and legal issues must also be investigated to provide proper consumer rights protection, as well as to address the regulatory and tax implications of an economy of virtual digital assets [5][6].

To better understand the metaverse, including some of the technologies, their issues and applicability, we decided to develop a proof-of-concept by creating a simple immersive virtual experience.

A Gym in the Metaverse

Finding ways to lead a healthy life and exercise regularly is a constant challenge for many people. Many of us felt the difficulties the COVID pandemic imposed on our lives, as restrictions closed gyms and interrupted group sports activities, forcing us to look for alternatives. Research [7] also points out that group exercise and exercise communities can boost people's participation by providing regular schedules and motivation.

For the PoC we chose the case of workplace exercise: the “labor gymnastics” programs frequently offered at companies. The basic idea is to provide an app that both trainer and participants can use to join an exercise session. The app allows the participants to move around and view the trainer's movements from different angles. The instructor's movements are captured by the instructor's cellphone camera and projected onto the students' cellphones as an avatar.
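
To make the data flow concrete, here is a minimal Python sketch of how one frame of captured pose landmarks could be serialized on the instructor's phone and decoded on the participants' side. The message format (and the function names) are our own illustration, not part of any library we used.

```python
import json


def encode_pose_frame(frame_id, landmarks):
    """Serialize one frame of pose landmarks ((x, y, z) tuples) for broadcast.

    The JSON layout here is an assumption made for illustration only.
    """
    return json.dumps({"frame": frame_id, "landmarks": landmarks}).encode("utf-8")


def decode_pose_frame(payload):
    """Inverse of encode_pose_frame, run on each participant's device."""
    msg = json.loads(payload.decode("utf-8"))
    return msg["frame"], [tuple(point) for point in msg["landmarks"]]
```

A real implementation would likely use a compact binary format to save bandwidth, but the round trip is the same idea: capture, serialize, broadcast, decode, apply to the avatar.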

Using cameras to track movements is not the most accurate method, but for this proof of concept the goal was to explore different technologies.

To achieve that, a series of challenges needed to be addressed. The following sections describe each of those challenges, the solution we used, and some alternatives.

The 3D environment

We wanted to create a simple gym environment where both instructor and participants would be located during the exercise class. For this job we chose Unity.

Unity is a development framework and engine that provides many ready-to-use features and libraries, facilitating the development of the 3D environment as well as the application's interface and networking.

Unity provides an asset store where you can browse for different tools, addons, templates and models for your application. Unity also provides an easy way to import 3D models. For the Gym, we used a free model available on Sketchfab. As for the trainer and participants' models, we used models created on readyplayer.me. Ready Player Me is a website that allows users to upload photos and instantly create avatars that can be exported and used on different platforms.

Tracking and rendering instructor’s movements

We needed to render the trainer's avatar inside the gym and update its body position according to the movements captured by the trainer's cellphone camera. To help us with that we used MediaPipe, an open-source, cross-platform, customizable machine learning solution. MediaPipe provides ready-made algorithms for many live or streaming video recognition tasks, such as face detection, object detection, instant motion tracking, and pose detection.

To integrate MediaPipe with Unity we used an open-source plugin. Although the plugin facilitates the job, we encountered some issues in correctly tracking the instructor's movements, such as:

  • Making the feet actually touch the floor
  • Calculating the “forward” direction of the face, body and hands
  • Filtering MediaPipe's output (the landmarks are not always stable and can jump abruptly from one frame to the next)
  • Rigging the 3D model so that movements map correctly
  • Depth calculation
  • Overlapping body parts
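
To give a flavor of two of these fixes, here is a minimal Python sketch (outside Unity) of the kind of math involved: exponential smoothing to damp jittery landmarks, and a cross-product estimate of the body's forward direction. The function names and the choice of landmarks are our own illustration, not MediaPipe or plugin APIs.

```python
def smooth(prev, curr, alpha=0.5):
    """Exponentially blend the previous landmark position with the new one.

    Lower alpha means stronger smoothing but more latency. This is one of
    several possible filters (a One Euro filter is another common choice).
    """
    return tuple(alpha * c + (1 - alpha) * p for p, c in zip(prev, curr))


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def body_forward(left_shoulder, right_shoulder, mid_hip):
    """Estimate the torso's forward direction as the normalized cross product
    of the shoulder axis and the shoulders-to-hip axis."""
    across = _sub(right_shoulder, left_shoulder)
    mid_shoulder = tuple((l + r) / 2 for l, r in zip(left_shoulder, right_shoulder))
    down = _sub(mid_hip, mid_shoulder)
    f = _cross(across, down)
    norm = (f[0] ** 2 + f[1] ** 2 + f[2] ** 2) ** 0.5
    return tuple(c / norm for c in f)
```

The same idea can be applied to the face and hands, using different landmark triples to derive each local orientation.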

Handling session between instructor and participants

We started by evaluating WebRTC to transmit the 3D instructor and virtual room to the participants' application. Unity has an open-source library called Unity Render Streaming that enables this, including the possibility for clients to interact with the rendered scene. The downside of this approach is that, since all rendering happens on the host side, the number of “players” connected to the host is limited by the host's processing capacity. As our application aims to connect many users to a session, we decided to take a different approach and use a netcode library: Mirror. Still, Unity Render Streaming can be a good option for hosting small sessions while drastically reducing hardware requirements on the client side.

Mirror is a high-level networking library for Unity, compatible with different low-level transports such as UDP, TCP and WebSocket, as well as vendor-specific transports such as Oculus P2P and FizzySteamworks. Mirror provides ways to quickly set up different network models, supporting dedicated-server, client-hosted and peer-to-peer topologies. Other commonly used netcode libraries are DarkRift 2, Photon PUN and Photon Quantum. We used Mirror to set up a client-hosted model in which the host of the online session is the trainer's cellphone.

Spatial Voice Chat

Spatial voice chat not only allows users to talk to each other within the application or game, but also takes each user's location into account, changing sound direction and volume accordingly. We wanted to incorporate this functionality because it is an important factor in the user's immersive experience inside our environment. Mirror alone does not provide voice or text chat. There are many assets in the Unity Asset Store that can provide this functionality: Photon Voice, Agora, Vivox, Dissonance and others. Dissonance has built-in integration with Mirror, making it a good choice if you work with Mirror in your application. However, since Dissonance is a paid asset on the Unity Asset Store, we decided to use Agora in our PoC.


In general, the technologies we used proved ready to deliver a good, immersive experience. Without much effort we were able to create a simple gym application. Unity and its assets (Mirror and Agora) abstract a lot of the complexity behind these features, making development easier: more configuration, less coding. For example, to play a video on a TV inside the gym we just had to drag the video file onto the TV model.

MediaPipe, even with the difficulties reported above, proved fairly accurate in identifying the instructor's movements, and it tends to get better as BlazePose GHUM Holistic, a new model focused on fitness tracking, is released. In our tests, MediaPipe was able to properly recognize a variety of positions, as demonstrated in the video below. Note that we did only a basic mapping of the MediaPipe output to the avatar model and did not implement finger tracking, so some inconsistencies in the movements were expected.





MediaPipe still has some issues related to the position of the person relative to the camera (it works best with the person facing the camera), image depth, and overlapping body parts. The videos below demonstrate some of these issues:



In general, MediaPipe, and probably other pose detection solutions, proved to be a reasonable and cheaper alternative for tracking users' movements and replicating them in the virtual world.

On the immersion side, the ability to view the instructor's movements from different angles, and even to move closer for a better view, makes them easier to understand and replicate.





Spatial sound, already widely used in games, also helps enhance immersion by simulating the position of the audio source inside the 3D environment. It is easy to set up in Unity: you just enable it and configure the audio source and its range. Unity also lets you set up ambient sounds that can be heard everywhere in the scene. The video below demonstrates these two features.
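
The distance-based part of this behavior can be illustrated in a few lines of Python. The linear rolloff and the maximum-distance parameter mirror what you configure on an audio source in Unity, but the function itself is our own sketch, not Unity code.

```python
import math


def spatial_gain(listener, source, max_distance=10.0):
    """Linear rolloff: full volume at the source's position,
    fading to silence at max_distance (positions are 3D tuples)."""
    d = math.dist(listener, source)
    return max(0.0, 1.0 - d / max_distance)
```

A full spatial audio engine also pans the signal between left and right channels based on the source's direction relative to the listener, which is what makes the sound feel localized rather than just quieter.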


In the end, being together in the virtual environment, walking around and interacting with each other through gestures and voice chat makes the session feel more real.



These simple features represent a step forward over a conventional exercise video class and have the potential to bring a new enhanced experience, which is what the metaverse aims to provide.

Evolving the Application

There are a lot of different features that could be incorporated into the application to provide enhanced usability and take a step closer to the metaverse experience. Some of these features are:

  • VR Headset integration: VR would allow complete immersion in the virtual gym. With participants' movements and camera positioning controlled by a combination of VR equipment and an external camera, both participants and trainer could have a more realistic experience, much closer to exercising together in the same room. VR equipment can also track users' movements more accurately.
  • Smart tokens and cryptocurrency: Smart tokens and cryptocurrency backed by a blockchain could be used to monetize interactions within the application. Both trainer and students could be rewarded with tokens for the classes they participate in. These tokens could be used to buy in-app content, such as accessories to customize an avatar, or even to hire a trainer for private or group activities within the gym metaverse.
  • NFTs: NFTs could be used to register ownership of avatar customization elements, such as shoes and clothes. They could also register ownership of real-world gym equipment. For example, someone could buy a spinning bicycle in the real world and get an NFT that allows them to register it in the gym metaverse. These equipment NFTs could unlock additional features inside the gym metaverse, such as letting the user join a virtual spinning class.
  • Social Media: The app could also provide integrated social media, allowing users to interact with each other, create online group sessions, or even hire trainers for scheduled group sessions.
  • Simultaneous Translation: With speech-to-text, translation services, and text-to-speech getting better every day, it is possible to implement simultaneous voice translation, allowing speakers of different languages to interact effectively.


Many of the technologies associated with the metaverse are already capable of delivering new immersive experiences, as we were able to verify in our proof-of-concept without much effort. Many companies are already using these technologies to provide new experiences, either within their products or by exploring the metaverse concept.

However, some fundamental aspects related to interoperability, decentralization and security of the metaverse are still in early stages, indicating that the true metaverse is still on the horizon.



[1] https://www.gartner.com/en/articles/what-is-a-metaverse

[2] https://www.makeuseof.com/companies-investing-in-metaverse/

[3] https://beyondstandards.ieee.org/ethical-considerations-of-extended-reality-xr/

[4] https://www.datasciencecentral.com/blockchain-wont-save-the-metaverse/

[5] https://www.forbes.com/sites/martinboyd/2022/05/16/regulating-the-metaverse-can-we-govern-the-ungovernable/?sh=68698e3a1961

[6] https://www.financialexpress.com/digital-currency/regulating-the-metaverse-what-should-we-do-to-protect-consumer-interest/2541306/

[7] https://www.verywellfit.com/group-fitness-benefits-5215497


Videos used during the PoC tests were obtained from Pexels.



This piece was written by Guilherme Carrenho, Innovation Architect at Encora, and João Pedro São Gregorio Silva, Software Developer at Encora. Thanks to João Caleffi, Flávia Negrão, Marina Busato and Kathleen McCabe for reviews and insights.

