Welcome to episode 60 of Lost in Immersion, your weekly 45-minute stream about innovation. As XR veterans, we will discuss the latest news of the immersive industry. Hello guys! So, as usual, Fabien, please brighten our day with your new topic.

Okay, thanks. So, today I tested for you an experience by Marvel Studios and Disney on the Apple Vision Pro, called What If...?. What If...? is an original Marvel series on Disney Plus. The premise is: what if there were just one slight change in one of the characters, what could the consequences be on the multiverse and so on? And it's pretty exciting to see this type of experience coming to the Apple Vision Pro, because we often talk in this podcast about what this device can bring to storytelling and the future of movies and things like that. So here we are, finally: a AAA, more than AAA, title that comes to the Vision Pro. Also, fair warning, there will be spoilers.

It's a 45-minute experience, so it's actually pretty long; I wasn't expecting such a long experience. And it's mostly a seated or standing experience, not one where you move around; you stay in the same place. Actually, as soon as you start to move, the objects start to fade out, so it's really made for staying still in one spot. It alternates between story and action, and all the actions you can do are gesture-based.

So, I've selected a few moments that I want to highlight. First, of course, the Marvel logo reveal that we see in the movies; you see a very similar one at the beginning. I'm a Marvel fan, so this is actually pretty awesome, like being inside the Marvel logo reveal. It's really, really cool. And I said it's mostly a standing, sorry, a still experience, but at the beginning (let me skip forward, yeah, here) I was able to stand up and move around to get closer. It's actually pretty cool, but you can see that as soon as I get too close to an object, it starts to fade out. That was the only part of the experience where I was able to move and get closer. And as you can see, the quality of the graphics is, of course, really, really good. They are using this toon-like style that is similar to the What If...? series. Again, you can get close to the characters, so it's pretty cool as well.

And let's see, the first gesture that you learn is here: you open your hand and you have, I don't know, like a circle that rises up. In mixed reality, it's really cool to have that in your hand. And you have a shield as well. So, sorry, let me back up a bit. You have an introduction, and then you go into a few chapters, and each chapter is about an Infinity Stone. And you have a training phase first. So here's the training: you have a shield, and you put your hands in front of you. It's pretty easy. Of course, the target for this is the mass market, so everything you do is very easy. Actually, I only did it once, but I don't think you can lose the interactive parts. You are on rails. You can maybe go faster or slower, but I think the whole experience plays out pretty much the same even if you do nothing. I didn't try; maybe I will try doing nothing and see what happens, how they handle that in the scenario. But it's very, very easy, of course.
You can grab things as well, like this. So everything gesture-based is pretty nicely done. I had only one issue, and I will move to another video, because I took a break. Let's see. So here, by throwing your hands in front of you, you throw a projectile.

We are not seeing your video, Fabien. Oh, sorry. Can I change? Okay, seems like it. Sorry about that.

So, I had an issue with the aim: I was never hitting the target. I'm not sure why that is. You can see me missing; I was always aiming above, even though I was actually looking at the target. So I had to get used to the aim and actually aim lower than where I was looking. Anyway, you have a few chapters, you go to a lot of very famous places, with very well-known characters. Here, this is the longest fight in the game, and you can see I'm getting better at aiming; maybe you just need to get a bit used to the direction of the targeting. You have the ability to... sorry. At some point in the scenario you do magic as well; you just have to follow the instructions, so again, pretty easy. And at the end, you have a bit of mixed reality with Miss Minutes, who looks really cool in mixed reality. We actually did a Miss Minutes demo a couple of months back with the Quest 3. And, of course, because I think it's one of the next movies, there's a joke with Deadpool coming into the experience.

So, yeah, that's it. Overall, I have to say I'm pretty impressed by the quality. The experience is really fun to do. Yes, it's very easy, but again, it's a mass market experience, so of course it's very easy. I have a few more things to say, but I will hand it over to you guys and see what you think. Let's start with you, Seb.

Yes. So, it looks amazing in terms of rendering. I like the cartoonish look and feel of the TV series, like you said. My only question is: do you want more of this kind of story in mixed reality with interactivity? And if we removed the interactive part, would you still watch it and enjoy it the same way, being inside the story and looking at it from your own angle of view?

That's a very good question. I think the interactive parts are really a necessary part of the experience. You feel like you are part of the story, even if, as I said, I think it's on rails; I don't think there are branches in the scenario. But it's much more than just a 360 movie. It's another level of interactivity: being part of the story, throwing things, having the characters talk to you. It's really good. And by the way, it's on the Apple Vision Pro, but I don't see any reason it wouldn't work on the Quest 3, for example. Of course, I think they want it to be a Vision Pro exclusive, but the hand tracking would work as well. The quality would be different, but I don't think there is anything special about it that makes it exclusive to the Vision Pro.

Yeah, I understand that. Actually, I'm curious whether other people who have tried the Vision Pro have the same feeling as I do: when the fans are running at high speed, I can feel some air inside, and it's a bit distracting. It's not a huge deal, but yeah, a bit distracting. And then, I mean, everybody says it, and I think I've said it as well: wearing the Vision Pro for 30 minutes is heavy. It's very heavy. So I think cutting it down by half, to around 20 minutes, could be nice.
Like, you do half and half with a break in between. Yeah.

So, three chapters of 15 minutes would have been better than one long story of 45 minutes?

Yeah, actually, you have, oh, that's a good point, you do have chapters. And one thing I found a bit disruptive (I don't know if I can find it quickly) is that sometimes the experience shows the warning again: be careful of your surroundings. And that breaks the flow of the experience a bit. I don't know if I will be able to show it here, but anyway, sometimes between chapters you see the pop-up again, and that gets in the way of having a seamless experience.

And another thing we can see in this part is that they don't take your environment into account. The character here is behind your wall. However, Miss Minutes is on your table. Is there any... Oh, it's a virtual table. Oh, okay. So everything in front of you is 3D right now? Yeah. Okay. So they are not using the planes. Something else: almost everything happens in front of you, so I think it's maybe designed to be a couch-sitting experience. In the chapter with the Collector, you sometimes have enemies behind you; that's the only part where you have to look behind. Otherwise, it's a frontal experience, yeah. Okay.

Okay. Guillaume?

Yes. So, a few things. First of all, I'm quite disappointed, because if I understand correctly, the mixed reality slash augmented reality is only for the intro and outro, and the rest of the game is fully immersive. Is that correct?

Yeah. Everything that is interactive, I mean the gaming type of thing, is VR. You have a few things in the intro, you know, with the gestures, and in between chapters it goes back into mixed reality. But there is no mixed reality gameplay.

Okay. So, first of all, I find it quite funny that, for a company that is not willing to say the words VR or gaming, they are already doing exactly what people want, meaning it's a VR experience without any controllers, and it seems to work as intended. That's the first thing. The other one, and I guess this is basically the same remark as Seb's, is that they advertise this as being able to understand your surroundings and so on, and in the end they just don't do it: it doesn't really understand the walls and the objects around you. We know they can do that now, because we know they have the algorithms to understand that you have a chair or a desk around you, and they didn't use them. They stayed on the safe road of just putting VR all around you.

My other remark is that, of course, it's an awesome project, the rendering is great, and I guess it will open doors for other developers or studios to create this kind of experience. However, and maybe you will confirm this, when you played Asgard's Wrath, you mentioned that they found some new mechanics, and you were kind of surprised by some of the technological introductions or interaction metaphors inside it. I don't think that's the case here. And given that you can't move around and interact with objects the way we are used to in VR, they didn't use the maximum capability of the device to create something new. They went down the safe road, meaning they created a standing, static experience.
I don't know if you remember, but when we get this kind of technical showcase application, I always think of The Lab by Valve, because at the time they fully embraced the capabilities of VR, especially on the interaction side, with the bow and arrow and the shoot-'em-up. That one was a game only a few people knew about because it was kind of hidden, but you had the spacecraft in your hand and you had to crouch, stand up, and move your arm around, and it was awesome, because at the time most VR developers didn't think about this, and it really opened up a new world of interaction in the immersive space. Here, I find it a missed shot. They could have done something a bit better, or something we hadn't thought of, and they didn't do it. They stayed safe; maybe in the next one. It's just a bit disappointing. I thought they would have something up their sleeve to showcase to the world and we would get a great surprise here.

Yeah, I think, as you said, the target is the mass market, so they don't want to go all the way into something very complex; they do something very easy that everybody can do. The gestures, as you've seen, are super easy. I don't know if you can see it here, but there is also the super famous portal-opening move. You don't have to do the exact sign; you can just have your hand open and do the gesture, and it works. So, yeah, I agree: a very safe, mass market experience.

It looks like what theme parks do most of the time with 4D movies: all the scenarios are on rails, like you said, Fabien, and there is only a small interactive part that doesn't impact the story. But here, with the What If license, I would have loved to have different branches depending on your choices. That would have been the real what-if: you choose something that changes the scenario, and they would have different scenarios. But yeah, I understand their choice and why they went down the safe road here.

I completely agree with you, Seb. This is exactly the kind of experience we could have seen in Disneyland, for example, or Disney World. With this kind of scripted interaction, you don't feel that it is guided, but it is, 100%. And yeah, this is exactly what they could have offered in their theme parks as well.

Yeah, so actually, there is one choice here: you can choose left or right. I didn't do it twice, so I will need to do it again. Sorry, you do the finger snap, which is pretty cool; it just works really well. I don't know if they are doing snap detection or just pinch detection. I need to make the other choice again to see if the ending is different, but it's very close to the end, so maybe they just have two different endings.

Interesting. So they did branch it a little bit. Right. Right. So, are you showing us some more? Yeah. Okay, let's do it.

So today I wanted to share this funny experiment that someone did with Unity and AI models to simulate a kind of reverse Turing test scenario, where the AIs need to find the human inside the experience. They implemented this with a phone in your hand that you can use to scan the different people in a train. And there is a whole scenario where the conductor comes in, asks the different characters to describe themselves, and explains that there is one human in the car who needs to pay for his ticket, because he weighs more in the train than the three computers running the AI models.
And so we see the different models answering in character: who they are, where they come from, what they are feeling, and so on. And at the end, the human presents himself too; he plays Genghis Khan in the scenario, so he has to talk about Genghis Khan and how he behaved as a human being and so on. And at the end, they all say the human is Genghis Khan. So they took into account what the human said and understood that it was not an AI model answering. I think only one model actually got it wrong: this one said Cleopatra, because it felt that what she said was not accurate to the historical background.

So yeah, the whole experience, for me, is pretty funny. First, I think it's interesting to see the different models this way: how they behave, how fast they answer, and the kind of emotions they put into their answers. And second, the interesting part is the analysis done by the different AI models and how they are able to find the human passenger in the train. So yeah, just a fun experiment developed in Unity with the different AI models shown earlier in the video. I don't know if you have any interesting feedback on that. Guillaume, maybe?

Yeah, I read something about this, and I found it very funny that the easiest way to find out who is human is to ask a grammar or vocabulary question, because among all of them, only the human doesn't speak or express himself correctly compared to all the AIs. So you don't have to run extensive tests or questions; you just have to ask a very basic English or French question and you will know who the human is. I found that very funny. But yeah, very interesting to see. I'm very curious to know how they created the interactions of all the avatars with respect to what is going on around them. You can see the avatars looking at the person who is speaking, and I don't know if this could be... That's the real-time part that surprises me, especially in Unity. But I don't know if you have the answer to this.

Yeah, from my understanding, looking at the scenario, there is always an order to the answers. There is always one character taking the answer, and he looks at the person asking him the question. Then, when he passes the floor to the next character, he looks at him, and all the characters look at him. So I feel like that's the behavior: it's scripted, and then they have idle animations on top of that just to make them move a bit more.

Okay, that explains a lot. Okay, great. One question I have is: are the AIs purposely instructed to try to be as human as possible? Is it like a deception kind of scenario?

I'm not sure I'm getting... Could you repeat your question, please?

What I'm trying to understand is: are the AI models instructed to be as deceptive as possible, like to hide what they are? Or is it more like, just be as you are?

We don't see in the video how they set up the models and how they ask them to behave, so that's unknown. But from the beginning, the first character entering the train says that their goal is to pretend to be human. So I don't know if it's at that point that they decide they need to behave this way, although I think they already know who they are right from the beginning, so they know they have to behave as Cleopatra, for example. So yeah, I don't know where the magic comes from.
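A rough sketch of the scripted turn-taking described above: one active speaker at a time, everyone else's gaze locked onto that speaker, idle animations playing underneath. The original demo was built in Unity, so this Python version is purely illustrative; every name in it (Character, run_round, the canned answers) is hypothetical and not taken from the actual project.

    # Hypothetical sketch of the scripted turn-taking described above.
    # The real demo was built in Unity/C#; names and structure here are illustrative only.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Character:
        name: str                      # e.g. "Cleopatra", "Mozart"
        persona_prompt: str            # system prompt handed to the language model
        answer: Callable[[str], str]   # stand-in for the LLM call (or human input)
        look_target: str = ""          # who this character's head/eyes aim at

    def run_round(conductor_question: str, passengers: List[Character]) -> None:
        """One scripted round: each passenger answers in turn while the others watch."""
        for speaker in passengers:
            for other in passengers:
                other.look_target = speaker.name   # everyone turns toward the current speaker
            reply = speaker.answer(f"{speaker.persona_prompt}\n{conductor_question}")
            print(f"{speaker.name}: {reply}")
            # Idle animations would keep playing underneath via the engine's animator.

    if __name__ == "__main__":
        cast = [
            Character("Cleopatra", "You are Cleopatra; pretend to be human.",
                      lambda _: "I ruled Egypt, and I assure you I am flesh and blood."),
            Character("Mozart", "You are Mozart; pretend to be human.",
                      lambda _: "I compose symphonies; surely a machine could not."),
            Character("Genghis Khan", "Played by the human player.",
                      lambda _: input("Your answer: ")),
        ]
        run_round("Describe yourself. One of you is human and must pay for a ticket.", cast)

The point is simply that the speaking order and gaze targets are fully scripted; the only open part is the text each model returns.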
Yeah, because one of the big worries, and I'm not sure if it's accurate or not, but one of the big worries is: can AI models be deceptive? Like, deliberately deceptive towards us? So it would be interesting to see how they behave in that kind of scenario.

It could also be interesting to see, if there were two people inside the experience, whether the models would still be able to tell who is the AI and who is the real person. I guess right now they would, because the AI answers would be too perfect. We would need to introduce errors into the models, so they know they should make mistakes in their descriptions to really behave like a human. But maybe that will be the next test for this guy.

Okay, so that was the first one. The second one is this new paper that came out and showcases a new way to do animated Gaussian splatting: the ability to put a kind of volumetric 3D recording inside a Gaussian splat model. They compare their solution to previous ones, and it seems they did a lot of optimization to end up with a really small model, so it starts to be usable on different kinds of devices. Here they showcase it on a VR headset, the Pico 4, which is not the latest or highest-end VR headset, and the performance seems to be really good. So I don't know what you think about that, but the Gaussian splatting field keeps moving forward, and quite fast. We were wondering if animation would come soon, and it seems like it's starting to be ready to use.

Yeah, I'll take the ball here. Do you know if they have a GitHub for this project or not? Because something I'm discovering with the advancement of Gaussian splatting technology is that more and more projects are not as accessible as they used to be at the beginning. If you want to try the latest advances, you can't, because there are no open-source or GitHub projects for us to try. And my second point is that the further we go with Gaussian splatting, the less accessible it becomes, because I guess they are also using a rig for the capture or recording of this. And if you watch what they are doing with environment scanning as well, they are using very high-end 360 or very high-end cameras to do it. Of course, the result is better because the technology used to create it is better as well. But for us, for people who don't have access to this very high-end hardware, Gaussian splatting basically stays the same. I have mixed feelings about this, because of course it is moving forward, but partly because we are using better hardware as well. So is this truly real progress in the technology, or is it just that the inputs are better than what we had in the past with Gaussian splatting?

Yeah, for me it's the size that matters here: it's really the fact that it's not a huge file that takes a long time to load and can't be used as a streaming element. And like you said, it requires more and more hardware, but at least we are starting to really get the shape of a person into a VR experience. Right now, from what we've seen, you need to go for a more cartoonish look to get something nice; otherwise, if you try to do something realistic with current state-of-the-art 3D models, you have to downgrade the textures and the fidelity of the model a lot. Here, you record a really perfect animation, a perfect rendering. So for me it opens up more storytelling, good-quality storytelling, good-quality VR film, which right now people don't look at, because most of the time they look at the quality and say: oh, I have better quality on my PlayStation 5, so I prefer playing games on my PlayStation 5.
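For context on what animating a Gaussian splat even means: a splat is just arrays of per-Gaussian parameters (positions, rotations, scales, opacities), so one naive way to play back a recorded performance is to store those arrays per keyframe and interpolate between them. The sketch below is a toy illustration of that general idea only, assuming a simple two-keyframe blend; it is not the method of the paper being discussed.

    # Toy illustration only: interpolate per-Gaussian parameters between two keyframes.
    # Real animated-splatting papers use far more compact and clever representations.
    import numpy as np

    def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
        """Linear blend between two keyframes (0 <= t <= 1)."""
        return (1.0 - t) * a + t * b

    def interpolate_frame(frame_a: dict, frame_b: dict, t: float) -> dict:
        """Blend positions, scales and opacities; renormalize the blended rotation quaternions."""
        rot = lerp(frame_a["rotations"], frame_b["rotations"], t)
        rot /= np.linalg.norm(rot, axis=1, keepdims=True)
        return {
            "positions": lerp(frame_a["positions"], frame_b["positions"], t),
            "scales":    lerp(frame_a["scales"],    frame_b["scales"],    t),
            "opacities": lerp(frame_a["opacities"], frame_b["opacities"], t),
            "rotations": rot,
        }

    if __name__ == "__main__":
        n = 4  # a toy "splat" with four Gaussians
        rng = np.random.default_rng(0)
        make_frame = lambda: {
            "positions": rng.normal(size=(n, 3)),
            "scales":    rng.random((n, 3)),
            "opacities": rng.random((n, 1)),
            "rotations": rng.normal(size=(n, 4)),
        }
        mid = interpolate_frame(make_frame(), make_frame(), 0.5)
        print(mid["positions"].shape)  # (4, 3)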
Any thoughts, Fabien?

Yeah, not much to add to what you said. I agree that it's really cool to see how performant it seems to be on the Pico headset. And I think they have multiple models running at the same time as well. So, yeah, really looking forward to that.

Okay. And the last one I have is a new sign language model that lets you type in text and have a character perform the corresponding signs. It seems really accurate, but I can't judge the translation or the gestures since I don't know sign language. But it's interesting to see things going this way; it would allow translating more content, TV shows and things like that. Most of the time there are no subtitles, or subtitles are slow to read for people who use sign language. So it could be interesting to see whether this becomes a more readily available option for people who need this kind of functionality on top of movies and content that usually don't get translated.

Yeah, I guess the main application for this would be, once again, when we all have AR or mixed reality glasses: to have some kind of layer on top, or an avatar side by side with the people you are talking to, showing a real-time translation of what they are saying. That would be a game changer, I guess, for people with a disability. And I know there is already some work in that field with real-time subtitles: you wear smart glasses and you get the real-time translation while you are talking to people. It's a great tool for inclusion as well. But we will have to see the real use, because despite the technological feat, as usual, it looks great when you see it; but if you take a step back and look at the big picture, you start questioning what the real use of this could be. Meaning, if you have a text translation, it's maybe more powerful than just doing sign language. I don't know.

I guess it's the reverse; they are showing it reversed. It should be using your camera, filming someone who is signing, and giving you what the person is saying. I guess the model can be used that way; I understood it the other way around, but that's how they showcase it. Like you said, I was going to mention it: for me, the reverse would be much more interesting. So I will look out for whether they announce that kind of functionality with this model as well. It should be able to look at a picture, or a video, and translate it into text.

Fabien?

Yeah, not much to add to what you said. I really like how they wrote the title: if you look, it says presentation under ideal future conditions. So what I understand is that it's not really working yet, and they think it will work maybe under some ideal future conditions. So I would be a bit cautious about the actual performance. But yeah, very interesting. Very cool to see how AI is also enabling, as you said Guillaume, inclusion, and enabling disabled people to do more. So it's pretty cool.

Alright, that's it for me. So, Guillaume, you can move on to your subject.

Yes. So, on my side, I had a kind of personal project. The idea was to create a realistic talking avatar with emotions, facial motion, and body motion.
So, I looked around at the different tools, and even if it's kind of an old technology (old meaning it's only one or two years old, and they are still doing a lot of updates), it's the Omniverse suite by NVIDIA. And if I go here, they have two main tools: Audio2Face, where you can create the emotions as well, and Audio2Gesture. There are different tutorials made by NVIDIA. The thing is, those tutorials are quite old, and since the software is evolving very fast, most of the options they showcase in their tutorials are no longer there. So it's very hard to find the right workflow with the new versions, which are not covered in the videos. It's a lot of trial and error before getting it to run. If you have questions about this, there is a community that can answer, but they are not very fast, so if you are in a hurry, don't count on it: it took about ten days for me to get an answer, and I had found the solution before that because I was working intensively on this.

So, the Audio2Face part works great once you figure out how to do it, which you have to find out on your own. Maybe I'll do a video about this just to update the workflow, because it's really frustrating to get to the final click and then get an error in Blender. I'll see if I can do something about that. And it's exactly the same for Audio2Gesture: very easy once you know what to do. There are way fewer tutorials about Audio2Gesture, I don't know why, so you basically have to find your own way for this one.

And finally, because as you may have noticed those are two different tools, what if I want facial expressions on an avatar that is also moving? This part is very tricky. According to the documentation, you should be able to do this inside Audio2Gesture, in the tool named Machinima, and merge your facial expressions with your body animation. Well, I spent about three days on this and it didn't work, for some reason. I don't know why, but whatever the reason, you shouldn't do it this way, because once you are in Machinima with the body animation and you export the file back to Blender or Unity, you discover that the armature, the skeleton of the avatar, is not rigged as it should be. There are lots of bone errors and things like that, so you have to remap the whole avatar onto a Blender skeleton, which is a pain if you do it manually; you have to buy plugins to do it automatically. So I won't use that for now.

So my workflow now is to do the Audio2Face part on the NVIDIA side and bring it back to Blender, and for the body animation I'm still using Mixamo, which is a library of body animations. And once you combine the two through the timeline and the Action Editor in the Dope Sheet, you can succeed in creating an animated avatar with both body movement and facial movement. It was two or three weeks of work, and I'm very glad to be able to do it. It brings me a lot of joy, because I'm now able to create this mocap-style avatar I've been wanting to make for all these years. I used to depend on 3D artists for that, and now I'm completely autonomous. And my next step is, well, it works in Unity, so you can guess that it works in AR and VR as well.

So, what's your take on this? Are you willing to try it? I guess you have these kinds of issues as well with animated avatars, especially if you want something that is somewhat realistic. So, that was my experience with it.
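The final merge step described above (Audio2Face shape-key animation plus a Mixamo body action, combined in Blender through the Dope Sheet and timeline) can also be scripted. Below is a minimal bpy sketch of that idea, assuming the facial animation has already been imported as shape-key keyframes; the object and action names ("Armature", "FaceMesh", "mixamo_body", "audio2face_clip") are placeholders, not names from the actual project.

    # Minimal Blender (bpy) sketch: layer a Mixamo body action and an Audio2Face
    # shape-key action on the same character using NLA tracks.
    # Object and action names below are placeholders for whatever your scene uses.
    import bpy

    body = bpy.data.objects["Armature"]        # Mixamo-rigged armature
    face_mesh = bpy.data.objects["FaceMesh"]   # mesh carrying the Audio2Face shape keys

    # Push the body action onto an NLA track so it plays as a background layer.
    body_anim = body.animation_data or body.animation_data_create()
    body_action = bpy.data.actions["mixamo_body"]
    body_track = body_anim.nla_tracks.new()
    body_track.name = "Body"
    body_track.strips.new("BodyStrip", int(body_action.frame_range[0]), body_action)

    # The facial animation lives on the mesh's shape keys, which carry their own
    # animation data; push that action onto its own NLA track the same way.
    shape_keys = face_mesh.data.shape_keys
    face_anim = shape_keys.animation_data or shape_keys.animation_data_create()
    face_action = bpy.data.actions["audio2face_clip"]
    face_track = face_anim.nla_tracks.new()
    face_track.name = "Face"
    face_track.strips.new("FaceStrip", int(face_action.frame_range[0]), face_action)

The manual equivalent is what is described above: assign the two actions in the Action Editor and line them up on the timeline; the NLA just makes the layering explicit and repeatable.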
Fabien?

Yeah. I'm curious to know: what is Audio2Gesture supposed to do? Is it supposed to animate the hands as if the avatar were speaking?

No, Audio2Face only does the face. If you want the whole body, or just the upper body, you have to use Audio2Gesture, which is the dedicated tool for that. But you have a good point about why they separated the two. They could have used a single tool to generate the whole thing; it would be better, and it would avoid all this integration and merging work, which is really painful. But I guess the AI models behind them are not the same.

And I think the question, Fabien, was more about what Audio2Gesture actually does. If it hears, say, a big bang on your table, does the character react in some way?

No, it just analyzes the speech. If you make noises with your voice, the model doesn't respond very well to that. It doesn't really understand the content: it can adapt the gestures to the rhythm of your audio file, but that's it. And if you do a weird voice or an impersonation, if you change your voice a bit to create some dramatic effect, for example, it doesn't work in Audio2Gesture, but it does work in Audio2Face: it can tell that you have a sad voice or an angry voice, and it changes the emotion accordingly. So yeah, that confirms what I was saying before: they are not using the same kind of AI, and this may be why the two are separate.

Okay. And another question: did you check... I guess you only get standard animations if you go through Mixamo, but are you willing to go further and try Move.ai or other tools that directly provide the exact rig from a recording you make with your phone, and apply that to a 3D model?

Yeah, exactly. There are a few options for the motion capture part: there is Move.ai, and there's Plask as well, I guess. But all of those solutions that were free back in the day, which is not that long ago, are not free anymore. So I'll have to find a way, or pay something, to get this. But I really want to try it, especially the export part, because, as you know, and this is the case for the NVIDIA tools too, skeletons and bones are not standardized everywhere. I saw some videos with Plask, and it seems to work directly in Blender. There are some issues with the floor, though: you have to pay if you want the avatar to stay exactly on the floor. So they know exactly what kinds of things you need to pay for if you want a great animation in your application. I'll try this as well.

Yeah, that's the next step: to see whether, by just recording a video, you can do your own motion capture without the expensive hardware. Yeah, that would be interesting. The facial part, I think, is a great improvement, because I don't know anybody who can do facial mocap with all the markers or the dedicated face-capture cameras. So it's a really good improvement.
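On the skeleton-remapping pain mentioned a few times above: one very common, low-tech piece of that puzzle with Mixamo rigs is that every bone arrives prefixed with "mixamorig:", so the names don't match a plainer convention. A small sketch, assuming the imported rig object is called "Armature", that strips the prefix:

    # Sketch: strip the "mixamorig:" prefix Mixamo adds to every bone name, so the
    # skeleton lines up with a plainer naming scheme before retargeting.
    # "Armature" is a placeholder for whatever your imported rig is called.
    import bpy

    PREFIX = "mixamorig:"
    rig = bpy.data.objects["Armature"]

    for bone in rig.data.bones:
        if bone.name.startswith(PREFIX):
            bone.name = bone.name[len(PREFIX):]

    # F-curves reference bones by name in their data paths; Blender generally updates
    # those paths when a bone is renamed, so actions on this armature should keep working.

Full retargeting between skeletons with different proportions or bone hierarchies still needs a proper retargeting add-on, which is the part that ends up costing money.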
And for the audio part, you can do it through an API, with an audio file recorded beforehand, or by recording your voice stream live. The thing is, of course, this is AI, so you can't do it in real time; you have to wait for the model to generate the emotions and so on. But my goal now is to get a voice actor to create my content, feed it into Audio2Face to get a good speaking voice for my project, and see how it all comes together at the end. One thing I need to mention is that Audio2Face can generate the blinking (the avatar blinks), and the tongue and the inside of the mouth, including the gums, are animated as well. So it's a whole package. What it doesn't do, of course, is the direction of the eyes; you have to handle that in Blender or in your application, but the eyes can be separated and moved depending on what you want your avatar to look at.

Nice. So that's it, and I'll showcase the result maybe next week, or in the next episode, in VR. Cool.

Okay, so any last words on all this? We completely blew past our time today with all our topics, but yeah. We are good. Okay, great. So see you guys next week for another episode, and maybe a follow-up on all our experimentation. See you guys. See you guys. Bye.