
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST| (New Staff Post, Please read)

I won't dance around the point, I genuinely think the reveal is coming soon.

Jensen's Japan visit, now Doug's there, all the smoke and chatter around it, and the factory leaks they supposedly wanted to clamp down on before the reveal have apparently started. And why would he tweet it like this unless he was teasing something slightly bigger? He doesn't tend to do this for Directs.

In no uncertain terms: I believe the reveal will be announced soon. I believe the reveal will be before February 14th, maybe because they have a Superbowl Ad to show, maybe because they want to flood Twitter with ads during it to make the most of a high traffic day.

Either way, I do actually think it's time.
Imo this post is pretty much all confirmation bias. There are many more possible interpretations to all these data points.
 
Are we really reduced to stalking an executive on his vacation (possibly)?
'Stalking' is a bit strong? It's a Tweet on his official account. I agree it doesn't indicate anything, without more context.

Also lmao at this. The ride never ends.
 
I won't dance around the point, I genuinely think the reveal is coming soon.

Jensen's Japan visit, now Doug's there, all the smoke and chatter around it, and the factory leaks they supposedly wanted to clamp down on before the reveal have apparently started. And why would he tweet it like this unless he was teasing something slightly bigger? He doesn't tend to do this for Directs.

In no uncertain terms: I believe the reveal will be announced soon. I believe the reveal will be before February 14th, maybe because they have a Superbowl Ad to show, maybe because they want to flood Twitter with ads during it to make the most of a high traffic day.

Either way, I do actually think it's time.
Dance more… We like song and dance!
 
Imo this post is pretty much all confirmation bias. There are many more possible interpretations to all these data points.
This thread has been running on mostly confirmation bias since September 30, 2021. Unless you happen to have a better source of fuel.
 
When he visits kyoto he often posts something from there.
Maybe he won't this time, but he's tweeted from Kyoto more than once before.
All we know though is that he's taking off on a plane. We have no idea where to, or why.

Strikes me as very odd to assume it's an important work trip.
 
All we know though is that he's taking off on a plane. We have no idea where to, or why.

Strikes me as very odd to assume it's an important work trip.
Why? If he's going on holidays, that's his private business, and for his safety, I'd hope he'd keep it to himself!

Being a company president, teasing a business trip is normal. Teasing his own holidays is not.

Due to the nature of his job, he travels, and tweets, quite a bit. It would be unusual for Doug Bowser to post something like this for a "normal" business trip.

Plus, look at us, look at the replies! He knows what he's doing. 😆
 
All we know though is that he's taking off on a plane. We have no idea where to, or why.

Strikes me as very odd to assume it's an important work trip.
Maybe you didn't read my full post? this is what i said.
We'll know soon enough if he's going for work of pleasure. He has tweeted about skiing trips before, which is pleasure, but he didn't tweet from the plane. This seems oddly like an important work trip. If he follows up with tweets from Kyoto then it's for work.
 
I literally just saw this last night, had no idea it would become so relevant:

Wg5yptV.jpg
 
Whoops I did read it but not correctly, I always read the word "skiing" wrong.
it's alright, early morning and i made a typo as well which may have impacted readability. meant to say "work or pleasure" but it came out as "work of please"
 
Mr. Bowser may, or may not be, going to Kyoto. Which may, or may not be, related to their next console. Heavy information, for sure.
 
The picture was actually taken with Drake's camera and posted via a built in twitter app.

If the announcement was happening soon, would they rather announce it before or after the investors' meeting on February 7th?
 
I hope for Bowser’s sake he’s not going back to Japan. That’s a brutal flight to do on a monthly schedule.

Am I right to understand that for number 3, you would basically include in the Drake OS "patched" shaders for selected games to avoid transpiling at runtime? I assume size is not a big concern, so you could easily expand this compatibility library over time with more system updates. Or are shaders somewhat bigger in size than I imagine?
They’re not large, though games can have a lot of them. Microsoft has shipped game patches on OS like this, but I expect that Nintendo would prefer to just do it digitally simply because they don’t have a big hard drive to store a bunch of patches on.


I thought shader compilation isn't an issue for consoles since they are a fixed spec?
It will be an issue for backwards compatibility, as shaders compiled for current Switch games might not be compatible with future hardware.
So, yes games ship with compiled shaders. If you want backwards compatibility on hardware that can’t run those shaders, you have to do something. But in the case of Drake you don't actually need to recompile them. In fact, you can't, because you don't have the original shader source.

Here is a deep dive on why compiling is so freakin' slow...

Code:
LOOP 1 TO 10:
    LET X = 2.5
    PRINT X * LOOP

This dumb code runs through the numbers one to ten and prints out each one multiplied by 2.5. Now, the compiler needs to convert that code into instructions the computer can execute. That looks sorta like this (again, pseudo code). I'll add a comment to each line to tell you what's going on.

Code:
let int index = 1              ;index is how far along on the loop we are. INT means Integer, a number with no decimal point. 
                               ;you can see there is no loop here? We'll fix that in a second
let float x = 2.5              ; X is a floating point number. They take up more memory, and are slower
let float index_f = 0          ;You can't multiply INTs and FLOATs. So we make a temporary float...
let middle = index_f.address   ;And then we find where the computer has it stored in memory, then the weird part...
let middle = middle + 4        ;we add 4, because that's where the decimal point is on this particular hardware...
copy index to middle           ;So we can copy the value of the index to the left side of the decimal point
let temp = index_f * x         ;okay, the multiply
copy temp to screen.address    ;And copy the result to the part of ram where we keep the screen
index = index +1               ;Now we have to do the loop ourselves. Add one to index...
let end = (index > 10)         ;check if we're at the end of the loop. "end" will be 1 if we are, 0 if not...
goto -8, end                   ;go back 8 instructions (the beginning of the loop) if end is 1

If you didn't follow that, that's okay. The big thing to notice is that simple code generates a lot of instructions. More instructions means slower. Because we repeat the main section over and over again, this takes 91 instructions* for a simple three lines of code.

*it's actually more but I'm trying to keep this simple, and it's already huge
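If you want a rough feel for "simple code generates a lot of instructions" on a real system, here is a minimal sketch in Python. It is only an analogy: Python compiles to bytecode for its own virtual machine, not to GPU or CPU machine code, and the exact count changes between Python versions. The function name `scaled_numbers` is made up for this example.

```python
import dis

def scaled_numbers():
    """Python version of the three-line pseudo code loop above."""
    out = []
    for i in range(1, 11):
        x = 2.5              # set inside the loop, just like the original
        out.append(x * i)
    return out

# Ask Python which bytecode instructions it generated for the function.
instructions = list(dis.get_instructions(scaled_numbers))
print(len(instructions), "bytecode instructions for a handful of source lines")
```

Even before counting how often the loop body repeats, the instruction listing is several times longer than the source.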

But the compiler can help here. If we make the compiler smart, it can notice some patterns in the code and begin getting that number down. Here is a simple one. Notice how we set X every time we run the loop, but it never changes? The compiler can change that code to this...

Code:
LET X = 2.5   ;Move this up here, out of the loop
LOOP 1 TO 10:
    PRINT X * LOOP

...which changes only the last line in the output...

Code:
goto -7, end      ;go back only 7 instructions now, skip setting x every time

This is called Loop Invariant Code Motion, and in this case it cuts 9 instructions. That's a 10% improvement! But we can do even better. X actually never changes. We can just use its value everywhere and never set it at all - a process called inlining. That saves us another instruction.
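The two rewrites just described can be sketched by hand in Python. This is only an illustration of what a compiler would do automatically (CPython itself does not perform this hoisting), and the function names are invented for the example.

```python
def before_hoisting(n=10):
    results = []
    for i in range(1, n + 1):
        x = 2.5                  # loop-invariant: same value on every iteration
        results.append(x * i)
    return results

def after_hoisting(n=10):
    x = 2.5                      # hoisted: assigned once, before the loop starts
    results = []
    for i in range(1, n + 1):
        results.append(x * i)
    return results

def after_inlining(n=10):
    results = []
    for i in range(1, n + 1):
        results.append(2.5 * i)  # x is gone, its value is used directly
    return results

# All three versions compute exactly the same thing.
print(before_hoisting() == after_hoisting() == after_inlining())
```

The point is that the output never changes; only the amount of work done per iteration does.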

What else can we do? Well, most of this dang thing is just converting the loop index from an integer to a float. But we don't actually care if it's an integer in the first place, that's just a side effect of the loop. Can we make it a float the whole time? Yeah!

Code:
let float index = 1.0          ;index is a float now, so use 1.0 instead
let temp = index * 2.5         ;We just inline the 2.5 everywhere we used to have X
move temp to screen.address    ;And move the result to the part of ram where we keep the screen
index = index + 1.0            ;Add 1.0 now, because index is a float
let end = (index > 10.0)       ;check if we're at the end of the loop. "end" will be 1 if we are, 0 if not...
goto -4, end                   ;go back 4 instructions, to the top of the loop

Okay, that's a lot shorter. 51 instructions! That's 44% fewer instructions than the 91 we started with. But we can do better. Do we even need the loop itself? It runs exactly 10 times every time. We could eliminate a lot of work if we just... copied the inside of the loop over and over, in a process called loop unrolling.

Code:
index=1
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address   
index = index + 1 
let temp = index * 2.5             
move temp to screen.address

It's longer, but without the loop, it's just 30 instructions. But look at it. It's obvious to you now, isn't it? Why add to index every time when we could just put the numbers 1 through 10 in their place? And why make the computer do the multiplication when the code runs, when the compiler can do it right now? Let's repeat our inlining tricks from before.

Code:
move 2.5 to screen.address    
move 5 to screen.address    
move 7.5 to screen.address    
move 10 to screen.address    
move 12.5 to screen.address    
move 15 to screen.address    
move 17.5 to screen.address    
move 20 to screen.address    
move 22.5 to screen.address    
move 25 to screen.address

We've taken 3 lines of code, turned it into 91 instructions, and then whittled those instructions back down to 10. The final version takes 1/9th the time of the unoptimized version.
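As a minimal sketch of that last step, here is a toy "compiler" in Python that unrolls the loop and does the multiplication ahead of time, emitting one pseudo instruction per iteration. The function name `unroll_and_fold` and the text format of the instructions are assumptions for illustration, copied from the pseudo code in this post.

```python
def unroll_and_fold(start, stop, factor):
    """Emit one 'move' per loop iteration with the product already computed."""
    return [
        f"move {factor * i:g} to screen.address"   # multiplication happens here, at "compile time"
        for i in range(start, stop + 1)
    ]

program = unroll_and_fold(1, 10, 2.5)
for instruction in program:
    print(instruction)
```

Nothing is left for the "hardware" to do at run time except the ten moves.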

Each of these optimizations is called a pass. You read the whole program in memory, then you pass over the whole thing looking for one kind of optimization, then you alter the code for that optimization, then you pass off to the next optimization. Sometimes you repeat passes, just like we did with inlining values. Each pass takes time. And in order to keep all this working, the compiler has both the original source code and something called an "intermediate representation" which is sort of between the original source and the final code where the compiler can keep all its notes before generating the final output.
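Here is a rough sketch of the pass idea in Python, assuming a made-up toy IR: a list of tuples with only `set`, `mul` and `print` operations, straight-line code, and each name assigned once. Real compilers are far more involved; this only shows passes being run in sequence and repeated until nothing changes.

```python
def is_number(value):
    return isinstance(value, (int, float))

def propagate_constants(ir):
    """Replace uses of names that hold a known constant with the constant."""
    known, out = {}, []
    for op, *args in ir:
        if op == "set":
            name, value = args
            known[name] = value
            out.append((op, name, value))
        elif op == "mul":
            dest, a, b = args
            out.append((op, dest, known.get(a, a), known.get(b, b)))
            known.pop(dest, None)      # dest is no longer a known constant
        else:                          # "print"
            out.append((op, known.get(args[0], args[0])))
    return out

def fold_constants(ir):
    """Do multiplications of two constants now instead of at run time."""
    return [
        ("set", ins[1], ins[2] * ins[3])
        if ins[0] == "mul" and is_number(ins[2]) and is_number(ins[3])
        else ins
        for ins in ir
    ]

def remove_dead_sets(ir):
    """Drop 'set' instructions whose name is never read."""
    used = set()
    for ins in ir:
        operands = ins[2:] if ins[0] == "mul" else ins[1:] if ins[0] == "print" else ()
        used.update(v for v in operands if isinstance(v, str))
    return [ins for ins in ir if not (ins[0] == "set" and ins[1] not in used)]

def optimize(ir, passes):
    """Run every pass in order, and repeat the whole list until the IR stops changing."""
    while True:
        new_ir = ir
        for run_pass in passes:
            new_ir = run_pass(new_ir)
        if new_ir == ir:
            return ir
        ir = new_ir

source_ir = [("set", "x", 2.5), ("set", "i", 3), ("mul", "t", "x", "i"), ("print", "t")]
optimized = optimize(source_ir, [propagate_constants, fold_constants, remove_dead_sets])
print(optimized)
```

Note that the pipeline needs a second trip through the passes before the last `set` disappears, which is the "sometimes you repeat passes" part, and each trip costs time.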

You can see why compiling is so slow! But in the specific case we're talking about - a Switch game running on Drake - we don't need to do that full process. We can't! We don't have the original shader source to work with! We only have the compiled instructions.

What we can do is a transpile. A compile is where we take high level code (the kind a programmer can easily read and write) and turn it into low level code (the kind a processor can run). A transpile is where we take code and turn it into different code of the same level. In this case, we take some low level code, and make new low level code out of it.

In our case we're going from low level code to low level code, and we're doing it between two nearly identical architectures and where Drake is much faster than Switch. Because of that, we can do it very very fast. Going deep again, but this one is shorter.

Remember how I said each compiler optimization was a pass?

In a normal compiler, there is the parsing pass. This does zero optimizations, it just reads the original code and turns it into the "intermediate representation", basically, a version of the original code that it can take notes on - IR for short. Then subsequent passes go through that IR to make all their optimizations, then a final run through the updated IR to generate the final code.

But in this case, we already have optimized code. We just need to convert it to a new architecture. And those architectures are almost identical. So we can skip all the optimization passes. And actually, if we're not optimizing, we don't need the intermediate representation, because we're not taking notes. And if we don't have an IR, we don't even need to bother to read all the original code at once, because we're never going to make changes to one part of the code based on a part far far away. We just need to scan for lines that use old instructions, and then replace them with new ones.

For example, imagine that Drake actually removes the move instruction. Move takes some data, removes it from where it used to be, and puts it in a new location. Nvidia decides that's rare enough, that people who want to do that can just copy the data, and delete the old data. Let's look at our final code from before.

Code:
move 2.5 to screen.address    
move 5 to screen.address    
move 7.5 to screen.address    
move 10 to screen.address    
move 12.5 to screen.address    
move 15 to screen.address    
move 17.5 to screen.address    
move 20 to screen.address    
move 22.5 to screen.address    
move 25 to screen.address

We don't actually need to delete the old values, so copy works just fine for us. But we want our transpiler to be fast. So we don't scan the code at all to figure that out, instead we just go line by line, and if we see a move, we replace it with two lines, a copy and a delete.

Code:
copy 2.5 to screen.address
delete NULL ; it's a constant, there is nothing to actually delete
copy 5 to screen.address
delete NULL
copy 7.5 to screen.address
delete NULL
copy 10 to screen.address
delete NULL
copy 12.5 to screen.address
delete NULL
copy 15 to screen.address
delete NULL
copy 17.5 to screen.address
delete NULL
copy 20 to screen.address
delete NULL
copy 22.5 to screen.address
delete NULL
copy 25 to screen.address
delete NULL

We can do this extremely fast, because we didn't bother to optimize away the deletes. No IR, no optimization passes, heck, you don't even need to keep the whole program in memory before you start, you can do it line by line, squirting out new instructions while the game is sending you the old ones.
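A minimal sketch of that line-by-line approach in Python, using the made-up `move`/`copy`/`delete` pseudo instructions from this post (they are not real Maxwell or Ampere instructions). It is written as a generator so each output line is produced as soon as its input line arrives, with no IR and no view of the rest of the program.

```python
def transpile(lines):
    """Rewrite 'move' into 'copy' + 'delete', one line at a time."""
    for line in lines:
        instruction = line.split(";")[0].strip()     # drop any trailing comment
        if instruction.startswith("move "):
            yield "copy " + instruction[len("move "):]
            yield "delete NULL"                      # not optimized away, on purpose
        else:
            yield instruction                        # anything else passes through untouched

old_shader = [
    "move 2.5 to screen.address",
    "move 5 to screen.address",
    "move 7.5 to screen.address",
]
new_shader = list(transpile(old_shader))
for instruction in new_shader:
    print(instruction)
```

The output is twice as long as the input, which is exactly the tradeoff being made: no analysis means no chance to notice the deletes are pointless.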

But... all our beautiful optimizations! This new shader takes twice as long to execute as the old shader. But we can't optimize much either. We've got a game running as we speak and it can't do anything till this shader starts executing. If we wait long enough to execute this shader, we create stutter. The longer we spend optimizing this shader, the longer that stutter goes. What to do?

Don't worry about it. Drake is 6x as fast as the original Switch! This is the opposite of a normal situation. Normally you want to spend as long compiling as possible, and make the shader as fast as possible, because you only compile once, but you run the shader over and over again.

But now we're in this weird position where compiling slows the game down, but a slow shader will still run blazingly fast on this new hardware. It's more important to get compiling - or in this case, transpiling - as fast as possible, than it is to make the fastest shader possible.
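To put rough numbers on that, here is a back-of-the-envelope sketch in Python using only the figures from this post: the transpiled shader being twice as long, and the guess that Drake is 6x as fast. Treating "6x as fast" as a flat per-instruction speedup is a big simplifying assumption, so take the result as illustrative only.

```python
instruction_growth = 2.0   # transpiled shader has 2x the instructions (from the move example)
hardware_speedup = 6.0     # this post's assumed Drake vs. Switch speed ratio, not a confirmed spec

# Run time on the new hardware, relative to the original shader on the original Switch.
relative_runtime = instruction_growth / hardware_speedup
print(f"{relative_runtime:.2f}x the original run time")
```

Under those assumptions the sloppy shader still finishes in about a third of the time the optimized one took on the old hardware.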

So, transpiling, much much faster than shader compiling. Is it fast enough? Probably most of the time. And in cases where it isn't on a high profile game, Nintendo can just... have an intern play the game, pull the transpiled shaders out, and push it to you as a patch.
 
Thugstas I’m just gonna go ahead and shoot my shot and if you can’t/don’t want to answer, fair play. Are you personally aware of a 2023 device that is still coming?

Either way, love seeing you still around. You got the speculation ball rolling in many ways way back when at the start of all this on Era!
 
Gotcha. So if I’m understanding correctly, method one could allow for “100%” BC, but not equal performance BC across the board. And then if a game was to have native-like or at least equal to NSW performance, patches would be required. For example, Smash as a fighting game really needs a consistent 60 fps, so a game like that would be more optimized for Drake.
Basically! Except what's really wild, is that it wouldn't be obvious which games might need 2 or 3. Like, Smash is probably super well optimized. It's entirely possible they use a small number of very fast shaders over and over again. Because of that, Smash might run great out of the box.

Meanwhile Barbie's Horse Comber 2000, might use a custom shader for every effect in the game - every firework splash, every glistening pony hair, every twinkle in Barbie(tm)'s vacuous blue eyes could be a different shader. So what seems like a not graphically intense game could stutter like a pop & lock dancer when the record skips.
 
Someone can fact check for me, I'm going by memory, but in Reggie's book he talks about visiting NCL once or twice a year, usually to hash out strategy.

We know Bowser visited Kyoto last fall, so if this is another visit it would be a bit early for a twice per year visit, but potentially not out of the ordinary.

Edit: checking his feed very quickly and just scanning for pictures, he last tweeted about being in Kyoto in Jan of 2020.


and prior to that October 2019 & July 2019

 
I hope for Bowser’s sake he’s not going back to Japan. That’s a brutal flight to do on a monthly schedule.
I hope for Bowser's sake he holds his liquor.

So, yes games ship with compiled shaders. If you want backwards compatibility on hardware that can’t run those shaders, you have to do something. But in the case of Drake you don't actually need to recompile them. In fact, you can't, because you don't have the original shader source.
You should be able to "decompile" shaders. They're not mysterious blobs, they're sequences of well-known instructions. Functions are inlined, loops unrolled, etc, but it's kinda easy to detect those patterns. The resulting code won't be as useful for a human but a compiler doesn't care. Apple's Rosetta transpiles pretty efficiently between 2 wildly distinct architectures.

Transpiling to Drake should be pretty straight-forward as the architectures are supposedly quite similar. The code is already optimized. It should only take a few instruction patches.

I probably make this seem easier than it really is, but there are dozens of Nvidia and Nintendo employees behind this.
 
I hope for Bowser's sake he holds his liquor.


You should be able to "decompile" shaders. They're not mysterious blobs, they're sequences of well-known instructions. Functions are inlined, loops unrolled, etc, but it's kinda easy to detect those patterns. The resulting code won't be as useful for a human but a compiler doesn't care. Apple's Rosetta transpiles pretty efficiently between 2 wildly distinct architectures.

Transpiling to Drake should be pretty straight-forward as the architectures are supposedly quite similar. The code is already optimized. It should only take a few instruction patches.

I probably make this seem easier than it really is, but there are dozens of Nvidia and Nintendo employees behind this.
Yeah, you could decompile, but you're also right that you don't need to. Transpiling should be completely sufficient for BC.

Actually, it might be fun to just take a bunch of shaders, push them through GLSL and see if we can get them to spit out Ampere and Maxwell microcode and compare them.
 