
StarTopic Future Nintendo Hardware & Technology Speculation & Discussion |ST|

REAL FLOPS AND FAKE FLOPS STOP COUNTING FLOPS

FLOPS means Floating Point Operations Per Second.

Floating point basically means "fractions." In graphics, fractions are common. The other thing about graphics is that you're often performing the same operation with that fraction over and over again. See that object in the distance? It's smaller on screen because it's far away, so multiply its left side by 0.5. Now multiply its right side by 0.5. Now multiply its top side by 0.5...

Two things about CPUs: they're bad at fractions, and they're bad at performing lots of near-identical operations. Enter the GPU. The GPU is good at Graphics, and the core of it is a calculator designed to be good at the math that graphics needs. Instead of doing lots of similar floating point operations one at a time, it does them all at once. Because graphics do Lots of Similar operations, there are two ways to make a GPU more powerful: you can make the calculators faster (run at a higher clock speed), or you can add more calculators.
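To picture that "same operation over lots of values" pattern, here's a tiny CPU-side sketch (NumPy is just a stand-in here, and the mesh size is made up for illustration):

```python
# The "same operation over lots of values" pattern described above:
# scaling every vertex of a distant object is one multiply, repeated
# thousands of times. NumPy on a CPU is only a stand-in, but the shape
# of the work is exactly what a GPU's many calculators are built for.
import numpy as np

vertices = np.random.rand(10_000, 3)  # hypothetical mesh: 10,000 (x, y, z) points
scaled = vertices * 0.5               # one operation, applied to every value at once
```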

The calculator cores in an Nvidia GPU are organized into something called SMs. Here is the SM in Turing, the architecture in the RTX 20 cards.



Right now, I want you to pay attention to the sections labeled "FP32" and "INT32". FP32 stands for "32 bit floating point". When you count FLOPS for a GPU, you're counting how many of these FP32 units it has and multiplying that by how many times those units can execute per second.
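As a rough illustration of that counting (the GPU specs below are made-up placeholders; the factor of 2 reflects the common convention of counting a fused multiply-add as two floating point operations):

```python
# Sketch of how a headline FLOPS figure is typically derived:
# number of FP32 units * clock speed * operations each unit counts per clock.
def theoretical_gflops(fp32_units: int, clock_ghz: float, ops_per_clock: int = 2) -> float:
    """Peak FP32 throughput in GFLOPS (ops_per_clock=2 counts an FMA as two FLOPs)."""
    return fp32_units * clock_ghz * ops_per_clock

# Hypothetical GPU: 1536 FP32 units running at 1.0 GHz
print(theoretical_gflops(1536, 1.0))  # 3072.0 GFLOPS, i.e. ~3.1 TFLOPS
```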

INT32 stands for "32 bit integer." Graphics love floating point, but they're not exclusively floating point. They need integer operations too. That is what this bank is for. Before Turing, both AMD and Nvidia did this basically the same way: a bank of INT and a bank of FP. We talked about FLOPS, but increasing FLOPS also increased INTOPS (integer operations per second) by similar amounts, so we didn't need to talk about them separately.

Then, in RTX 30, Nvidia did this. Let's play spot the difference.



There are a few differences here, to be fair, but the one we care about is that there isn't an INT32 block anymore. There is a block labeled "FP32/INT32", beside the "FP32" block that we got before. Nvidia has changed the INT32 block to be able to execute either a floating point operation or an integer operation. FLOPS HAVE DOUBLED

...but only if no integer operations are happening. That INT32/FP32 block can only execute one or the other. Only half the FLOPS are always available; the other half are shared with the INTOPS. If an integer operation is being performed, the floating point operations are stuck with only the old FP32 bank.
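To put a rough number on how much of the doubled peak survives real workloads, here's a toy model (my own simplification, not an Nvidia figure): one FP32-only bank and one shared FP32/INT32 bank of equal width, where every integer op displaces a floating point op on the shared bank. The 36-integer-ops-per-100-FP mix comes from Nvidia's Turing whitepaper, quoted further down the thread.

```python
# Toy model: two equal-width banks per SM. Bank A is FP32-only; bank B is shared
# FP32/INT32, so every integer op issued there displaces a floating point op.
def effective_fp_rate(int_ops_per_100_fp: float) -> float:
    """FP ops issued per cycle across the two banks, normalised so 1.0 = one bank."""
    fp, integer = 100.0, int_ops_per_100_fp
    # The shared bank must absorb all the integer work, and both banks together
    # must absorb all the FP work; whichever constraint is tighter sets the time.
    cycles = max(integer, (fp + integer) / 2)
    return fp / cycles

print(effective_fp_rate(0))   # 2.0  -> the headline "doubled FLOPS" case
print(effective_fp_rate(36))  # ~1.47 -> with a 36-INT-per-100-FP instruction mix
```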

With Ampere, the architecture that is in Switch NG, Nvidia doubled FLOPS, but did not double performance. This was considered a bullshit marketing tactic, and was labeled "fake flops." The thing is, it isn't bullshit. It does increase performance, but not as much as GPU customers had been used to for 20 years.

RTX 30 cards had 2x the flops of RTX 20 cards, but only 1.6x more performance. You cannot use FLOPS alone as a way to compare performance. Just because a system is "newer" doesn't mean that each of its FLOPS is as good as or better than an older machine's FLOPS.


I got up this morning thinking "you know, I should write a post explaining the whole 'doubled FLOPS' Ampere thing", only to find oldpuck got there ahead of me, as per usual...

There is one thing I'd like to add/clarify, though, because I think it's important. Most of the discussion is about how Ampere 'doubled FLOPS' compared to Turing, but it's Turing, not Ampere, that's the odd-one-out.

To clarify, as far as I can tell, Turing is the only consumer GPU architecture ever to have fully independent INT32 ALUs, completely separate from the FP32 ALUs. You're correct that in other architectures when FP32 FLOPS increased, so did INT32 OPS, but from my reading every other architecture could only execute either FP32 or INT32 calculations at any given time, not both at once. Here's a quote from Nvidia's Turing whitepaper, on page 11 (emphasis added):

Turing implements a major revamping of the core execution datapaths. Modern shader workloads typically have a mix of FP arithmetic instructions such as FADD or FMAD with simpler instructions such as integer adds for addressing and fetching data, floating point compare or min/max for processing results, etc. In previous shader architectures, the floating-point math datapath sits idle whenever one of these non-FP-math instructions runs. Turing adds a second parallel execution unit next to every CUDA core that executes these instructions in parallel with floating point math.

I've read through as much AMD documentation as I can find, and from what I can tell all of their architectures follow the same approach, with each block of ALUs either able to run INT or FP operations, but not both simultaneously.

Turing was a weird architecture. Nvidia realised that there was a decent bit of integer code required alongside floating point, so they split the computational hardware in two, dedicating a full 50% of it exclusively to integer operations. So they went from 128 FP/INT "cores" per SM on Maxwell and Pascal to 64 dedicated FP "cores" and 64 dedicated INT "cores" on Turing.

This is where the whole FLOPS comparison really broke down. While FLOPS did, at least theoretically, measure the peak computational throughput of other GPUs, it only measured half of Turing's. Here's a review of the Turing RTX 2080. It's a 10 TFLOP GPU, and it beats the 10 TFLOP Vega 56 by up to 60%! This is a slightly unfair comparison, as Vega 56 was a couple of years old at that point, and RDNA1/2 improved performance a lot for AMD, but it's mostly unfair because the RTX 2080 has an entire extra 10 TOPS of INT32 performance as well that's not accounted for when looking only at TFLOPS figures.

Of course Nvidia wasn't exactly getting great use out of those dedicated INT32 ALUs (which is likely why nobody else ever tried it). From the Turing whitepaper again, Nvidia claim an average of 36 integer operations for every 100 floating point operations. That means they were likely only getting around 36% usage from the INT32 units.
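A quick back-of-the-envelope version of that utilization figure, assuming Turing's equal 64/64 split of FP32 and INT32 units per SM:

```python
# If the 64 FP32 units are the bottleneck, then in the time they issue 100 ops
# the equally-wide INT32 units only have 36 ops to issue, so most of their
# capacity sits idle.
fp_ops, int_ops = 100, 36            # average instruction mix from the Turing whitepaper
int_utilization = int_ops / fp_ops   # INT32 bank is the same width as the FP32 bank
print(f"{int_utilization:.0%}")      # 36%
```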

To flip the comparison between the RTX 2080 and Vega 56 another way, let's compare them in terms of combined operations per second for both FP32 and INT32. Vega 56 is still a 10 trillion operations per second GPU by this metric, but RTX 2080, using all its hardware, is a 20 trillion ops per second card. By this measure, Turing underperforms Vega by 20% or more, because those INT32 units are sitting idle most of the time. Turing is often treated as an architecture that was very efficient "per FLOP", but if we include the integer hardware as well, it was actually pretty inefficient overall, even compared to AMD's relatively inefficient Vega architecture.
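For the curious, here's that comparison worked through with only the numbers already quoted above (10 TFLOPS apiece, and the RTX 2080 up to roughly 60% faster in that review):

```python
# "Performance per total operation" check for RTX 2080 vs Vega 56.
vega56_total_ops = 10.0            # trillion ops/s (FP32 only, no separate INT pipes)
rtx2080_total_ops = 10.0 + 10.0    # trillion ops/s, counting the dedicated INT32 units
rtx2080_relative_perf = 1.6        # up to ~60% faster than Vega 56 in the linked review

per_op_ratio = (rtx2080_relative_perf / rtx2080_total_ops) / (1.0 / vega56_total_ops)
print(f"{per_op_ratio:.2f}")       # 0.80 -> ~20% less performance per theoretical op
```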

So, Nvidia quickly recognised that the dedicated INT32 hardware experiment had failed, and switched back to the old combined FP32/INT32 approach with Ampere, this time having one bank of dedicated FP32 hardware and one combined INT/FP bank. This meant that the TFLOPS doubled without performance doubling (as oldpuck described above), but this wasn't because Ampere was using "fake FLOPS" or over-counting; it was because Turing was under-counting. The use of FLOPS to compare GPUs completely broke down with Turing, as it had all this additional integer hardware, but Ampere's changes meant these execution units were counted within FLOPS measures again. If we look at it per operation overall, rather than just per FLOP, Ampere is getting up to 30% more performance per theoretical operation than Turing, because it didn't have integer hardware sitting idle.

If we ignore Turing for a second and compare Ampere to AMD's contemporary RDNA2 architecture: the RTX 3070 and the RX 6800 XT are both 20 TFLOP GPUs, and the 6800 XT outperforms the 3070 by 10-20%, with the bigger gains at higher resolutions. The 6800 XT has double the RAM, more bandwidth and 16x the cache of the 3070, so I suspect that plays a part in the higher-res wins. Taking that out, you're probably looking at closer to a 10% win for RDNA2.

This deficit for Ampere doesn't really have anything to do with the "doubled FLOPS", but it's simply that Nvidia spent the previous few generations on weird experiments like dedicated integer hardware (and more useful experiments like tensor cores and hardware RT acceleration), whereas AMD spent the previous few generations completely redesigning their GPU architecture and achieving significant performance gains over their old designs, overtaking Nvidia in the process. Cut Turing out of the picture, and Ampere is part of a relatively straight-forward line of stagnating performance in traditional rendering for Nvidia GPUs while they focus on RT and tensor cores, alongside AMD making significant advances in traditional rendering performance over the same time period.
 
Cut Turing out of the picture, and Ampere is part of a relatively straight-forward line of stagnating performance in traditional rendering for Nvidia GPUs while they focus on RT and tensor cores, alongside AMD making significant advances in traditional rendering performance over the same time period.
gonna be real weird now that AMD is the marketing partner for Ubisoft Massive's two games
 
Don't know if Revogamers is legit.

Translated: Tamarindos Freaking Dinner, the next from @CeleryEmblem, responsible for the brilliant Baobabs Mausoleum, will come to Nintendo Switch and also to its successor, as its creator has confirmed to us.


I choose to believe.

With multiple games now announced for the system, we're in a rather hilarious situation, much like this time in 2016, where we know more about games coming to a system than any official details about the system.
 
Don't know if Revogamers is legit.

Translated: Tamarindos Freaking Dinner, the next from @CeleryEmblem, responsible for the brilliant Baobabs Mausoleum, will come to Nintendo Switch and also to its successor, as its creator has confirmed to us.


CeleryWho

Even if above is true, I would not expect them to have devkits at this stage. Sure, some indie companies might already have them, but not to such extent
 
CeleryWho

Even if above is true, I would not expect them to have devkits at this stage. Sure, some indie companies might already have them, but not to such extent
Depends on how you define "Dev kit". We could be at a stage where anyone who's published a game on Switch gets an email from Nvidia like "Don't say anything to anyone but yeah here's the software development kit."

The hardware kit is kind of important but it's far from essential. The SDK is just that - just software anyone with the right PC can run.
 
There is one thing I'd like to add/clarify, though, because I think it's important. Most of the discussion is about how Ampere 'doubled FLOPS' compared to Turing, but it's Turing, not Ampere, that's the odd-one-out.
Oh, now everything makes sense. Well, aside from everything I read about Ampere that treated it like cheating when Nvidia was just going back to the standard approach, but I can see how the reaction to "twice the flops but 30% improvements" led to people feeling they were cheated.

When LiC explained that all Maxwell ALUs were FP32, and INT ops were done there, I got the answer to my question about flop calcs, but it created other questions. Turing being the non-standard one, rather than Ampere, clears them up quite nicely.

Thank you all.
 
Even with two GPUs in the same family, even between two games on the same GPU, FLOPS simply don't represent performance.
Nailed it.

This deficit for Ampere doesn't really have anything to do with the "doubled FLOPS", but it's simply that Nvidia spent the previous few generations on weird experiments like dedicated integer hardware (and more useful experiments like tensor cores and hardware RT acceleration), whereas AMD spent the previous few generations completely redesigning their GPU architecture and achieving significant performance gains over their old designs, overtaking Nvidia in the process. Cut Turing out of the picture, and Ampere is part of a relatively straight-forward line of stagnating performance in traditional rendering for Nvidia GPUs while they focus on RT and tensor cores, alongside AMD making significant advances in traditional rendering performance over the same time period.
Great stuff across the board, but worth pulling this out. When people say "Nintendo has moved to Lovelace" I want to shake them and say "why???" If you look at the history of Nvidia architectures, each design changes less and less, until you get to Ampere, which pretty much nailed it. The differences between Ampere and Lovelace could fit into a paragraph.

Don't know if Revogamers is legit.

Translated: Tamarindos Freaking Dinner, the next from @CeleryEmblem, responsible for the brilliant Baobabs Mausoleum, will come to Nintendo Switch and also to its successor, as its creator has confirmed to us.
CeleryWho

Even if above is true, I would not expect them to have devkits at this stage. Sure, some indie companies might already have them, but not to such extent
So indies have had dev kits or are starting to receive them?
The above is true, as the developer has confirmed it on Twitter. But "the above" is "I plan on making this game come to whatever Nintendo does next." It doesn't tell us anything about what the dev knows/has. I would bet dollars to donuts that this One Man Studio doesn't have a devkit. I would bet somewhat fewer dollars/donuts on them having received a tech briefing of some kind. I would take 3:2 odds on them just hearing developer chatter.
 
Do we know how early indies were receiving devkits for Switch 1 before the launch?


Other games from the same indie.

Even if you think indies = meh, I love indie games in addition to Nintendo games. I have somewhat of a problem, lol.
Nobody said anything about indies being meh, but even amidst indie companies there must be those who’ll receive access to new hardware faster than the others.

For me, it’s like Skittzo said, it might be their future plan and not something that they are working on right now
 
Much bigger picture of the relevant part of the screen, the hero and objects in the center. It's equivalent to a 9.7 inch 16:9 screen with a little missing left and right. No one looks there anyway. It would make all the copycat devices look tiny, and provide a clear upgrade for Switch owners. The aspect ratio allows for a higher resolution while still keeping GPU strain in check. Or something else.
I don't know, ask them. When the screen size and manufacturer leaked, someone posted a link to a database, and the only 7.91 inch screen by that company I could find at a quick glance was a 1536x2048 4:3 panel. It makes a lot of sense to me and wouldn't surprise me.
Ya, I don't think they'd do this at all. Maybe a slightly less wide aspect ratio, but never one as extreme as 4:3. In fact, given that rumors of a larger screen (8 in) are out there, I'd say it's even less likely that they'll change the aspect ratio.

When the game "switches" from TV experience to Handheld you don't want to have to program a different UI for the different experiences. When Switch first launched I was worried about this, thinking they would have to.. maybe some games do this but I never noticed. Is the UI blown up on Zelda when hand held? I don't think it is. At any rate an even bigger handheld screen makes this even less of a concern.

But if you change the aspect ratio, it would basically mandate that a whole new UI for the game be created for the different experience. I don't think Nintendo wants that for its platform.
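For what it's worth, the "9.7 inch equivalent" figure in the quoted post does check out geometrically, if you take the rumoured 7.91-inch 4:3 panel at face value and fill its height with a 16:9 image (a quick sketch; the panel size itself is just the quoted rumour):

```python
# Diagonal of a 16:9 image that fills the height of a 7.91" 4:3 panel
# (the image's left and right edges would be cropped, as the quote says).
import math

def panel_dims(diagonal: float, ar_w: int, ar_h: int) -> tuple[float, float]:
    """Width and height of a screen with the given diagonal and aspect ratio."""
    unit = diagonal / math.hypot(ar_w, ar_h)
    return ar_w * unit, ar_h * unit

_, height_4_3 = panel_dims(7.91, 4, 3)                 # height of the rumoured 4:3 panel
widescreen_diag = height_4_3 * math.hypot(16, 9) / 9   # 16:9 diagonal at that height
print(f"{widescreen_diag:.1f} inches")                 # ~9.7
```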
 
Nobody said anything about indies being meh, but even amidst indie companies there must be those who’ll receive access to new hardware faster than the others.

For me, it’s like Skittzo said, it might be their future plan and not something that they are working on right now
It's not necessarily something they have to work on at the moment either, considering Switch 2 is likely BC. Maybe minor work or as Concernt pointed out, they might just have the "soft" devkit (just SDK, no hardware).

Also, I don't think indies normally say "it'll also come to successor" unless they have a reason for saying exactly that.

We wouldn't hear indies say "it's also coming to successor" in 2019, for example.
 
It's not necessarily something they have to work on at the moment either, considering Switch 2 is likely BC. Maybe minor work or as Concernt pointed out, they might just have the "soft" devkit (just SDK, no hardware).

Also, I don't think indies normally say it'll come to "successor" unless they know something.

We wouldn't hear indies say "it's also coming to successor" in 2019.
In 2019? Maybe not.
In 2020? The Kickstarter for Eiyuden Chronicle mentioned future Nintendo hardware. Did they have a devkit? Maybe, for the cancelled Switch Pro. Now we'll never know.
 
Ya, I don't think they'd do this at all. Maybe a slightly less wide aspect ratio, but never one as extreme as 4:3. In fact, given that rumors of a larger screen (8 in) are out there, I'd say it's even less likely that they'll change the aspect ratio.

When the game "switches" from TV experience to Handheld you don't want to have to program a different UI for the different experiences. When Switch first launched I was worried about this, thinking they would have to.. maybe some games do this but I never noticed. Is the UI blown up on Zelda when hand held? I don't think it is. At any rate an even bigger handheld screen makes this even less of a concern.

But if you change the aspect ratio, it would basically mandate that a whole new UI for the game be created for the different experience. I don't think Nintendo wants that for its platform.
I will note the following two things, though I definitely broadly agree:

Wii games did the "UI switching" thing long ago and I never had any issues with them.

Nintendo Switch games usually target TV mode for their experience, especially in their big, text heavy, touch-unfriendly UI design. Some games have difficult to read or badly rendered text in handheld mode (like Three Houses), or some might have icons that are legible in TV mode but become nearly indiscernible in handheld mode.

UI design will always be a challenge for the concept of a Switch. At present there are two very real extremes you need to be aware of: people playing 2-3 feet away from a 50+ inch television in TV mode, and people playing a Nintendo Switch Lite in tabletop mode on the far side of a table.

Now I will admit, these are rare cases.

I will also admit: I am both of these people.
 
In 2019? Maybe not.
In 2020? The Kickstarter for Eiyuden Chronicle mentioned future Nintendo hardware. Did they have a devkit? Maybe, for the cancelled Switch Pro. Now we'll never know.
Good point. We did see a few outliers like that one in 2020. We are hearing more and more "coming to successor" language from devs lately, which isn't really surprising in itself. The tweet I shared is another nice reminder of that.
 
I take that as a fact and confirmation of FF 7 Remake and Rebirth on Switch 2.

;]
Not a sequel to this?

[Dark Cloud PS2 game cover]
 
"I'm going to being my game to Switch and whatever Nintendo does next".

I think that's just a commitment as opposed to a factual statement right now. I don't think it means much.

"I want to bring it to whatever Nintendo does next, because the world and their dog is talking about it so something is probably happening".

That's more like it.
 
Much bigger picture of the relevant part of the screen, the hero and objects in the center. Its equivalent to a 9.7 inch 16:9 screen with a little missing left and right. No one looks there anyway.
The Entire TV Industry of the Last 20 Years: Uh-oh, we screwed up.
 
While I suppose this time Nintendo is trying to be more secretive about it, I don't think it's crazy to believe that some random indie studios could have a devkit early. Remember that on Switch, Vroom in the Night Sky was a launch title, and the previous games of the dev were almost all Japan-only small eShop games for 3DS.
 
While I suppose this time Nintendo is trying to be more secretive about it, I don't think it's crazy to believe that some random indie studios could have a devkit early. Remember that on Switch, Vroom in the Night Sky was a launch title, and the previous games of the dev were almost all Japan-only small eShop games for 3DS.
I agree; they probably had a meeting with Nintendo at Gamescom to secure a devkit, like we previously heard. It wouldn't be too far-fetched at this stage.
 
Don't know if Revogamers is legit.

Translated: Tamarindos Freaking Dinner, the next from @CeleryEmblem, responsible for the brilliant Baobabs Mausoleum, will come to Nintendo Switch and also to its successor, as its creator has confirmed to us.


I'm waiting for the day we finally get a game announced for the NG the way Sonic Forces was announced for the NX back in 2016.
 
Really? My ACD/Asperger's makes it hard for me to tell if this is real or sarcasm.
No worries - that's my take. I could be wrong tho.

It's not the first time a placeholder has said Dec 31... it has always been like that, but Amazon updates the year, going from one year to the next.

The joke: Dec 31 2023 = Launch (or launch window) of Switch 2. :)

Edit: Incidentally, I placed an MP4 preorder a long time ago. It's still up there, lol.

 
If it is correct that indies have received dev kits, leaks will be coming out imminently.

Although, just because an indie dev says it's coming out on a platform doesn't concretely mean they have a dev kit.
 

