NV2A/Vertex Shader

From xboxdevwiki
Latest revision as of 23:42, 4 May 2020

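The Xbox implements two GL extensions, NV_vertex_program (https://www.opengl.org/registry/specs/NV/vertex_program.txt) and NV_vertex_program1_1 (https://www.opengl.org/registry/specs/NV/vertex_program1_1.txt), with some modifications. This article focuses mainly on the actual hardware encoding, as the behaviour is mostly described in those GL extensions already.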

Operating modes

  • Fixed / Programmable
  • Writeable / Read-Only constants
  • Low-constants only / all constants
  • Vertex processing / State program

Registers

Input registers

There are 16 input registers v[0] to v[15].

They normally map to the vertex attributes. However, in the case of vertex state programs, v[0] is fed from LAUNCH_DATA (PGRAPH Methods 0x1E80, 0x1E84, 0x1E88, 0x1E8C for XYZW respectively) instead.
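
As a small illustration of the LAUNCH_DATA layout described above, the four method addresses are spaced 4 bytes apart, so the address for a given component can be computed from the base. This is only a sketch; the names `LAUNCH_DATA_BASE` and `launch_data_method` are hypothetical and do not come from any SDK or driver.

```python
# First LAUNCH_DATA method (X component), as listed in the text above.
LAUNCH_DATA_BASE = 0x1E80  # hypothetical constant name


def launch_data_method(component: str) -> int:
    """Return the PGRAPH method address that feeds one component of v[0]."""
    index = "XYZW".index(component.upper())  # X=0, Y=1, Z=2, W=3
    return LAUNCH_DATA_BASE + 4 * index


if __name__ == "__main__":
    for c in "XYZW":
        print(c, hex(launch_data_method(c)))
```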

Output registers

There are 11 output registers o[<RegName>], each initialized to XYZ=0x00000000, W=0x3F800000 (that is, the float vector 0, 0, 0, 1).

Output registers
Index  GL Name  D3D Name  Meaning
0      HPOS     oPos      Homogeneous clip space position
3      COL0     oD0       Primary color (front-facing)
4      COL1     oD1       Secondary color (front-facing)
5      FOGC     oFog      Fog coordinate
6      PSIZ     oPts      Point size
7      BFC0     oB0       Back-facing primary color
8      BFC1     oB1       Back-facing secondary color
9      TEX0     oT0       Texture coordinate set 0
10     TEX1     oT1       Texture coordinate set 1
11     TEX2     oT2       Texture coordinate set 2
12     TEX3     oT3       Texture coordinate set 3
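
The table above can be written down as a lookup, and the initial register contents can be checked by reinterpreting the raw bit patterns as IEEE 754 single-precision floats. The following is a minimal sketch; `OUTPUT_REGISTERS`, `bits_to_float` and `INITIAL_OUTPUT` are hypothetical names used only for illustration.

```python
import struct

# Output register index -> (GL name, D3D name), copied from the table above.
# Indices 1 and 2 do not appear in the table and are left out here as well.
OUTPUT_REGISTERS = {
    0: ("HPOS", "oPos"),
    3: ("COL0", "oD0"),
    4: ("COL1", "oD1"),
    5: ("FOGC", "oFog"),
    6: ("PSIZ", "oPts"),
    7: ("BFC0", "oB0"),
    8: ("BFC1", "oB1"),
    9: ("TEX0", "oT0"),
    10: ("TEX1", "oT1"),
    11: ("TEX2", "oT2"),
    12: ("TEX3", "oT3"),
}


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Initial value of every output register: XYZ=0x00000000, W=0x3F800000.
INITIAL_OUTPUT = tuple(bits_to_float(b) for b in (0, 0, 0, 0x3F800000))
```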

Address register

A0.x exists as documented in the GL extension.

Temporary registers

There are 12 temporary registers: R0 to R11 (initialized to XYZW=0x00000000), as documented in the GL extension. Additionally, o[HPOS] is mirrored as R12 and can be used as a source operand, so effectively you have 13 temporaries.

Constant space

There are 192 constant registers in two separate blocks of 96 constants each. They can be accessed through the PGRAPH RDI (select=0x17); each constant slot is four DWORDs, ordered WZYX. Alternatively, they can be uploaded through PGRAPH method [FIXME] as four DWORDs, ordered XYZW.
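
Because the two access paths use opposite component orders, a tool that reads constants back through the RDI has to reverse each slot before comparing it with what was uploaded. A minimal sketch of that reordering follows; the function name `rdi_to_xyzw` is hypothetical, and the slot is assumed to be already split into four DWORD values.

```python
def rdi_to_xyzw(slot):
    """Reorder one constant slot from RDI order (W, Z, Y, X) to (X, Y, Z, W)."""
    if len(slot) != 4:
        raise ValueError("a constant slot holds exactly four DWORDs")
    w, z, y, x = slot
    return (x, y, z, w)
```

Since the reordering is a plain reversal, applying it twice returns the original slot.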

In nvidia vertex programs, only 96 constants are normally accessible. Microsoft exposed the 96 additional constant registers in D3D shaders as c[-96] to c[-1]. This documentation uses the GL terminology instead and exposes the additional registers as c[96] to c[191], so c[0] to c[191] are valid.

Instructions

In total, there are 136 instruction slots.

Each slot consists of 16 bytes, which we treat as 4 separate little-endian DWORDs describing the operation. Word 0 is unused.
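
Splitting a raw slot into the four words used in the rest of this section could look like the sketch below. The function name `split_slot` is hypothetical; the only facts taken from the text are the 16-byte slot size and the little-endian DWORD layout.

```python
import struct


def split_slot(slot: bytes) -> tuple:
    """Split one 16-byte instruction slot into words 0..3 (little-endian)."""
    if len(slot) != 16:
        raise ValueError("an instruction slot is exactly 16 bytes")
    return struct.unpack("<4I", slot)
```

The returned tuple is indexed by word number, so `split_slot(data)[1]` is word 1.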

Fields
Field                           Word  Offset (bits)  Size (bits)
ILU Operation                   1     25             3
MAC Operation                   1     21             4
Constant index                  1     13             8
Input index                     1     9              4
Source 1 negate                 1     8              1
Source 1 swizzle X              1     6              2
Source 1 swizzle Y              1     4              2
Source 1 swizzle Z              1     2              2
Source 1 swizzle W              1     0              2
Source 1 register               2     28             4
Source 1 mux                    2     26             2
Source 2 negate                 2     25             1
Source 2 swizzle X              2     23             2
Source 2 swizzle Y              2     21             2
Source 2 swizzle Z              2     19             2
Source 2 swizzle W              2     17             2
Source 2 register               2     13             4
Source 2 mux                    2     11             2
Source 3 negate                 2     10             1
Source 3 swizzle X              2     8              2
Source 3 swizzle Y              2     6              2
Source 3 swizzle Z              2     4              2
Source 3 swizzle W              2     2              2
Source 3 register (Hi)          2     0              2
Source 3 register (Lo)          3     30             2
Source 3 mux                    3     28             2
Destination MAC mask            3     24             4
Destination temporary register  3     20             4
Destination ILU mask            3     16             4
Destination overall mask        3     12             4
Destination select              3     11             1
Destination output register     3     3              8
Destination mux                 3     2              1
Relative constant addressing    3     1              1
Final instruction marker (EOF)  3     0              1

Swizzle table
Value  Component
0      X
1      Y
2      Z
3      W
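
Each row of the field table is a (word, offset, size) triple, so a field can be read with a shift and a mask. The sketch below covers only a subset of the fields. All names in it are hypothetical. Combining the split Source 3 register as `(hi << 2) | lo` is an assumption based on the "Hi" and "Lo" labels, not something confirmed elsewhere in this article.

```python
# (word, bit offset, bit size) for a subset of the fields in the table above.
# The dictionary keys are hypothetical names chosen for this sketch.
FIELDS = {
    "ilu_op": (1, 25, 3),
    "mac_op": (1, 21, 4),
    "const_index": (1, 13, 8),
    "input_index": (1, 9, 4),
    "src1_negate": (1, 8, 1),
    "src1_swz_x": (1, 6, 2),
    "src1_swz_y": (1, 4, 2),
    "src1_swz_z": (1, 2, 2),
    "src1_swz_w": (1, 0, 2),
    "src3_reg_hi": (2, 0, 2),
    "src3_reg_lo": (3, 30, 2),
    "eof": (3, 0, 1),
}

SWIZZLE = "XYZW"  # swizzle table: 0=X, 1=Y, 2=Z, 3=W


def get_field(words, name):
    """Extract one named field from the four instruction words."""
    word, offset, size = FIELDS[name]
    return (words[word] >> offset) & ((1 << size) - 1)


def source1_swizzle(words):
    """Return the Source 1 swizzle as a string such as 'XYZW'."""
    return "".join(SWIZZLE[get_field(words, "src1_swz_" + c)] for c in "xyzw")


def source3_register(words):
    """Join the split Source 3 register index.

    Assumption: 'Hi' holds the upper two bits and 'Lo' the lower two.
    """
    hi = get_field(words, "src3_reg_hi")
    lo = get_field(words, "src3_reg_lo")
    return (hi << 2) | lo
```

Keeping the layout in a table rather than in hand-written shifts makes it easy to add the remaining fields or to correct an offset if testing on hardware shows it to be wrong.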

Functional units

Inverse Logic Unit (ILU)

ILU Operations
Value  Operation
0      NOP
1      MOV
2      RCP
3      RCC
4      RSQ
5      EXP
6      LOG
7      LIT

Multiply-Accumulate (MAC)

MAC Operations
Value  Operation
0      NOP
1      MOV
2      MUL
3      ADD
4      MAD
5      DP3
6      DPH
7      DP4
8      DST
9      MIN
10     MAX
11     SLT
12     SGE
13     ARL
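
Since every slot carries both an ILU and a MAC opcode in word 1, a disassembler can look up both names at once. The following sketch uses the two opcode tables above and the word 1 bit positions from the field table. The names `ILU_OPS`, `MAC_OPS` and `opcode_names` are hypothetical, and reporting MAC values 14 and 15 as unknown is a choice made for this example, since the table does not list them.

```python
# Opcode names indexed by value, copied from the two tables above.
ILU_OPS = ("NOP", "MOV", "RCP", "RCC", "RSQ", "EXP", "LOG", "LIT")
MAC_OPS = ("NOP", "MOV", "MUL", "ADD", "MAD", "DP3", "DPH", "DP4",
           "DST", "MIN", "MAX", "SLT", "SGE", "ARL")


def opcode_names(word1: int):
    """Return (ILU name, MAC name) for word 1 of an instruction slot."""
    ilu = (word1 >> 25) & 0x7  # ILU Operation: offset 25, 3 bits
    mac = (word1 >> 21) & 0xF  # MAC Operation: offset 21, 4 bits
    # The 4-bit MAC field can hold 14 and 15, which the table does not list.
    mac_name = MAC_OPS[mac] if mac < len(MAC_OPS) else "UNKNOWN(%d)" % mac
    return ILU_OPS[ilu], mac_name
```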

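Related links

  • nvidia resources
      • WhereIsThatVertexShaderInstruction.pdf (https://web.archive.org/web/20191214200729/https://www.nvidia.com/attach/6559) / WhereIsThatVertexShaderInstruction.doc (https://web.archive.org/web/20191214200738/https://www.nvidia.com/attach/6560)
  • Code which appears to implement bit-accurate emulation of some instructions (https://github.com/envytools/envytools/blob/master/nvhw/pgraph_celsius_xfrm.c) [FIXME: unconfirmed, needs testing]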