NV2A/Vertex Shader

From xboxdevwiki

Latest revision as of 23:42, 4 May 2020

The Xbox implements two GL extensions, NV_vertex_program (https://www.opengl.org/registry/specs/NV/vertex_program.txt) and NV_vertex_program1_1 (https://www.opengl.org/registry/specs/NV/vertex_program1_1.txt), with some modifications. This article focuses on the actual hardware encoding, as the behaviour is mostly described in those GL extensions already.

Operating modes

  • Fixed / Programmable
  • Writeable / Read-Only constants
  • Low-constants only / all constants
  • Vertex processing / State program

Registers

Input registers

There are 16 input registers v[0] to v[15].

They normally map to the vertex attributes. In vertex state programs, however, v[0] is instead fed from LAUNCH_DATA (PGRAPH methods 0x1E80, 0x1E84, 0x1E88 and 0x1E8C for X, Y, Z and W respectively).

Output registers

There are 11 output registers, addressed as o[<RegName>] (initialized to XYZ=0x00000000, W=0x3F800000).

Output registers
Index  GL Name  D3D Name  Meaning
0      HPOS     oPos      Homogeneous clip space position
3      COL0     oD0       Primary color (front-facing)
4      COL1     oD1       Secondary color (front-facing)
5      FOGC     oFog      Fog coordinate
6      PSIZ     oPts      Point size
7      BFC0     oB0       Back-facing primary color
8      BFC1     oB1       Back-facing secondary color
9      TEX0     oT0       Texture coordinate set 0
10     TEX1     oT1       Texture coordinate set 1
11     TEX2     oT2       Texture coordinate set 2
12     TEX3     oT3       Texture coordinate set 3
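
The index table above can be captured as a small lookup. The following is only a sketch in Python; the names OutputReg, D3D_NAMES, bits_to_float and OUTPUT_INIT are made up for this illustration and are not part of any existing tool. The only outside fact used is that 0x3F800000 is the IEEE 754 single-precision encoding of 1.0.

```python
import struct
from enum import IntEnum


class OutputReg(IntEnum):
    """Output register indices, using the GL names from the table."""
    HPOS = 0
    COL0 = 3
    COL1 = 4
    FOGC = 5
    PSIZ = 6
    BFC0 = 7
    BFC1 = 8
    TEX0 = 9
    TEX1 = 10
    TEX2 = 11
    TEX3 = 12


# D3D names for the same indices, in table order.
D3D_NAMES = dict(zip(OutputReg, (
    "oPos", "oD0", "oD1", "oFog", "oPts",
    "oB0", "oB1", "oT0", "oT1", "oT2", "oT3",
)))


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Initial contents of every output register: XYZ = 0x00000000, W = 0x3F800000.
OUTPUT_INIT = tuple(bits_to_float(b) for b in (0, 0, 0, 0x3F800000))
```

With these definitions, D3D_NAMES[OutputReg(9)] gives the D3D name for index 9, and OUTPUT_INIT holds the initial value as floats.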

Address register

A0.x exists as documented in the GL extension.

Temporary registers

There are 12 temporary registers, R0 to R11 (initialized to XYZW=0x00000000), as documented in the GL extension. Additionally, o[HPOS] is mirrored as R12 and can be used as a source operand, so there are effectively 13 temporaries.

Constant space

There are 192 constant registers in two separate blocks of 96 constants each. They can be accessed through the PGRAPH RDI with select=0x17; each constant slot is 4 DWORDs, ordered WZYX. Alternatively, they can be uploaded through PGRAPH method [FIXME] as 4 DWORDs, ordered XYZW.

In NVIDIA vertex programs, only 96 constants are normally accessible. Microsoft exposed the 96 additional constant registers in D3D shaders as c[-96] to c[-1]. This documentation uses the GL terminology instead and exposes the additional registers as c[96] to c[191], so c[0] to c[191] are valid.
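
Because the RDI view and the method upload use opposite component orders, a tool that touches both has to reorder each slot. The sketch below shows one way to do that in Python, together with a range check for the c[0] to c[191] numbering used in this article. All function names are hypothetical, and no actual RDI or method access is performed.

```python
CONSTANT_COUNT = 192  # c[0] to c[191] in this article's numbering


def check_constant_index(index: int) -> int:
    """Return index unchanged if it names a valid constant, else raise."""
    if not 0 <= index < CONSTANT_COUNT:
        raise IndexError(f"c[{index}] is outside c[0] to c[{CONSTANT_COUNT - 1}]")
    return index


def rdi_to_xyzw(slot):
    """Reorder one constant slot from RDI order (W, Z, Y, X) to (X, Y, Z, W)."""
    w, z, y, x = slot
    return (x, y, z, w)


def xyzw_to_rdi(slot):
    """Reorder one constant slot from (X, Y, Z, W) to RDI order (W, Z, Y, X)."""
    x, y, z, w = slot
    return (w, z, y, x)
```

Both conversions simply reverse the four DWORDs, so applying one after the other returns the original slot.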

Instructions

In total, there are 136 instruction slots.

Each slot consists of 16 bytes, which we treat as 4 separate little-endian DWORDs describing the operation. Word 0 is unused.
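
As an illustration of this layout, the following Python sketch splits raw program bytes into slots of four little-endian DWORDs. The function names and the idea of holding a program as one bytes object are assumptions made for the example; only the slot size, slot count and endianness come from the text above.

```python
import struct

SLOT_COUNT = 136  # total instruction slots
SLOT_SIZE = 16    # bytes per slot


def split_slot(slot: bytes):
    """Split one 16-byte slot into (word0, word1, word2, word3)."""
    if len(slot) != SLOT_SIZE:
        raise ValueError(f"a slot is {SLOT_SIZE} bytes, got {len(slot)}")
    return struct.unpack("<4I", slot)


def split_program(blob: bytes):
    """Split a byte string into a list of 4-word slots."""
    if len(blob) % SLOT_SIZE != 0:
        raise ValueError("program length is not a multiple of the slot size")
    count = len(blob) // SLOT_SIZE
    if count > SLOT_COUNT:
        raise ValueError(f"program needs {count} slots, only {SLOT_COUNT} exist")
    return [
        split_slot(blob[i * SLOT_SIZE:(i + 1) * SLOT_SIZE])
        for i in range(count)
    ]
```

For example, split_slot(bytes.fromhex("00000000785634120000000001000000")) yields words 0x00000000, 0x12345678, 0x00000000 and 0x00000001.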

Fields
Meaning                         Word  Offset (bits)  Size (bits)
ILU Operation                   1     25             3
MAC Operation                   1     21             4
Constant index                  1     13             8
Input index                     1     9              4
Source 1 negate                 1     8              1
Source 1 swizzle X              1     6              2
Source 1 swizzle Y              1     4              2
Source 1 swizzle Z              1     2              2
Source 1 swizzle W              1     0              2
Source 1 register               2     28             4
Source 1 mux                    2     26             2
Source 2 negate                 2     25             1
Source 2 swizzle X              2     23             2
Source 2 swizzle Y              2     21             2
Source 2 swizzle Z              2     19             2
Source 2 swizzle W              2     17             2
Source 2 register               2     13             4
Source 2 mux                    2     11             2
Source 3 negate                 2     10             1
Source 3 swizzle X              2     8              2
Source 3 swizzle Y              2     6              2
Source 3 swizzle Z              2     4              2
Source 3 swizzle W              2     2              2
Source 3 register (Hi)          2     0              2
Source 3 register (Lo)          3     30             2
Source 3 mux                    3     28             2
Destination MAC mask            3     24             4
Destination temporary register  3     20             4
Destination ILU mask            3     16             4
Destination overall mask        3     12             4
Destination select              3     11             1
Destination output register     3     3              8
Destination mux                 3     2              1
Relative constant addressing    3     1              1
Final instruction marker (EOF)  3     0              1

Swizzle table
Value  Meaning
0      X
1      Y
2      Z
3      W
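
To show how the field table is meant to be read, here is a Python sketch that extracts fields from the four words of a slot by shifting and masking. The field keys, FIELDS, get_field, src3_register and swizzle are names invented for this example. The sketch assumes that the "Hi" part of the source 3 register supplies the upper two bits of a 4-bit index, which is what the Hi/Lo naming suggests but is not spelled out here.

```python
# name: (word, bit offset, size in bits), transcribed from the field table.
FIELDS = {
    "ilu_op": (1, 25, 3), "mac_op": (1, 21, 4),
    "const_index": (1, 13, 8), "input_index": (1, 9, 4),
    "src1_negate": (1, 8, 1),
    "src1_swz_x": (1, 6, 2), "src1_swz_y": (1, 4, 2),
    "src1_swz_z": (1, 2, 2), "src1_swz_w": (1, 0, 2),
    "src1_reg": (2, 28, 4), "src1_mux": (2, 26, 2),
    "src2_negate": (2, 25, 1),
    "src2_swz_x": (2, 23, 2), "src2_swz_y": (2, 21, 2),
    "src2_swz_z": (2, 19, 2), "src2_swz_w": (2, 17, 2),
    "src2_reg": (2, 13, 4), "src2_mux": (2, 11, 2),
    "src3_negate": (2, 10, 1),
    "src3_swz_x": (2, 8, 2), "src3_swz_y": (2, 6, 2),
    "src3_swz_z": (2, 4, 2), "src3_swz_w": (2, 2, 2),
    "src3_reg_hi": (2, 0, 2), "src3_reg_lo": (3, 30, 2),
    "src3_mux": (3, 28, 2),
    "dst_mac_mask": (3, 24, 4), "dst_temp_reg": (3, 20, 4),
    "dst_ilu_mask": (3, 16, 4), "dst_out_mask": (3, 12, 4),
    "dst_select": (3, 11, 1), "dst_out_reg": (3, 3, 8),
    "dst_mux": (3, 2, 1), "rel_const": (3, 1, 1), "final": (3, 0, 1),
}


def get_field(words, name):
    """Extract one named field from a sequence of four 32-bit words."""
    word, offset, size = FIELDS[name]
    return (words[word] >> offset) & ((1 << size) - 1)


def src3_register(words):
    """Join the split source 3 register field (assumed: Hi = upper bits)."""
    return (get_field(words, "src3_reg_hi") << 2) | get_field(words, "src3_reg_lo")


def swizzle(words, source):
    """Return the swizzle of source 1, 2 or 3 as a string such as 'xyzw'."""
    return "".join(
        "xyzw"[get_field(words, f"src{source}_swz_{c}")] for c in "xyzw"
    )
```

A slot whose source 1 swizzle bits are 00 01 10 11 (X, Y, Z, W from high to low) decodes to the identity swizzle "xyzw".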

Functional units

Inverse Logic Unit (ILU)

ILU Operations
Value Meaning
0 NOP
1 MOV
2 RCP
3 RCC
4 RSQ
5 EXP
6 LOG
7 LIT
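
The ILU opcode sits in word 1 at bit offset 25 and is 3 bits wide, so every possible value maps to one of the eight operations listed. A minimal Python sketch of that lookup follows; IluOp and ilu_op are hypothetical names used only for this example.

```python
from enum import IntEnum


class IluOp(IntEnum):
    """ILU operation values from the table."""
    NOP = 0
    MOV = 1
    RCP = 2
    RCC = 3
    RSQ = 4
    EXP = 5
    LOG = 6
    LIT = 7


def ilu_op(word1: int) -> IluOp:
    """Decode the ILU operation from word 1 of an instruction slot."""
    return IluOp((word1 >> 25) & 0x7)
```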

Multiply-Accumulate (MAC)

MAC Operations
Value Meaning
0 NOP
1 MOV
2 MUL
3 ADD
4 MAD
5 DP3
6 DPH
7 DP4
8 DST
9 MIN
10 MAX
11 SLT
12 SGE
13 ARL
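
The MAC opcode is the 4-bit field at bit offset 21 of word 1. Only values 0 to 13 are listed, so a decoder has to decide what to do with 14 and 15. The Python sketch below returns None for them, which is purely a choice made for this example and says nothing about how the hardware treats those encodings. MacOp and mac_op are hypothetical names.

```python
from enum import IntEnum
from typing import Optional


class MacOp(IntEnum):
    """MAC operation values from the table."""
    NOP = 0
    MOV = 1
    MUL = 2
    ADD = 3
    MAD = 4
    DP3 = 5
    DPH = 6
    DP4 = 7
    DST = 8
    MIN = 9
    MAX = 10
    SLT = 11
    SGE = 12
    ARL = 13


def mac_op(word1: int) -> Optional[MacOp]:
    """Decode the MAC operation from word 1; None if the value is unlisted."""
    raw = (word1 >> 21) & 0xF
    try:
        return MacOp(raw)
    except ValueError:
        return None
```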

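Related links

  • NVIDIA resources
      • WhereIsThatVertexShaderInstruction.pdf (https://web.archive.org/web/20191214200729/https://www.nvidia.com/attach/6559) / WhereIsThatVertexShaderInstruction.doc (https://web.archive.org/web/20191214200738/https://www.nvidia.com/attach/6560)
  • Code which appears to implement bit-accurate emulation of some instructions (https://github.com/envytools/envytools/blob/master/nvhw/pgraph_celsius_xfrm.c) [FIXME: unconfirmed, needs testing]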