Talk:Project Gotham Racing 2


Reverse engineering notes for file formats

--Zaykho (talk) 10:34, 22 September 2017 (PDT)


Project Gotham Racing 2 uses a .PAK container to store 3D elements, textures and 3D configuration files.

Several kinds of .PAK file can be seen in PGR2. Their suffixes relate to their function and indicate what type of elements are stored:


.PAK for objects ( Cache\Objects & Cache\Cars )


.pak


.PAK for cars ( Cache\Cars )


.pak_cth

.pak_hrd

.pak_opn ( only for roadster : open mode)

.pak_sft ( only for roadster : closed mode for rain)


.PAK for maps ( Game\Areas )


.pak_common

.pak_day

.pak_night

.pak_overcast

.pak_stream (This uses a non-standard PAK format)


To extract those .PAK files, a tool called quickbms and a PGR2 bms script ( both made by Luigi Auriemma ) can be used to get most of the contents stored in the archive.

When extracted, the content of the archive is split into sections. A folder is created for each section, holding a file with the actual contents, mostly with a .dat suffix.

Here is an example for objects:


.PAK for object : red_cone.pak


WMSH \ 00000000.dat

MAT \ 00000001.dat

GPUD \ 00000002.dat

TEXT \ 00000003.nfc

VB \ 00000004.dat

END ( nothing, no folder, no files )


PAK File format

The PAK format is a chunked format. Each chunk starts with:

  • u32 chunk-type (can be interpreted as 4 byte ASCII magic)
  • u32 unknown
  • u32 size of chunk (excluding this field?)
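The header layout above can be read with `struct`. This is only a minimal sketch, assuming little-endian fields (as the extraction script further down this page does). `read_chunk_header` is a hypothetical helper name, and the meaning of the second field is still unknown.

```python
import io
import struct

def read_chunk_header(f):
    """Read one 12-byte chunk header: type, unknown field, size."""
    raw = f.read(12)
    if len(raw) < 12:
        return None  # end of file or truncated header
    chunk_type, unknown, size = struct.unpack('<4sII', raw)
    return {
        # Short types such as "MAT" are padded with NUL bytes
        'type': chunk_type.rstrip(b'\0').decode('ascii'),
        'unknown': unknown,
        'size': size,
    }

# Hand-made example bytes, not taken from a real .PAK file
example = io.BytesIO(b'MAT\0' + struct.pack('<II', 0, 16))
header = read_chunk_header(example)
print(header)
```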

The chunk can contain compressed or uncompressed data. Compression is probably indicated by the upper bits of the size (0xC0000000) being set. Compressed data seems to be a zlib stream, starting with a zlib header (0x78, 0xDA).
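A rough sketch of how a payload might be decoded. It assumes, as the script further down this page does, that a compressed payload starts with a u32 holding the uncompressed size, and that the stored size in the header counts those 4 bytes. `decode_payload` and `COMPRESSED_MASK` are hypothetical names.

```python
import struct
import zlib

COMPRESSED_MASK = 0xC0000000  # exact meaning of the two bits is unknown

def decode_payload(size_field, payload):
    """Return the chunk data, decompressing it if the flag bits are set."""
    if not size_field & COMPRESSED_MASK:
        return payload[:size_field]
    stored_size = size_field & 0x3FFFFFFF
    # Assumption: u32 uncompressed size first, then the zlib stream
    (unpacked_size,) = struct.unpack('<I', payload[:4])
    data = zlib.decompress(payload[4:stored_size])
    if len(data) != unpacked_size:
        raise ValueError('unexpected uncompressed size')
    return data

# Hand-made example, not real game data
original = b'PGR2' * 8
payload = struct.pack('<I', len(original)) + zlib.compress(original, 9)
size_field = COMPRESSED_MASK | len(payload)
print(decode_payload(size_field, payload) == original)
```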

Data can be inline, directly following the header. However, in INDX chunks, the data is pointed to by an extra u32 field after each header.


INDX

  • u32: number of chunks
  • array of chunk headers, each with additional u32 field pointing to the data
  • array of chunk data
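The index layout could be walked like this. A minimal sketch: `parse_index` is a hypothetical helper that takes the bytes following the 12-byte INDX header, and the `+ 0xC` adjustment is copied from the script further down this page rather than confirmed independently.

```python
import struct

def parse_index(body):
    """Parse the body of an INDX chunk (the bytes after its 12-byte header)."""
    (count,) = struct.unpack_from('<I', body, 0)
    entries = []
    for i in range(count):
        # Each entry is a normal chunk header plus a u32 data pointer
        chunk_type, unknown, size, offset = struct.unpack_from(
            '<4sIII', body, 4 + i * 16)
        entries.append({
            'type': chunk_type.rstrip(b'\0').decode('ascii'),
            'unknown': unknown,
            'size': size,
            # Assumption: adding the INDX header size gives a file position
            'file_offset': offset + 0xC,
        })
    return entries

# Hand-made example with two entries, not real game data
body = struct.pack('<I', 2)
body += struct.pack('<4sIII', b'MAT\0', 0, 8, 36)
body += struct.pack('<4sIII', b'VB\0\0', 0, 4, 44)
entries = parse_index(body)
for entry in entries:
    print(entry)
```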

WMSH

The 00000000.dat file contains the face indices and, supposedly, the strip layout and how the vertices are read ( float, word ? ).
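If the indices really describe a triangle strip, they can be expanded into individual faces. A minimal sketch: the alternating winding and the skipping of degenerate triangles are common strip conventions, not something confirmed for PGR2, and `strip_to_triangles` is a hypothetical helper.

```python
def strip_to_triangles(indices):
    """Expand a triangle strip into a list of (a, b, c) index triples."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if a == b or b == c or a == c:
            continue  # degenerate triangle, often used to join strips
        if i % 2 == 1:
            a, b = b, a  # flip every second triangle to keep the winding
        triangles.append((a, b, c))
    return triangles

faces = strip_to_triangles([0, 1, 2, 3, 3, 4, 5])
print(faces)
```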

MESH

Same as WMSH?

MAT

Material properties

The 00000001.dat file contains the material properties ( diffuse, specular, ambient etc. ) of each material applied to the 3D file.

SKY

WRAP

INST

Header has some field set to != 0. Seems to contain subchunks which end with "END" chunk?

TIME

RCAM

ANIC

ROUT

RPRM

RTMP

GPUD

GPUS

TEX

LGHT

ACT

INFO

DRVP

AUDI

TVC

END

Marks the end of the file. Can also be inside a file and might mark the end of a subchunk there?

GPUD

GPU Data?

The 00000002.dat file contains all the textures, each followed by a small block of data ( texture configuration ? mipmaps ? ). After all the textures, the vertices and UV positions are stored in one chunk of data.
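The vertex part of GPUD might be decoded like this for the red cone object. This is only a sketch of the layout the script further down this page presumes for that one file (3 floats, 2 signed 16-bit UVs scaled by 8192, 8 unknown bytes); the script itself notes that this is still partly broken, and other files use different layouts. `read_vertices` and `VERTEX_SIZE` are hypothetical names.

```python
import struct

VERTEX_SIZE = 24  # presumed for the red cone only; other files differ

def read_vertices(gpud, offset, count):
    """Read `count` vertices as (x, y, z, u, v) tuples starting at `offset`."""
    vertices = []
    for i in range(count):
        x, y, z, u, v = struct.unpack_from(
            '<fffhh', gpud, offset + i * VERTEX_SIZE)
        # The remaining 8 bytes of each vertex are unknown and skipped.
        # The UV scale of 8192 is a guess taken from the script.
        vertices.append((x, y, z, u / 8192, v / 8192))
    return vertices

# Hand-made example data, not taken from a real GPUD section
blob = struct.pack('<fffhh8x', 1.0, 2.0, 3.0, 8192, 4096) * 2
vertices = read_vertices(blob, 0, 2)
print(vertices)
```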

TEXT

Texture information

The 00000003.nfc file contains all the text and names related to the textures. It also indicates where each texture is located inside GPUD.
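A sketch of how the texture entries might be read, following the field layout used by the script further down this page: a u32 count, then per texture a 32-byte name, a u16 format, a u16 holding two power-of-two exponents for the dimensions, and a u32 offset into GPUD. `parse_text_chunk` is a hypothetical helper, and the script notes that more, still unknown, data follows the entries.

```python
import struct

def parse_text_chunk(data):
    """Parse the texture entries at the start of a TEXT chunk."""
    (count,) = struct.unpack_from('<I', data, 0)
    textures = []
    for i in range(count):
        name, fmt, dim, offset = struct.unpack_from('<32sHHI', data, 4 + i * 40)
        textures.append({
            'name': name.rstrip(b'\0').decode('ascii'),
            'format': fmt,
            # High nibble: log2(width), low nibble: log2(height)
            'width': 1 << ((dim >> 4) & 0xF),
            'height': 1 << (dim & 0xF),
            'offset': offset,  # position of the pixel data inside GPUD
        })
    return textures

# Hand-made example entry, not real game data
data = struct.pack('<I', 1) + struct.pack('<32sHHI', b'cone', 0x020C, 0x0076, 4096)
textures = parse_text_chunk(data)
print(textures)
```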

VB

Vertex buffer information

The 00000004.dat file contains the start offsets of the vertex and UV sections in the GPUD. If the 3D mesh has multiple groups/sections, the VB file stores one offset for each.
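Reading those offsets could look like this. A minimal sketch, assuming the chunk body is nothing but little-endian u32 offsets; `parse_vb_chunk` is a hypothetical helper.

```python
import struct

def parse_vb_chunk(data):
    """Return every u32 start offset stored in a VB chunk body."""
    count = len(data) // 4
    return list(struct.unpack_from('<%dI' % count, data, 0))

# Hand-made example with two sections, not real game data
offsets = parse_vb_chunk(struct.pack('<II', 87424, 90712))
print(offsets)
```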

Hacky python script to extract PAK textures

#!/usr/bin/env python3

# Project Gotham Racing 2 (.pak)
# Originally a script for QuickBMS http://quickbms.aluigi.org

# comtype unzip_dynamic

import sys
import struct
import zlib
from PIL import Image

# Array entries to export for debug purposes
N = 20

def readLong(f):
  return struct.unpack('<I', f.read(4))[0]

def clog(f, NAME, OFFSET, ZSIZE, SIZE):
  print('Exporting ' + NAME + ' (Compressed) from ' + str(OFFSET))
  f.seek(OFFSET)
  compressed = f.read(ZSIZE)
  data = bytes()
  decompress = zlib.decompressobj(15)
  try:
    # Feed the stream in small pieces so partial output survives an error
    for i in range(0, len(compressed), 4096):
      data += decompress.decompress(compressed[i:i+4096])
    data += decompress.flush()
  except zlib.error:
    print("Decompression failed after " + str(len(data)) + ' / ' + str(SIZE) + ' bytes')
  # Export whatever could be decompressed
  with open(NAME, 'wb') as e:
    e.write(data)
  return data

def log(f, NAME, OFFSET, ZSIZE):
  f.seek(OFFSET)
  data = f.read(ZSIZE)
  print('Exporting ' + NAME) #FIXME: Export
  with open(NAME, 'wb') as e:
    e.write(data)
  return data

textures = []
vbs = []
meshs = []
gpud = bytes([])
  
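def readChunks(f):
  # Read chunks up to and including the terminating END chunk
  chunks = []
  while True:
    chunk = readChunk(f)
    chunks += [chunk]
    if chunk['type'] == b'END\0':
      break
  return chunks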

def readChunk(f, indexed = False):  
  global gpud
  global textures 
  global vbs
  global meshs
  
  print("At " + str(f.tell()))

  chunkType = f.read(4)
  NAME = chunkType.decode('ascii').rstrip('\0') # FIXME: Remove?!
  print("Found chunk " + NAME)

  DUMMY = readLong(f)
  SIZE = readLong(f)

  chunk = {}
  chunk['type'] = chunkType
  chunk['size'] = SIZE
  if DUMMY:
    chunk['children'] = []

  print("Dummy " + str(DUMMY))
  print("Size " + str(SIZE))

  if indexed == True:
    OFFSET = readLong(f) + 0xC
    print("Offset " + str(OFFSET))
    chunk['offset'] = OFFSET
  else:
    OFFSET = f.tell()

  TMP = f.tell()

  if chunkType == b'INDX':
    print("Reading index?!")
    FILES = readLong(f)

    for i in range(0, FILES):
      readChunk(f, True)
    
    f.read(SIZE - FILES * 16 - 4)

    return chunk

  elif chunkType == b'WMSH':
    pass
  elif chunkType == b'SKY\0':
    pass
  elif chunkType == b'WRAP':
    pass
  elif chunkType == b'INST':
    pass
  elif chunkType == b'MAT\0':
    pass
  elif chunkType == b'TIME':
    pass
  elif chunkType == b'RCAM':
    pass
  elif chunkType == b'ANIC':
    pass
  elif chunkType == b'ROUT':
    pass
  elif chunkType == b'RPRM':
    pass
  elif chunkType == b'RTMP':
    # Much like INDX?
    pass
  elif chunkType == b'GPUD':
    pass
  elif chunkType == b'GPUS':
    pass
  elif chunkType == b'TEX\0':
    pass
  elif chunkType == b'LGHT':
    pass
  elif chunkType == b'ACT\0':
    pass
  elif chunkType == b'VB\0\0':
    pass
  elif chunkType == b'INFO':
    pass
  elif chunkType == b'DRVP':
    pass
  elif chunkType == b'TVC\0':
    pass
  elif chunkType == b'MESH':
    pass
  elif chunkType == b'COLR':
    pass
  elif chunkType == b'TEXT':
    pass
  elif chunkType == b'AUDI':
    pass
  elif chunkType == b'END\0':
    assert(SIZE == 0)
    return chunk

  else:
    print("Unknown chunk type: " + NAME)
    print(chunkType)
    assert(False)

  f.seek(TMP)

  if indexed == True:
    f.seek(OFFSET)

  # Nested
  if(DUMMY == 1):
    chunk['children'] = readChunks(f)
    f.seek(TMP)
    return chunk

  #FIXME: RTMP has DUMMY = 2!
  if DUMMY == 2:
    print("Not sure how to handle DUMMY = 2")  

  if SIZE & 0xC0000000: #FIXME: Figure out actual use of these 2 bits
    ZSIZE = SIZE & 0x3FFFFFFF
    SIZE = readLong(f)
    print("c-zSize is " + str(ZSIZE))
    print("c-Size is " + str(SIZE))
    OFFSET = f.tell()
    data = clog(f, NAME, OFFSET, ZSIZE - 4, SIZE)
  else:
    data = log(f, NAME, OFFSET, SIZE)

  if chunkType == b'GPUD':

    gpud = data

    offset = 87424
    for i in range(0, N):
      if False:
        x = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        y = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        z = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        color = 0
        print("v %f, %f, %f, 0x%08X" % (x, y, z, color))

      # The start of the file is closer to this
      if False:
        x = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        y = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        z = struct.unpack('<f', data[offset:offset+4])[0]
        offset += 4
        color = struct.unpack('<I', data[offset:offset+4])[0]
        offset += 4
        print("v %f, %f, %f, 0x%08X" % (x, y, z, color))
  elif chunkType == b'TEXT':
    offset = 0
    count = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print(str(count) + " car texture(s):")
    for i in range(count):
      name = data[offset:offset+32]
      offset += 32
      fmt = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      dim = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      dataOffset = struct.unpack('<I', data[offset:offset+4])[0]
      offset += 4
      texture = {}
      texture['name'] = name.decode('ascii').rstrip('\0')
      print("0x%04X" % dim)
      texture['width'] = 1 << ((dim >> 4) & 0xF)
      texture['height'] = 1 << (dim & 0xF)
      texture['format'] = fmt
      texture['offset'] = dataOffset
      textures += [texture]
      print("  Texture name: " + texture['name'])

    #FIXME: There is more data here, at least 4 byte!
    #count = struct.unpack('<I', data[offset:offset+4])[0]
    #print(str(count) + " Unknown(s):")
  elif chunkType == b'TEX\0':
    offset = 0
    count = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print(str(count) + " world texture(s):")
    for i in range(count):
      name = data[offset:offset+32]
      offset += 32
      a1 = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      a2 = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      b = struct.unpack('<I', data[offset:offset+4])[0]
      offset += 4
      c = struct.unpack('<I', data[offset:offset+4])[0]
      offset += 4
      d = struct.unpack('<I', data[offset:offset+4])[0]
      offset += 4

      h = 1 << (a2 & 0xF)
      w = 1 << ((a2 >> 4) & 0xF)
      print("Size: " + str(w) + "x" + str(h) + " (End: 0x%08X)" % (b + (w * h * 4)//3))

      someH = 1 << (d & 0xF)
      someW = 1 << ((d >> 4) & 0xF)
      print("Max. Size (?): " + str(someW) + "x" + str(someH))

      #FIXME: a?
      gpudOffset = b
      #FIXME: c?
      #FIXME: d?

      texture = {}
      texture['name'] = name.decode('ascii').rstrip('\0')
      texture['offset'] = gpudOffset
      texture['width'] = w
      texture['height'] = h
      texture['format'] = a1
      textures += [texture]

      print("  Texture name: " + texture['name'] + "\n @ 0x%04X 0x%04X 0x%08X 0x%08X 0x%08X " % (a1,a2,b,c,d))


      print("\n")
    #FIXME: There is more data here, at least 4 byte!
    #count = struct.unpack('<I', data[offset:offset+4])[0]
    #print(str(count) + " Unknown(s):")
  elif chunkType == b'VB\0\0':
    # One u32 start offset into GPUD per vertex buffer / mesh section
    for i in range(0, len(data) - 3, 4):
      vb = {}
      vb['offset'] = struct.unpack('<I', data[i:i+4])[0]
      vbs += [vb]
  elif chunkType == b'MESH' or chunkType == b'WMSH':
    # Originally written for MESH, also might work for WMSH

    offset = 42
    count = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print("Index count might be " + str(count))

    unk = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print("Unk " + str(unk))

    #FIXME: what is this? always zero?!
    offset += 2

    indices = []
    for i in range(0, count): #FIXME: Number of indices
      j = struct.unpack('<H', data[offset:offset+2])[0]
      indices += [j]
      offset += 2

    if False:
      #FIXME: Very much WIP.. only developing this for WMSH
      # (MESH seems to be slightly different)

      # Align to next 4 byte barrier
      #offset += 3
      #offset &= ~3
      
      batchCount = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      print("Batches " + str(batchCount))

      if batchCount > 0:

        # Format and len until first primitive restart or something?
        for i in range(0, batchCount):
          #FIXME: Number of indices per batch in 32 bit?!
          x = struct.unpack('<I', data[offset:offset+4])[0]
          print("  A: " + str(x))
          offset += 4
        for i in range(0, batchCount):
          #FIXME: Same as before but in 16 bit?!
          x = struct.unpack('<H', data[offset:offset+2])[0]
          print("  B: " + str(x))
          offset += 2

      # We need at least one batch
      batchCount = max(1, batchCount)

      for i in range(0, batchCount):
        x = struct.unpack('<BBBBBB', data[offset:offset+6])
        offset += 6
        print("data: %02X %02X %02X %02X %02X %02X" % x)

      # FIXME: How to get here on our own?!
      assert(offset == (len(data) - batchCount * 2))
      totalSize = 0
      for i in range(0, batchCount):
        batchSize = struct.unpack('<H', data[offset:offset+2])[0]
        offset += 2
        totalSize += batchSize
        print("  Batch size is " + str(batchSize) + " total: " + str(totalSize))

    mesh = {}
    mesh['indices'] = indices
    meshs += [mesh]
  elif chunkType == b'AUDI':
    offset = 0
    count = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print(str(count) + " audio sample(s) (???):")
    for i in range(count):
      name = data[offset:offset+4]
      offset += 4
      print("  Name: " + str(name))
      #FIXME: Read rest of data
      offset += 12
  elif chunkType == b'MAT\0':
    offset = 0
    count = struct.unpack('<I', data[offset:offset+4])[0]
    offset += 4
    print(str(count) + " material(s):")
    for i in range(count):
      a = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      b = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      c = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      d = struct.unpack('<H', data[offset:offset+2])[0]
      offset += 2
      print("  Material: 0x%04X, 0x%04X, 0x%04X, 0x%04X" % (a, b, c, d))
    #FIXME: Read rest of data

  if indexed == True:
    f.seek(TMP)
  
  return chunk

with open(sys.argv[1], 'rb') as f:
  while True:
    tmp = f.tell()
    # Stop once there is nothing left to read
    if f.read(1) == b'':
      print("EOF?!")
      break
    f.seek(tmp)
    readChunks(f)

  with open('test.obj', 'w') as e:
    for vb in vbs:
      print("VB: " + str(vb['offset']))
      offset = vb['offset']
      for i in range(0, 138):

        print(i) # Helper so we can figure out when the parser hangs

        x = 0
        y = 0
        z = 0
        u = 0
        v = 0

        if False: # /tmp/240Z.pak_hrd
          x = struct.unpack('<h', gpud[offset:offset+2])[0]
          offset += 2
          y = struct.unpack('<h', gpud[offset:offset+2])[0]
          offset += 2
          z = struct.unpack('<h', gpud[offset:offset+2])[0]
          offset += 2

          offset += 4

          #FIXME: Fix scale
          u = struct.unpack('<h', gpud[offset:offset+2])[0] / 512
          offset += 2
          v = struct.unpack('<h', gpud[offset:offset+2])[0] / 512
          offset += 2

        if False: # /tmp/sharkfin.pak

          # Presumably: 17? verts, each 20? bytes
          # 25 indices

          # xyz uv

          x = struct.unpack('<i', gpud[offset:offset+4])[0]
          offset += 4
          y = struct.unpack('<i', gpud[offset:offset+4])[0]
          offset += 4
          z = struct.unpack('<i', gpud[offset:offset+4])[0]
          offset += 4

          #FIXME: Fix scale
          u = struct.unpack('<i', gpud[offset:offset+4])[0]
          offset += 4
          v = struct.unpack('<i', gpud[offset:offset+4])[0]
          offset += 4

        if True: # /tmp/Red Cone.pak

          # Presumably: 137 verts, each 24 bytes
          # 205 indices

          #FIXME: Broken still! Mostly works but contains garbage data

          x = struct.unpack('<f', gpud[offset:offset+4])[0]
          offset += 4
          y = struct.unpack('<f', gpud[offset:offset+4])[0]
          offset += 4
          z = struct.unpack('<f', gpud[offset:offset+4])[0]
          offset += 4

          u = struct.unpack('<h', gpud[offset:offset+2])[0]
          offset += 2
          v = struct.unpack('<h', gpud[offset:offset+2])[0]
          offset += 2

          offset += 4
          offset += 4

          #FIXME: Fix scale
          u /= 8192
          v /= 8192

        print("xyz: %f %f %f; uv: %f %f" % (x,y,z, u,v))
        e.write('v %f %f %f\n' % (x,y,z))
        e.write('vt %f %f\n' % (u,v))

    for mesh in meshs:
      c = -1
      b = -1
      a = -1
      for i in mesh['indices']:
        c = b
        b = a
        a = i + 1
        if c >= 0:
          e.write('f %d/%d %d/%d %d/%d\n' % (c,c,b,b,a,a))
  
  for texture in textures:
    #with open(str(texture['name']) + ".raw",'wb') as e:
      # DXT3 = 1 byte per pixel
      print("Exporting " + texture['name'] + " (%d x %d)" % (texture['width'], texture['height']))
      size = texture['width'] * texture['height']
      if texture['width'] < 4:
        print("Texture too small!") # FIXME!!!
        continue
      data = gpud[texture['offset']:texture['offset']+size]
      if texture['format'] == 0x0414:
        image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', 3)
      elif texture['format'] == 0x040C:
        image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', 3)
      elif texture['format'] == 0x041C:
        image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', 3)
      elif texture['format'] == 0x021C:
        image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', 1)
      elif texture['format'] == 0x020C:
        image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', 1)
      else:
        print("Unknown format! 0x%04X" %  texture['format'])
        image = None
      if image:
        # NOTE: the 'textures' directory must already exist, otherwise save() fails
        image.save('textures/' + texture['name'] + "-0x%04X" % texture['format'] + ".png")
      #e.write()

--JayFoxRox (talk) 09:38, 23 September 2017 (PDT)