Talk:Project Gotham Racing 2
Reverse engineering notes for file formats
--Zaykho (talk) 10:34, 22 September 2017 (PDT)
Project Gotham Racing 2 uses a .PAK container to store 3D elements, textures and 3D configuration files.
Several kinds of .PAK files can be seen in PGR2. Their extensions relate to their function and indicate what type of elements they store:
.PAK for objects ( Cache\Objects & Cache\Cars )
.pak
.PAK for cars ( Cache\Cars )
.pak_cth
.pak_hrd
.pak_opn ( only for roadster : open mode)
.pak_sft ( only for roadster : closed mode for rain)
.PAK for maps ( Game\Areas )
.pak_common
.pak_day
.pak_night
.pak_overcast
.pak_stream (This uses a non-standard PAK format)
To extract these .PAK files, QuickBMS and a PGR2 BMS script (both made by Luigi Auriemma) can be used to get most of the contents stored in the archive.
When extracted, the archive is split into its sections: a folder is created for each section, containing a file with the actual contents, mostly with a .dat suffix.
Here is an example for an object:
.PAK for object : red_cone.pak
WMSH \ 00000000.dat
MAT \ 00000001.dat
GPUD \ 00000002.dat
TEXT \ 00000003.nfc
VB \ 00000004.dat
END ( nothing, no folder, no files )
PAK File format
The PAK format is a chunked format. Each chunk starts with:
- u32 chunk-type (can be interpreted as 4 byte ASCII magic)
- u32 unknown
- u32 size of chunk (excluding this field?)
The chunk can contain compressed or uncompressed data. Compression is probably indicated by the upper bits of the size (0xC0000000) being set.
The compressed data seems to start with a zlib header (0x78, 0xDA).
Data can be inline, directly following the header. However, in INDX chunks, the data is pointed to by an extra u32 field after each header.
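The header layout and the compression flag described above could be read roughly as follows. This is only a sketch based on these notes: the names `read_chunk_header` and `read_payload` are made up for illustration, and the idea that a compressed payload starts with a u32 uncompressed size (which is counted in the flagged size) is an assumption taken from the extraction script on this page, not a confirmed part of the format.

```python
import struct
import zlib

COMPRESSED_MASK = 0xC0000000  # upper two bits of the size field (exact meaning unconfirmed)
SIZE_MASK = 0x3FFFFFFF        # remaining bits hold the stored size


def read_chunk_header(f):
    """Read the 12-byte chunk header: 4-byte type, u32 unknown, u32 size."""
    chunk_type, unknown, size = struct.unpack('<4sII', f.read(12))
    return chunk_type, unknown, size


def read_payload(f, size):
    """Read inline chunk data that directly follows the header."""
    if size & COMPRESSED_MASK:
        stored_size = size & SIZE_MASK
        # Assumption: a u32 uncompressed size comes first and is counted
        # in stored_size, so the zlib stream is stored_size - 4 bytes long.
        (unpacked_size,) = struct.unpack('<I', f.read(4))
        data = zlib.decompress(f.read(stored_size - 4))
        if len(data) != unpacked_size:
            raise ValueError('unexpected uncompressed size')
        return data
    return f.read(size)
```

Calling `read_chunk_header` and then `read_payload` in a loop until an END chunk is seen would walk a non-indexed file; nested chunks and INDX files need extra handling.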
INDX
- u32: number of chunks
- array of chunk headers, each with additional u32 field pointing to the data
- array of chunk data
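The INDX layout above could be parsed along these lines. This is a sketch, not a verified reader: `read_index` is a hypothetical helper, and treating the stored offsets as relative to the start of the INDX body (the extraction script on this page adds 0xC, the header size, for an INDX chunk at the start of the file) is an assumption.

```python
import struct


def read_index(buf, body_start):
    """Parse the entry table of an INDX chunk.

    buf        -- the whole file as bytes
    body_start -- position right after the 12-byte INDX header
    """
    (count,) = struct.unpack_from('<I', buf, body_start)
    entries = []
    pos = body_start + 4
    for _ in range(count):
        # Regular 12-byte chunk header plus the extra u32 data pointer
        chunk_type, unknown, size, offset = struct.unpack_from('<4sIII', buf, pos)
        entries.append({
            'type': chunk_type.rstrip(b'\0').decode('ascii'),
            'unknown': unknown,
            'size': size,
            # Assumption: offsets are relative to the start of the INDX body
            'offset': body_start + offset,
        })
        pos += 16
    return entries
```

The `size` field of each entry may still carry the compression bits, so it should be masked before it is used as a length.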
WMSH
The 00000000.dat file contains the face indices and, supposedly, the strip layout and how the vertices are read (float, word?).
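If the indices do form a triangle strip, they could be expanded into individual triangles as sketched below. `strip_to_triangles` is a hypothetical helper; flipping every second triangle and skipping degenerate ones are common triangle strip conventions and are assumed here, not confirmed for PGR2.

```python
def strip_to_triangles(indices):
    """Expand a triangle strip into a list of (a, b, c) index triples."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if a == b or b == c or a == c:
            # Degenerate triangle, typically used to stitch strips together
            continue
        # Every second triangle is flipped to keep a consistent winding order
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles
```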
MESH
Same as WMSH?
MAT
Material properties
The 00000001.dat file contains the material properties (diffuse, specular, ambient, etc.) of each material applied to the 3D file.
SKY
WRAP
INST
The header has a field set to a non-zero value. The chunk seems to contain subchunks, which end with an "END" chunk?
TIME
RCAM
ANIC
ROUT
RPRM
RTMP
GPUD
GPUS
TEX
LGHT
ACT
INFO
DRVP
AUDI
TVC
END
Marks the end of the file. Can also be inside a file and might mark the end of a subchunk there?
GPUD
GPU Data?
The 00000002.dat file contains all the textures, each followed by a small block of data (texture configuration? mipmaps?). After all the textures, the vertex and UV positions are stored in one chunk of data.
TEXT
Texture information
The 00000003.dat file contains all the text and names related to the textures. It also indicates where each texture is located inside GPUD.
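A TEXT chunk could be decoded roughly like this. The entry layout (32-byte name, u16 format, u16 packed dimensions, u32 GPUD offset) and the power-of-two size encoding are taken from the extraction script on this page and are still guesses; `parse_text_chunk` is a hypothetical name, and any data after the entries is ignored.

```python
import struct

# name, format code, packed dimensions, offset into GPUD
ENTRY = struct.Struct('<32sHHI')


def parse_text_chunk(data):
    """Return a list of texture descriptions from a decompressed TEXT chunk."""
    (count,) = struct.unpack_from('<I', data, 0)
    textures = []
    pos = 4
    for _ in range(count):
        raw_name, fmt, dim, offset = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        textures.append({
            'name': raw_name.split(b'\0', 1)[0].decode('ascii'),
            'format': fmt,
            # Assumption: each nibble stores log2 of one dimension
            'width': 1 << ((dim >> 4) & 0xF),
            'height': 1 << (dim & 0xF),
            'offset': offset,
        })
    return textures
```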
VB
Vertex buffer information
The 00000004.dat file contains the start offsets of the vertex and UV sections in GPUD. If the 3D mesh has multiple groups/sections, the VB file stores one address for each of them.
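Reading the VB chunk as a plain array of little-endian u32 offsets, which is how the extraction script on this page treats it, could look like the sketch below. `parse_vb_offsets` is a hypothetical name, and the assumption that the chunk holds nothing but offsets is unverified.

```python
import struct


def parse_vb_offsets(data):
    """Return the GPUD start offsets stored in a decompressed VB chunk."""
    count = len(data) // 4
    return list(struct.unpack_from('<%dI' % count, data, 0))
```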
Hacky Python script to extract PAK textures
#!/usr/bin/env python3

# Project Gotham Racing 2 (.pak)
# Originally a script for QuickBMS http://quickbms.aluigi.org
# comtype unzip_dynamic

import os
import sys
import struct
import zlib

from PIL import Image

# Array entries to export for debug purposes
N = 20

# Chunk types that are recognized but need no special handling up front
KNOWN_CHUNKS = (
    b'WMSH', b'SKY\0', b'WRAP', b'INST', b'MAT\0', b'TIME', b'RCAM',
    b'ANIC', b'ROUT', b'RPRM',
    b'RTMP',  # Much like INDX?
    b'GPUD', b'GPUS', b'TEX\0', b'LGHT', b'ACT\0', b'VB\0\0', b'INFO',
    b'DRVP', b'TVC\0', b'MESH', b'COLR', b'TEXT', b'AUDI',
)

# Texture format code -> BCn number passed to Pillow's 'bcn' decoder
BCN_FORMATS = {0x0414: 3, 0x040C: 3, 0x041C: 3, 0x021C: 1, 0x020C: 1}


def readLong(f):
    return struct.unpack('<I', f.read(4))[0]


def clog(f, NAME, OFFSET, ZSIZE, SIZE):
    print('Exporting ' + NAME + ' (Compressed) from ' + str(OFFSET))  #FIXME: Export
    f.seek(OFFSET)
    compressed = f.read(ZSIZE)
    #print(compressed)
    data = bytes()
    try:
        # Feed the stream in small pieces so that whatever was decoded
        # before an error is kept
        decompress = zlib.decompressobj(15)
        for i in range(0, len(compressed), 4096):
            data += decompress.decompress(compressed[i:i + 4096])
    except zlib.error:
        print("Decompression failed after " + str(len(data)) + ' / ' + str(SIZE) + ' bytes')
    with open(NAME, 'wb') as e:
        e.write(data)
    #print(data)
    return data


def log(f, NAME, OFFSET, ZSIZE):
    f.seek(OFFSET)
    data = f.read(ZSIZE)
    print('Exporting ' + NAME)  #FIXME: Export
    with open(NAME, 'wb') as e:
        e.write(data)
    return data


textures = []
vbs = []
meshs = []
gpud = bytes([])


def readChunks(f):
    # Read chunks up to and including the next END chunk
    chunks = []
    while True:
        chunk = readChunk(f)
        chunks += [chunk]
        if chunk['type'] == b'END\0':
            break
    return chunks


def readChunk(f, indexed = False):
    global gpud
    global textures
    global vbs
    global meshs

    print("At " + str(f.tell()))
    chunkType = f.read(4)
    NAME = chunkType.decode('ascii').rstrip('\0')  # FIXME: Remove?!
    print("Found chunk " + NAME)
    DUMMY = readLong(f)
    SIZE = readLong(f)

    chunk = {}
    chunk['type'] = chunkType
    chunk['size'] = SIZE
    if DUMMY:
        chunk['children'] = []

    print("Dummy " + str(DUMMY))
    print("Size " + str(SIZE))
    if indexed:
        OFFSET = readLong(f) + 0xC
        print("Offset " + str(OFFSET))
        chunk['offset'] = OFFSET
    else:
        OFFSET = f.tell()

    TMP = f.tell()

    if chunkType == b'INDX':
        print("Reading index?!")
        FILES = readLong(f)
        for i in range(0, FILES):
            readChunk(f, True)
        f.read(SIZE - FILES * 16 - 4)
        return chunk
    elif chunkType == b'END\0':
        assert(SIZE == 0)
        return chunk
    elif chunkType not in KNOWN_CHUNKS:
        print("Unknown chunk type: " + NAME)
        print(chunkType)
        assert(False)

    f.seek(TMP)
    if indexed:
        f.seek(OFFSET)

    # Nested
    if DUMMY == 1:
        chunk['children'] = readChunks(f)
        # Only indexed chunks need to return to the index table; seeking
        # back for inline chunks would parse the children a second time
        if indexed:
            f.seek(TMP)
        return chunk

    #FIXME: RTMP has DUMMY = 2!
    if DUMMY == 2:
        print("Not sure how to handle DUMMY = 2")

    if SIZE & 0xC0000000:  #FIXME: Figure out actual use of these 2 bits
        ZSIZE = SIZE & 0x3FFFFFFF
        SIZE = readLong(f)
        print("c-zSize is " + str(ZSIZE))
        print("c-Size is " + str(SIZE))
        OFFSET = f.tell()
        data = clog(f, NAME, OFFSET, ZSIZE - 4, SIZE)
    else:
        data = log(f, NAME, OFFSET, SIZE)

    if chunkType == b'GPUD':
        gpud = data

        # Disabled debug dumps of possible vertex layouts
        offset = 87424
        for i in range(0, N):
            if False:
                x, y, z = struct.unpack_from('<fff', data, offset)
                offset += 12
                color = 0
                print("v %f, %f, %f, 0x%08X" % (x, y, z, color))
            # The start of the file is closer to this
            if False:
                x, y, z, color = struct.unpack_from('<fffI', data, offset)
                offset += 16
                print("v %f, %f, %f, 0x%08X" % (x, y, z, color))

    elif chunkType == b'TEXT':
        offset = 0
        count = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print(str(count) + " car texture(s):")
        for i in range(count):
            # 32-byte name, u16 format, u16 packed dimensions, u32 GPUD offset
            name, fmt, dim, dataOffset = struct.unpack_from('<32sHHI', data, offset)
            offset += 40
            texture = {}
            texture['name'] = name.decode('ascii').rstrip('\0')
            print("0x%04X" % dim)
            texture['width'] = 1 << ((dim >> 4) & 0xF)
            texture['height'] = 1 << (dim & 0xF)
            texture['format'] = fmt
            texture['offset'] = dataOffset
            textures += [texture]
            print(" Texture name: " + texture['name'])
        #FIXME: There is more data here, at least 4 byte!
        #count = struct.unpack('<I', data[offset:offset+4])[0]
        #print(str(count) + " Unknown(s):")

    elif chunkType == b'TEX\0':
        offset = 0
        count = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print(str(count) + " world texture(s):")
        for i in range(count):
            name, a1, a2, b, c, d = struct.unpack_from('<32sHHIII', data, offset)
            offset += 48
            h = 1 << (a2 & 0xF)
            w = 1 << ((a2 >> 4) & 0xF)
            print("Size: " + str(w) + "x" + str(h) + " (End: 0x%08X)" % (b + (w * h * 4)//3))
            someH = 1 << (d & 0xF)
            someW = 1 << ((d >> 4) & 0xF)
            print("Max. Size (?): " + str(someW) + "x" + str(someH))
            #FIXME: a?
            gpudOffset = b
            #FIXME: c?
            #FIXME: d?
            texture = {}
            texture['name'] = name.decode('ascii').rstrip('\0')
            texture['offset'] = gpudOffset
            texture['width'] = w
            texture['height'] = h
            texture['format'] = a1
            textures += [texture]
            print(" Texture name: " + texture['name'] + "\n @ 0x%04X 0x%04X 0x%08X 0x%08X 0x%08X " % (a1,a2,b,c,d))
            print("\n")
        #FIXME: There is more data here, at least 4 byte!
        #count = struct.unpack('<I', data[offset:offset+4])[0]
        #print(str(count) + " Unknown(s):")

    elif chunkType == b'VB\0\0':
        for i in range(0, SIZE, 4):
            # One dict per entry, so every offset is kept
            vb = {}
            vb['offset'] = struct.unpack_from('<I', data, i)[0]
            vbs += [vb]

    elif chunkType == b'MESH' or chunkType == b'WMSH':
        # Originally written for MESH, also might work for WMSH
        offset = 42
        count = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print("Index count might be " + str(count))
        unk = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print("Unk " + str(unk))
        #FIXME: what is this? always zero?!
        offset += 2
        indices = []
        for i in range(0, count):  #FIXME: Number of indices
            j = struct.unpack_from('<H', data, offset)[0]
            indices += [j]
            offset += 2

        if False:
            #FIXME: Very much WIP.. only developing this for WMSH
            #       (MESH seems to be slightly different)

            # Align to next 4 byte barrier
            #offset += 3
            #offset &= ~3

            batchCount = struct.unpack_from('<H', data, offset)[0]
            offset += 2
            print("Batches " + str(batchCount))
            if batchCount > 0:
                # Format and len until first primitive restart or something?
                for i in range(0, batchCount):
                    #FIXME: Number of indices per batch in 32 bit?!
                    x = struct.unpack_from('<I', data, offset)[0]
                    print(" A: " + str(x))
                    offset += 4
                for i in range(0, batchCount):
                    #FIXME: Same as before but in 16 bit?!
                    x = struct.unpack_from('<H', data, offset)[0]
                    print(" B: " + str(x))
                    offset += 2

            # We need at least one batch
            batchCount = max(1, batchCount)
            for i in range(0, batchCount):
                x = struct.unpack_from('<BBBBBB', data, offset)
                offset += 6
                print("data: %02X %02X %02X %02X %02X %02X" % x)

            # FIXME: How to get here on our own?!
            assert(offset == (len(data) - batchCount * 2))
            totalSize = 0
            for i in range(0, batchCount):
                batchSize = struct.unpack_from('<H', data, offset)[0]
                offset += 2
                totalSize += batchSize
                print(" Batch size is " + str(batchSize) + " total: " + str(totalSize))

        mesh = {}
        mesh['indices'] = indices
        meshs += [mesh]

    elif chunkType == b'AUDI':
        offset = 0
        count = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print(str(count) + " audio sample(s) (???):")
        for i in range(count):
            name = data[offset:offset+4]
            offset += 4
            print(" Name: " + str(name))
            #FIXME: Read rest of data
            offset += 12

    elif chunkType == b'MAT\0':
        offset = 0
        count = struct.unpack_from('<I', data, offset)[0]
        offset += 4
        print(str(count) + " material(s):")
        for i in range(count):
            a, b, c, d = struct.unpack_from('<HHHH', data, offset)
            offset += 8
            print(" Material: 0x%04X, 0x%04X, 0x%04X, 0x%04X" % (a, b, c, d))
            #FIXME: Read rest of data

    if indexed:
        f.seek(TMP)

    return chunk


with open(sys.argv[1], 'rb') as f:
    while True:
        # Peek one byte to detect the end of the file
        tmp = f.tell()
        if f.read(1) == b'':
            print("EOF?!")
            break
        f.seek(tmp)
        readChunks(f)

with open('test.obj', 'w') as e:
    for vb in vbs:
        print("VB: " + str(vb['offset']))
        offset = vb['offset']
        for i in range(0, 138):
            print(i)  # Helper so we can figure out when the parser hangs
            x = 0
            y = 0
            z = 0
            u = 0
            v = 0

            if False:  # /tmp/240Z.pak_hrd
                x, y, z = struct.unpack_from('<hhh', gpud, offset)
                offset += 6
                offset += 4
                #FIXME: Fix scale
                u, v = struct.unpack_from('<hh', gpud, offset)
                offset += 4
                u /= 512
                v /= 512

            if False:  # /tmp/sharkfin.pak
                # Presumably: 17? verts, each 20? bytes
                # 25 indices
                # xyz uv
                x, y, z = struct.unpack_from('<iii', gpud, offset)
                offset += 12
                #FIXME: Fix scale
                u, v = struct.unpack_from('<ii', gpud, offset)
                offset += 8

            if True:  # /tmp/Red Cone.pak
                # Presumably: 137 verts, each 24 bytes
                # 205 indices
                #FIXME: Broken still! Mostly works but contains garbage data
                x, y, z = struct.unpack_from('<fff', gpud, offset)
                offset += 12
                u, v = struct.unpack_from('<hh', gpud, offset)
                offset += 4
                offset += 4
                offset += 4
                #FIXME: Fix scale
                u /= 8192
                v /= 8192

            print("xyz: %f %f %f; uv: %f %f" % (x,y,z, u,v))
            e.write('v %f %f %f\n' % (x,y,z))
            e.write('vt %f %f\n' % (u,v))

    for mesh in meshs:
        # Treat the indices as one triangle strip (OBJ indices are 1-based)
        c = -1
        b = -1
        a = -1
        for i in mesh['indices']:
            c = b
            b = a
            a = i + 1
            if c >= 0:
                e.write('f %d/%d %d/%d %d/%d\n' % (c,c,b,b,a,a))

# Image.save() does not create the output folder on its own
os.makedirs('textures', exist_ok=True)

for texture in textures:
    #with open(str(texture['name']) + ".raw",'wb') as e:
    # DXT3 = 1 byte per pixel
    print("Exporting " + texture['name'] + " (%d x %d)" % (texture['width'], texture['height']))
    size = texture['width'] * texture['height']
    if texture['width'] < 4:
        print("Texture too small!")  # FIXME!!!
        continue
    data = gpud[texture['offset']:texture['offset']+size]
    n = BCN_FORMATS.get(texture['format'])
    if n is None:
        print("Unknown format! 0x%04X" % texture['format'])
        continue
    image = Image.frombytes('RGBA', (texture['width'], texture['height']), data, 'bcn', n)
    image.save('textures/' + texture['name'] + "-0x%04X" % texture['format'] + ".png")
    #e.write()