
Starfield's .cdb Material Database


CerebralPolicy


I've been working on some retextures for my own pseudo-faction, and now that SF1Edit has been released as a beta, I'd like to make them as non-replacements. In previous games this would have been easy: even if Material Editor hasn't been updated, a .bgsm file is simply a JSON config. I could edit it in Notepad++ and boom, done.

 

One tiny issue.

[Image: ukzg43d.png]

What black magic is this, Bethesda? I can't open this. I know for a fact that the material files are in this database, since the nifs refer to a .mat file. I can generate a .mat file using Blender, but it would be easier if I could crack open this database. Hopefully someone will make a tool that allows this, since I can't know what the _mask textures actually do until I can see Bethesda's .mats.

 

UPDATE: Blender can't really generate a .mat and I have no idea how the .mats are formatted to begin with.

Edited by CerebralPolicy

Yeah, from opening it with a hex editor, it looks like some sort of hashmap for all of these materials. If you open it in a hex or text editor, it is all stored in plaintext, so you can Ctrl+F for the relevant .dds filenames, I guess, but knowing the proper format for the .mat files would be nice. For a standalone item though, yeah, good luck. I guess you could try to find some unused or test mat and hijack its material references?


  • 1 month later...

Just ran into this myself. I navigated from the ESM file, which has a record that points to a NIF file that resides in a BA2 file. But the NIF file has a reference to a *.mat file, which I assume must be in this CDB file. I think this may be a Constant Database file (see https://docs.fileformat.com/database/cdb/ ). Sadly, the original CDB reader doesn't compile on my CentOS system (I think the code is too old and relies on out-of-date libraries), and TinyCDB compiles but apparently chokes on the materials.cdb file. I'm still not sure how to map the filename in the NIF file to a key in this (supposed) CDB key-value store, but heck, I still cannot even find a way to read the CDB file.

Did Bethesda have some sort of "find the most obtuse file formats known to man" contest when designing their games?  Why BA2 files when we have tar and zip files already?  Why CDB files that somehow hold *.MAT files (whatever the heck they are) instead of just packing the MAT files into a BA2 file?

Anyway, going to go read the specs for the CDB format and see if I can cobble together a Java program that might read the file.  Assuming it even is a Constant DB file...

Edited by grizbane

Hm. Nope. Whatever that "materials.cdb" file is, it's not a "Constant Database" file. I found information on the CDB format here: https://www.unixuser.org/~euske/doc/cdbinternals/index.html A constant database starts with a table of 4-byte binary integers (pairs of file offsets and lengths). This file starts with "BETH" and then lots of ASCII characters. So no idea what type of file we've got here.
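In case anyone wants to repeat that check, here is a rough Python sketch. It relies only on the published layout of a classic constant database (a 2048-byte table of 256 little-endian offset/length pairs at the start of the file); the function name and the test buffers are made up for illustration, and the second buffer is just a stand-in for the real file.

```python
import struct

CDB_HEADER_SIZE = 256 * 8  # 256 (position, slot count) pairs, 4 bytes per integer


def could_be_constant_db(data: bytes) -> bool:
    """Heuristic: do the first 2048 bytes look like a constant database header?"""
    if len(data) < CDB_HEADER_SIZE:
        return False
    for i in range(256):
        pos, slots = struct.unpack_from("<II", data, i * 8)
        # Every hash table must start after the header and fit inside the file;
        # each slot in a table takes 8 bytes.
        if pos < CDB_HEADER_SIZE or pos + slots * 8 > len(data):
            return False
    return True


# An empty constant database: all 256 tables are empty and point just past the header.
empty_cdb = struct.pack("<II", CDB_HEADER_SIZE, 0) * 256
# A stand-in for materials.cdb: the bytes "BETH" followed by padding.
beth_like = b"BETH" + bytes(4096)

print(could_be_constant_db(empty_cdb))
print(could_be_constant_db(beth_like))
```

Read as a little-endian integer, "BETH" is a table position far beyond the end of any reasonably sized file, which is why a real CDB reader rejects it immediately.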


It is unrelated to other file formats with a ".cdb" extension; the format is actually the same as that of the REFL subrecords in Starfield.esm. These are used to serialize various data structures and are not limited to materials. It is possible to dump the information using the mat_info tool from here, or with esmview or esmdump in the case of the ESM.


For reference, here is a short description of the reflection data format. It consists of a set of chunks that begin with the chunk type (4 bytes, BETH, STRT, TYPE, CLAS, OBJT, DIFF, LIST, MAPC, USER or USRD) followed by the data size as a 32-bit integer, then the chunk data. The first three chunks are always BETH, STRT and TYPE, in this order:

  • BETH: Header, the size is always 8 bytes, and the data consists of two 32-bit integers, a version number that is currently always 4, and the total number of chunks in the stream (this includes the BETH itself).
  • STRT: String table, it is a set of C style NUL terminated strings concatenated to a single data block. In the rest of the stream, type and variable names are 32-bit signed integer offsets into the string table. There is also a set of pre-defined types that can be referenced with negative string table offsets (see below).
  • TYPE: It contains a single 32-bit integer that is the number of CLAS chunks to follow.
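A rough Python sketch of walking the chunks and reading the three mandatory ones, following the description above. Little-endian byte order is an assumption (the description does not state it), and the helper names (iter_chunks, read_preamble, string_at, make_chunk) and the sample stream are made up for illustration.

```python
import struct


def iter_chunks(data: bytes):
    """Yield (tag, payload) for each chunk: 4-byte type, 32-bit data size, then data."""
    pos = 0
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack_from("<I", data, pos + 4)  # little-endian is assumed
        payload = data[pos + 8:pos + 8 + size]
        if len(payload) != size:
            raise ValueError(f"truncated {tag} chunk")
        yield tag, payload
        pos += 8 + size


def read_preamble(data: bytes):
    """Parse the BETH, STRT and TYPE chunks that always open a stream."""
    chunks = iter_chunks(data)
    (t1, beth), (t2, strt), (t3, type_) = next(chunks), next(chunks), next(chunks)
    if (t1, t2, t3) != ("BETH", "STRT", "TYPE"):
        raise ValueError("not a reflection stream")
    version, chunk_count = struct.unpack("<II", beth)  # BETH data is always 8 bytes
    (class_count,) = struct.unpack("<I", type_)
    return version, chunk_count, strt, class_count


def string_at(strt: bytes, offset: int) -> str:
    """Look up a NUL-terminated name in the STRT block by its byte offset."""
    end = strt.index(b"\0", offset)
    return strt[offset:end].decode("ascii")


def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Build a chunk for the synthetic sample below."""
    return tag + struct.pack("<I", len(payload)) + payload


# Synthetic stream: version 4, 3 chunks in total, two strings, no classes.
sample = (make_chunk(b"BETH", struct.pack("<II", 4, 3))
          + make_chunk(b"STRT", b"Foo\0Bar\0")
          + make_chunk(b"TYPE", struct.pack("<I", 0)))
version, chunk_count, strt, class_count = read_preamble(sample)
print(version, chunk_count, class_count, string_at(strt, 4))
```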

These are followed by class definitions (CLAS), the same number as specified in TYPE. The format of CLAS data is:

  • Class name as string table offset.
  • Class version/ID as a 32-bit integer, typically 0, 1 or 2.
  • Flags as 16-bit integer, if bit 2 (0x0004) is set, then a USER or USRD chunk will be used to store the structure data. Bit 3 (0x0008) is set on certain structures, but its exact purpose is unknown. Other flag bits seem to be currently unused.
  • Number of field definitions to follow (16-bit integer).
  • The definition of a single field consists of the name and type (both as 32-bit string table offsets), then the data offset and size as 16-bit integers. The latter two refer to how the structure data is stored in memory in the game (with alignment etc. taken into account), and are not required for decoding.
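The CLAS layout above could be parsed along these lines. This is only a sketch: it assumes little-endian integers and that the fields are packed back to back with no padding, and the ClassDef/FieldDef names and the sample payload are invented for the example.

```python
import struct
from dataclasses import dataclass, field
from typing import List

FLAG_USER = 0x0004  # structure data is stored in a USER or USRD chunk


@dataclass
class FieldDef:
    name: int         # string table offset
    type: int         # string table offset, negative for built-in types
    data_offset: int  # in-memory layout in the game, not needed for decoding
    data_size: int


@dataclass
class ClassDef:
    name: int
    version: int
    flags: int
    fields: List[FieldDef] = field(default_factory=list)

    @property
    def is_user(self) -> bool:
        return bool(self.flags & FLAG_USER)


def parse_clas(payload: bytes) -> ClassDef:
    """Decode one CLAS payload: a 12-byte header, then 12 bytes per field."""
    name, version, flags, count = struct.unpack_from("<iIHH", payload, 0)
    cls = ClassDef(name, version, flags)
    pos = 12
    for _ in range(count):
        cls.fields.append(FieldDef(*struct.unpack_from("<iiHH", payload, pos)))
        pos += 12
    return cls


# Synthetic class: name at offset 0, version 1, "user" flag set, two fields
# (a 32-bit float and a 32-bit signed integer).
clas_payload = (struct.pack("<iIHH", 0, 1, FLAG_USER, 2)
                + struct.pack("<iiHH", 4, -239, 0, 4)
                + struct.pack("<iiHH", 10, -244, 4, 4))
cls = parse_clas(clas_payload)
print(cls)
```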

The class definitions are followed by the actual data. Each object is stored as an OBJT or DIFF chunk, each of which begins with the data type (as string table offset). The difference between OBJT and DIFF is that the former just contains all data fields as defined in the CLAS, while the latter uses a "differential" format that allows for encoding only a subset of the fields. In DIFF, each field is stored as a 16-bit signed field number (0 = first) followed by the data. A negative field number denotes the end of the structure. The differential format is not used within simple built-in types (like integers and floats) that are not structures, but it is inherited by sub-structures that are stored in separate LIST, MAPC, USER or USRD chunks.
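To illustrate the differential encoding, here is a deliberately simplified Python sketch. It only handles a class whose fields are all fixed-size scalars, and it takes the field formats as a plain list (field_formats is a hypothetical shortcut; in real data the types would come from the CLAS chunk, and nested structures need more work). Little-endian byte order is assumed.

```python
import struct


def decode_diff(payload: bytes, field_formats: list) -> tuple:
    """Decode a DIFF payload for a class with only fixed-size scalar fields.

    field_formats[i] is the struct format of field i, for example "<f".
    Returns (type string table offset, {field number: value}).
    """
    (type_offset,) = struct.unpack_from("<i", payload, 0)
    pos, values = 4, {}
    while True:
        (index,) = struct.unpack_from("<h", payload, pos)
        pos += 2
        if index < 0:  # a negative field number ends the structure
            break
        fmt = field_formats[index]
        (values[index],) = struct.unpack_from(fmt, payload, pos)
        pos += struct.calcsize(fmt)
    return type_offset, values


# Synthetic class with three fields: float, uint32, uint8.
formats = ["<f", "<I", "<B"]
# The DIFF only stores fields 1 and 2; field 0 keeps whatever value it had.
diff_payload = (struct.pack("<i", 0)
                + struct.pack("<h", 1) + struct.pack("<I", 7)
                + struct.pack("<h", 2) + struct.pack("<B", 1)
                + struct.pack("<h", -1))
result = decode_diff(diff_payload, formats)
print(result)
```

An OBJT for the same class would store all three values in order with no field numbers in between.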

Regular structures can be nested within OBJT and DIFF, however, certain data types require additional chunks, which are stored separately after the parent object. These special types include:

  • LIST: A list of objects, begins with the element type (string table offset) and the number of elements (32-bit integer), followed by the element data.
  • MAPC: A map of objects, similar to LIST, but it contains key, value pairs, and it begins with the key and value types (two string table offsets) and the number of pairs.
  • USER, USRD: Used for sub-structures with the "user" (0x0004) flag set, USER if the parent is OBJT, and USRD if it is DIFF. These allow for type conversions, and always begin with the class name (similarly to OBJT and DIFF) and end with an unknown 32-bit integer that seems to be always a small non-negative value, typically 0, 1 or 2. After the class name, the data is stored as type, value pairs. If the type is the same as the class name (no actual conversion is performed), then the encoding of the data is identical to OBJT and DIFF. Otherwise, a type, value pair is stored for each field of the structure, and the type seems to be always a built-in one (negative string table offset) based on the data I could test in the ESM and CDB. Note that in this case, there seems to be no difference between USER and USRD, and a value can be assigned even to an empty structure (0 fields).

Finally, here is the table of built-in types:

  • 0xFFFFFF01 (-255): Null, no data.
  • 0xFFFFFF02 (-254): String, a 16-bit length value followed by the C style string data (including a terminating NUL character).
  • 0xFFFFFF03 (-253): List, requires a separate LIST chunk.
  • 0xFFFFFF04 (-252): Map, requires a separate MAPC chunk.
  • 0xFFFFFF05 (-251): Pointer/reference to anything, stored as a pair of type and data (the type is a string table offset).
  • 0xFFFFFF08 (-248): 8-bit signed integer.
  • 0xFFFFFF09 (-247): 8-bit unsigned integer.
  • 0xFFFFFF0A (-246): 16-bit signed integer.
  • 0xFFFFFF0B (-245): 16-bit unsigned integer.
  • 0xFFFFFF0C (-244): 32-bit signed integer.
  • 0xFFFFFF0D (-243): 32-bit unsigned integer.
  • 0xFFFFFF0E (-242): 64-bit signed integer.
  • 0xFFFFFF0F (-241): 64-bit unsigned integer.
  • 0xFFFFFF10 (-240): Boolean (0 or 1 as an 8-bit integer).
  • 0xFFFFFF11 (-239): 32-bit float.
  • 0xFFFFFF12 (-238): 64-bit float.
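The table translates fairly directly into a lookup for the fixed-size types. The sketch below is not taken from any existing tool; it assumes little-endian data and, for strings, that the 16-bit length counts the terminating NUL (the description above leaves that open). Lists, maps and pointers are left out because they need the separate chunks described earlier.

```python
import struct

# Built-in type ID (as a signed 32-bit value) -> struct format for fixed-size scalars.
SCALAR_FORMATS = {
    -248: "<b", -247: "<B", -246: "<h", -245: "<H",
    -244: "<i", -243: "<I", -242: "<q", -241: "<Q",
    -240: "<?", -239: "<f", -238: "<d",
}
TYPE_NULL, TYPE_STRING = -255, -254


def to_signed32(value: int) -> int:
    """Convert an unsigned 32-bit value such as 0xFFFFFF0C to its signed form."""
    return value - 0x100000000 if value & 0x80000000 else value


def read_builtin(type_id: int, data: bytes, pos: int):
    """Return (value, new_pos) for a null, string or scalar built-in type."""
    if type_id == TYPE_NULL:
        return None, pos
    if type_id == TYPE_STRING:
        (length,) = struct.unpack_from("<H", data, pos)
        raw = data[pos + 2:pos + 2 + length]
        # Assumption: the length includes the terminating NUL, which is stripped here.
        return raw.rstrip(b"\0").decode("utf-8"), pos + 2 + length
    if type_id in SCALAR_FORMATS:
        fmt = SCALAR_FORMATS[type_id]
        (value,) = struct.unpack_from(fmt, data, pos)
        return value, pos + struct.calcsize(fmt)
    raise NotImplementedError(f"type {type_id} needs a LIST, MAPC or nested decoder")


print(to_signed32(0xFFFFFF0C))
print(read_builtin(TYPE_STRING, struct.pack("<H", 4) + b"abc\0", 0))
print(read_builtin(-239, struct.pack("<f", 0.5), 0))
```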

The above is only a description of the general reflection data format. However, the material database can be dumped in a human-readable format with mat_info -dump_db, which could help in understanding the structures it uses. The 32-bit hashes in resource IDs use CRC32; this sample code correctly calculates them (paths are expected to use lower case characters only, and backslashes as separators).

Edited by fo76utils

More on the material database: it consists of two objects describing the list of materials, material objects and components, followed by all components.

The first object in the CDB is of type BSMaterial::Internal::CompiledDB:

BSMaterial::Internal::CompiledDB {
  String  BuildVersion
  Map  HashMap
  List  Collisions
  List  Circular
}

BuildVersion is the version of the game (currently "1.8.86.0"), and HashMap is a map from BSResource::ID to uint64_t. It maps material paths (represented as CRC32 hashes of the base name without the .mat extension and the directory name, and the extension that is always "mat\0") to an unknown 64-bit hash. Note that while the definition of BSResource::ID has the fields in Dir, File, Ext order, File is actually the first in the data.
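As a sketch of how a material path could be turned into such a key: the path is lower-cased, separators become backslashes, and the base name and directory are hashed separately. Big caveat: zlib.crc32 is only a placeholder here. The exact CRC32 variant (initial value, final XOR) is not spelled out above, so the hash function is a parameter and the real one should be passed in. Whether the directory is hashed with or without a trailing backslash is also an assumption, and all names in the block are invented.

```python
import zlib
from typing import Callable, NamedTuple


class ResourceID(NamedTuple):
    file: int   # hash of the base name without ".mat" (first in the stored data)
    dir: int    # hash of the directory name
    ext: bytes  # always b"mat\0" for materials


def normalize(path: str) -> str:
    """Lower case with backslash separators, as the hashes expect."""
    return path.lower().replace("/", "\\")


def make_resource_id(path: str, crc: Callable[[bytes], int] = zlib.crc32) -> ResourceID:
    """Split a material path and hash its parts.

    Assumption: zlib.crc32 is a stand-in; the game's CRC32 may differ in its
    initial value or final XOR, so pass the real function as `crc`.
    Assumption: the directory is hashed without a trailing backslash.
    """
    directory, _, name = normalize(path).rpartition("\\")
    base = name[:-4] if name.endswith(".mat") else name
    return ResourceID(crc(base.encode()), crc(directory.encode()), b"mat\0")


rid = make_resource_id("Materials/Foo/Bar.MAT")
print(normalize("Materials/Foo/Bar.MAT"), rid)
```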

The second object is BSComponentDB2::DBFileIndex:

BSComponentDB2::DBFileIndex {
  Map  ComponentTypes
  List  Objects
  List  Components
  List  Edges
  Bool  Optimized
}

'ComponentTypes' maps 16-bit component type IDs to string format class names. 'Objects' is a list of all material objects, in this format:

BSComponentDB2::DBFileIndex::ObjectInfo {
  BSResource::ID  PersistentID
  BSComponentDB2::ID  DBID
  BSComponentDB2::ID  Parent
  Bool  HasData
}

PersistentID is similar to the keys used in HashMap above, and contains the same information for the externally visible layered material (.mat) objects. DBID is the internal 32-bit ID of the object (it cannot be 0), while Parent is used as the base object to construct this object from. HasData is true for all except 6 "root" objects from which all others are derived, and only for those, the Parent is 0. These 6 objects are for the 6 material object types, denoted by the base names "layeredmaterials", "blenders", "layers", "materials", "texturesets" and "uvstreams".
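Once the object list is decoded, finding the roots and following the Parent links is straightforward. A small Python sketch with made-up IDs (PersistentID is left out for brevity, and the class here is a simplified stand-in for the real structure):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectInfo:
    db_id: int      # DBID, never 0
    parent: int     # 0 only for the six root objects
    has_data: bool


def ancestry(objects: Dict[int, ObjectInfo], db_id: int) -> List[int]:
    """Return the chain of DBIDs from an object up to its root."""
    chain = [db_id]
    while objects[chain[-1]].parent != 0:
        chain.append(objects[chain[-1]].parent)
    return chain


# Made-up IDs for illustration: 1 is a root, 5 derives from 1, 9 derives from 5.
objects = {o.db_id: o for o in [
    ObjectInfo(1, 0, False),
    ObjectInfo(5, 1, True),
    ObjectInfo(9, 5, True),
]}
roots = [o.db_id for o in objects.values() if o.parent == 0 and not o.has_data]
print(roots, ancestry(objects, 9))
```

On the real database the roots list should come out with six entries, one per material object type.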

'Components' links material components to material objects:

BSComponentDB2::DBFileIndex::ComponentInfo {
  BSComponentDB2::ID  ObjectID
  UInt16  Index
  UInt16  Type
}

ObjectID is one of the DBID values from the object list, Index is a component slot for component types of which there can be more than one for a single object (e.g. a TextureSet object may have texture files associated with it at Index = 0 to 20), otherwise it is 0, and Type is one of the 16-bit component type IDs previously defined in ComponentTypes.

Finally, 'Edges' describes how the material objects are organized in a tree structure:

BSComponentDB2::DBFileIndex::EdgeInfo {
  BSComponentDB2::ID  SourceID
  BSComponentDB2::ID  TargetID
  UInt16  Index
  UInt16  Type
}

Index seems to be always 0, and Type is always the ID of BSComponentDB2::OuterEdge. This defines TargetID as logically the parent of SourceID.
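Since every edge just says "TargetID is the parent of SourceID", the tree can be rebuilt with two dictionaries. A minimal sketch, with invented IDs and Index/Type ignored because they appear to be constant:

```python
from collections import defaultdict


def build_tree(edges):
    """edges: iterable of (source_id, target_id); TargetID is the logical parent."""
    parent_of = {}
    children_of = defaultdict(list)
    for source_id, target_id in edges:
        parent_of[source_id] = target_id
        children_of[target_id].append(source_id)
    return parent_of, dict(children_of)


# Made-up example: objects 20 and 21 belong to 10, and 30 belongs to 20.
parent_of, children_of = build_tree([(20, 10), (21, 10), (30, 20)])
print(parent_of)
print(children_of)
```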

After BSMaterial::Internal::CompiledDB and BSComponentDB2::DBFileIndex, all material components are stored as OBJT and DIFF chunks; the total number and order of these is exactly the same as in 'Components' above. All components of derived objects are always stored after all components of their base object (the Parent in ObjectInfo), so they can be copy constructed using the data that has already been read.
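The ordering guarantee makes a single pass over the components sufficient. The Python sketch below shows the idea with components already decoded into plain dictionaries. Treating a derived object's component as an overlay on the copied base fields is my reading of the DIFF format rather than something confirmed above, and the field names ("Color", "Scale") and IDs are invented.

```python
import copy


def rebuild(parents, components):
    """Rebuild all objects in one pass over the components in file order.

    parents: {db_id: parent_id}, with 0 meaning "no parent".
    components: [(object_id, index, type_id, fields)] in file order.
    Returns {db_id: {(type_id, index): fields}}.
    """
    built = {}

    def get(db_id):
        if db_id not in built:
            parent = parents.get(db_id, 0)
            # The parent is complete by now, because its components come first.
            built[db_id] = copy.deepcopy(get(parent)) if parent else {}
        return built[db_id]

    for object_id, index, type_id, fields in components:
        slot = get(object_id).setdefault((type_id, index), {})
        slot.update(fields)  # only fields present in the chunk are overwritten
    return built


parents = {1: 0, 5: 1}
components = [
    (1, 0, 7, {"Color": "white", "Scale": 1.0}),  # base object, all fields
    (5, 0, 7, {"Color": "red"}),                  # derived object, one field changed
]
built = rebuild(parents, components)
print(built[5])
```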

Edited by fo76utils

From the description above, it does sound like an "sg-cdb" file, and it would make sense to use one, as it is basically a fast-read DB file. But it is quite possible Bethesda made some tweaks or modifications so that it's not straightforward to open programmatically like a standard .cdb file. The materials file is called materialsbeta.cdb after all (note the "beta" part).

 

FWIW, I used the "sg-cdb" Java program (Java is what I'm most familiar with) to try to open materialsbeta.cdb, and it throws an IllegalArgumentException "invalid cdb format", just as it does for a completely different and random file like "word.exe" or "my-dog-pics.jpeg" (it works fine on a sample .cdb file I created).

 

Didn't want to spend too much time digging around the source code of sg-cdb (as this could very well be Bethesda black magic), but this exception is thrown in a nextElement method in class CDB which tries to parse out a data entry - not surprising this method will choke if the .cdb is not formatted as expected (or is a completely different file type).

Edited by captsensib1e