R&D - New Modding Tools


209 replies to this topic

#1
Amineri

    Resident poster

  • Premium Member
  • 3,927 posts

I've been rooting around in the upk files, and am finally ready to present some things and request some help. The focus of my most recent exploration has been how variable/function references are handled in the upks.

 

My apologies in advance for the long post, but this really is the simplified version. I have a lot more details that I can provide as necessary.

 

----------------------------------

 

I think that at this point I know enough to create the algorithm that would allow for a complete extraction of all references used in a upk. This is already done by UE Explorer (else it couldn't decompile it), but my aim here is to generate tools to do at least the following two things:

 

1) Generate a complete 1-to-1 string-to-reference mapping for a upk. 

 

With such a mapping made before a patch and a mapping made after a patch, mods could be much more easily updated to work with a new patch.

 

Having access to this would also allow for further tools such as semi-compilers that could turn unrealscript into unrealhex for insertion into a particular upk file.

 

2) Allow reconstruction of the objectlist within a upk 

 

Such a reconstruction would allow for (at the least) adding additional filebytes to functions, allowing insertion of code larger than the vanilla function.

 

---------------------------------------------------------------------------------

 

Here are my initial findings:

 

At the beginning of the upk is some fixed header information:

0) Parse upk header:
  • bytes 19-1C : number of namelist entries (0x8268 for XComGame and 0x35A3 for XComStrategyGame)
  • bytes 1D-20 : start of namelist table (0x2B9 for XComGame, 0x81 for XComStrategyGame)
  • bytes 21-24 : number of objectlist entries (0xDBA4 for XComGame, 0x57AC for XComStrategyGame)
  • bytes 25-28 : start of objectlist table (0x12054 for XComGame, 0x708DC for XComStrategyGame)

 

This provides file positions for the two key lists : the namelist and the objectlist.
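A minimal Python sketch of step 0, assuming the byte positions above are hexadecimal file offsets and that each field is a little-endian unsigned 32-bit word. The names `HEADER_FIELDS` and `parse_header` are made up for illustration, and the test buffer is synthetic rather than a real upk.

```python
import struct

# Offsets are the hex byte positions quoted in the post (0x19, 0x1D, 0x21, 0x25).
HEADER_FIELDS = {
    "name_count": 0x19,
    "name_offset": 0x1D,
    "object_count": 0x21,
    "object_offset": 0x25,
}

def parse_header(data: bytes) -> dict:
    """Read the four 32-bit header fields (assumed little-endian, unsigned)."""
    return {
        field: struct.unpack_from("<I", data, offset)[0]
        for field, offset in HEADER_FIELDS.items()
    }

# Synthetic header filled with the XComGame values quoted in the post.
fake = bytearray(0x29)
struct.pack_into("<4I", fake, 0x19, 0x8268, 0x2B9, 0xDBA4, 0x12054)
header = parse_header(bytes(fake))
print({name: hex(value) for name, value in header.items()})
```

With a real file, `data` would simply be the bytes read from the upk; only the first 0x29 bytes are needed for these four fields.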
 
1) Parse namelist table into array -- in UE Explorer this is the Tables/Names tab

Each function/variable name string is defined only once in the upk, here in the namelist.

Each namelist entry is variable length and has the following format:

4 bytes, length of string
string in ASCII
00
00 00 00 00 10 00 07 00
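The entry layout above could be walked roughly like this. This is only a sketch: it assumes the 4-byte length is little-endian and already counts the trailing 00 byte (if it does not, the slice needs one more byte), and `parse_namelist` and `make_entry` are hypothetical helper names. The demo data is synthetic.

```python
import struct

NAME_TRAILER_SIZE = 8  # the "00 00 00 00 10 00 07 00" bytes after each string

def parse_namelist(data: bytes, offset: int, count: int) -> list:
    """Return `count` name strings starting at file position `offset`."""
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from("<i", data, offset)
        offset += 4
        raw = data[offset:offset + length]  # assumed to include the 00 terminator
        names.append(raw.rstrip(b"\x00").decode("ascii"))
        offset += length + NAME_TRAILER_SIZE
    return names

def make_entry(name: str) -> bytes:
    """Build one synthetic entry in the assumed layout (for the demo only)."""
    text = name.encode("ascii") + b"\x00"
    return struct.pack("<i", len(text)) + text + bytes.fromhex("0000000010000700")

blob = make_entry("None") + make_entry("m_kUnit") + make_entry("I")
print(parse_namelist(blob, 0, 3))
```

The position of a name in the returned list is its namelist index, which is what word 3 of an objectlist entry and the 0x1B virtual function token refer to.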

Virtual functions (beginning with 0x1B token) use the index of the function name in the namelist as their references.

 
 
Each objectlist entry is a fixed size (17 words = 68 bytes). Pretty much everything in the upk has an object entry, including:
  • Classes
  • Functions
  • Enum containers
  • Enum entries
  • Struct containers
  • Struct entries
  • Local Variables
  • Class Variables
  • Function Parameters
  • Function Return Values
  • and more...

 

Each objectlist entry has the following format:

word 0 - 0 for class objects, negative value defining type for variables and functions -- functions have value -387 = 7D FE FF FF
word 1 - 0 for variables, reference to parent class for classes
word 2 - reference (objectlist index) to owner -- 0 if no owner
word 3 - index into namelist table from step 1 -- name of variable
words 4, 5 - appear to always be 0
word 6 - property flags
word 7 - 04 00 07 00 
word 8 - file size of the object that word 9 points to
word 9 - file pointer to the object in the upk (location of header/script for functions) -- for variables, points to the variable buffer
word 10 - appears to always be 0 for script objects
word 11 - 0 for script objects, number of additional words (beyond word 16) for others
words 12-16 - appear to always be 0 for script objects -- used for art asset content
words 17+ - additional words depending on value of word 11
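A rough Python sketch of reading objectlist entries in this layout, counting words from 0 as the listing does. It assumes little-endian signed words and that references are 1-based (so that 0 can mean "none"); `ObjectEntry`, `parse_objectlist` and `make_entry` are illustrative names, and the demo entries are synthetic.

```python
import struct
from dataclasses import dataclass

WORDS_PER_ENTRY = 17  # fixed part: 17 words = 68 bytes

@dataclass
class ObjectEntry:
    index: int        # position in the objectlist (assumed 1-based, 0 = "none")
    type_ref: int     # word 0
    parent_ref: int   # word 1
    owner_ref: int    # word 2
    name_index: int   # word 3
    size: int         # word 8
    file_pos: int     # word 9

def parse_objectlist(data: bytes, offset: int, count: int) -> list:
    entries = []
    for i in range(1, count + 1):
        w = struct.unpack_from("<%di" % WORDS_PER_ENTRY, data, offset)
        entries.append(ObjectEntry(i, w[0], w[1], w[2], w[3], w[8], w[9]))
        # word 11 gives the number of extra words that follow the fixed part
        offset += 4 * (WORDS_PER_ENTRY + w[11])
    return entries

def make_entry(words, extra=()):
    """Build one synthetic entry (demo only): pad to 17 words, then append extras."""
    fixed = list(words) + [0] * (WORDS_PER_ENTRY - len(words))
    fixed[11] = len(extra)
    return struct.pack("<%di" % (WORDS_PER_ENTRY + len(extra)), *fixed, *extra)

blob = make_entry([0, 0, 0, 5], extra=[99]) + make_entry([-387, 0, 1, 7, 0, 0, 0, 0, 40, 1000])
objs = parse_objectlist(blob, 0, 2)
print(objs[1])
```

Skipping the extra words via word 11 is what lets the loop stay aligned when an entry is longer than the fixed 17 words.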

Every reference (except virtual functions) used in the hex is the index of the object in the objectlist. This includes final functions (using the 0x1C token), local variables (using the 0x00 token), class variables (using the 0x01 token) and structs (using the 0x35 token).

 

-----------------------------

 

The first step is to parse through the upk and create (very large) arrays holding the namelist and objectlist.

 

After this is done I think the following pseudo-code will create 1-to-1 mappings between names and references:

 

1) For every object except virtual function references:

foreach objectlist(currObj)
{
    if(currObj.type == -387)  // object is a function
    {
        Mapping.type = finalFunction; // reference can only be used with 1C final function token
    }
    else
    {
        Mapping.type = regular; // reference is used for anything except 1B and 1C tokens
    }
    Mapping.Reference = currObj.index;  // index of object in the objectlist
    Mapping.Name = getNamelistString(currObj);  // namelist index read from word 3
    OwnerObj = getOwnerObject(currObj); // owner objectlist reference read from word 2
    I = 0;
    while(OwnerObj != 0)  // walk up the owner chain until there is no owner
    {
        Mapping.OwnerName[I] = getNamelistString(OwnerObj);
        OwnerObj = getOwnerObject(OwnerObj);
        I++;
    }
}

2) For virtual function references :

foreach objectlist(currObj)
{
    if(currObj.type == -387) // object is a function; other objects get no 1B mapping
    {
        Mapping.type = virtualFunction; // reference can only be used with 1B virtual function token
        Mapping.Reference = currObj.stringIndex;   // namelist index, word 3 in object
        Mapping.Name = getNamelistString(currObj);
        OwnerObj = getOwnerObject(currObj); // owner objectlist reference read from word 2
        I = 0;
        while(OwnerObj != 0)  // walk up the owner chain until there is no owner
        {
            Mapping.OwnerName[I] = getNamelistString(OwnerObj);
            OwnerObj = getOwnerObject(OwnerObj);
            I++;
        }
    }
}

-----------------

 

This extracts the owner names for each object and uses them to create a unique 1-to-1 mapping.

 

Common variable names such as "I" have only one namelist entry, but multiple entries in the objectlist that reference that same namelist entry. This creates a many-to-1 mapping which makes it difficult to invert.

 

With the above pseudocode, instead of just "I", something like "XGUnit.GetWeaponInfos.I" would be matched to the reference.
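As a small Python sketch of that owner walk: the dictionaries below stand in for the parsed objectlist, with made-up reference numbers, and "XGCharacter.Init" is a hypothetical function used only to show two different objects both named "I".

```python
def qualified_name(ref, owner_of, name_of):
    """Join an object's name with its owners' names, outermost first."""
    parts = []
    while ref != 0:  # 0 means "no owner"
        parts.append(name_of[ref])
        ref = owner_of[ref]
    return ".".join(reversed(parts))

# Toy objectlist: reference -> owner reference, reference -> name string.
owner_of = {1: 0, 2: 1, 3: 2, 4: 0, 5: 4, 6: 5}
name_of = {1: "XGUnit", 2: "GetWeaponInfos", 3: "I",
           4: "XGCharacter", 5: "Init", 6: "I"}

mapping = {qualified_name(r, owner_of, name_of): r for r in owner_of}
print(mapping["XGUnit.GetWeaponInfos.I"], mapping["XGCharacter.Init.I"])
```

Because the full owner chain is part of the key, the two "I" objects no longer collide, so the dictionary can be inverted safely.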

 

With the above 1-to-1 mapping between namestrings and references, this same operation could be performed on two different versions of a game upk. For example, EW (release) could be mapped as well as EW (patch 1).

 

With these two mappings, a correlation between EW (release) and EW (patch 1) references could be created, which would allow mostly automated updating of mods when patches are released.
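One possible shape for that correlation step, sketched in Python. It assumes both versions have already been reduced to dictionaries of qualified name to reference; the reference values and the name "XGUnit.OldHelper" are invented for the example.

```python
def correlate(old_map: dict, new_map: dict):
    """Return ({old_ref: new_ref}, [names missing from the new version])."""
    translation, missing = {}, []
    for name, old_ref in old_map.items():
        if name in new_map:
            translation[old_ref] = new_map[name]
        else:
            missing.append(name)
    return translation, missing

# Made-up reference values, for illustration only.
release = {"XGUnit.GetWeaponInfos.I": 0xB380, "XGUnit.OldHelper": 0x1234}
patch1 = {"XGUnit.GetWeaponInfos.I": 0xB979}

translation, missing = correlate(release, patch1)
print({hex(old): hex(new) for old, new in translation.items()}, missing)
```

Names that disappear in the new version are reported rather than guessed at, which is where the "mostly" in mostly automated comes from.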

 

----------------

 

At this point I need to ask for help from someone that is a more serious programmer than me. Believe it or not, that's not in my skillset :smile:.

 

The XComGame.upk contains well over 50,000 objectlist entries, so the code to extract and compare these things is going to have to be tight and robust.

 

As an alternative to coding this directly, UE Explorer has the Eliot.UELib.dll which should allow access to UE Explorer's upk parsing routines, but I don't know how to use that either.


Edited by Amineri, 21 November 2013 - 06:02 AM.


#2
TheOldOne822

    Enthusiast

  • Premium Member
  • 133 posts

You need someone better at programming than I am (I do have training in GW-BASIC, QBasic, Flash AS and Visual Basic, but that's it), but if this tool ever does get finished, could the "reconstruction" be used by me to add larger Flash files, or by CaesarInvictus to add his textures to the main game without TexMod?

 

Could it be used to add new art assets, e.g. for extra ranks?



#3
wghost81

    Wasteland Ghost

  • Supporter
  • 7,313 posts
Sounds very interesting. My first thought after the new EU patch was to extract the new indexes for variables/functions directly from the upk and simply replace the old indexes with new ones, since nothing else has changed. But I couldn't find the upk file format anywhere on the Internet (because it's proprietary, I guess). With this information I may try to do this. Thanks, Amineri!

I'm not a very good programmer myself and I probably can't make the program(s) you want, but I'll try this for my own mod and share the code... if it ever works. :smile:

#4
bokauk

    Fan

  • Members
  • 483 posts

Great work as usual Amineri, but we've come to expect nothing less from you :smile:

It is quite possible to map an object's name to its offset. A few weeks ago, I began re-writing how Custom Mods work in ToolBoks which would allow forward compatibility and lessen the burden of mod authors updating their mods after each update.

Below is a revamped implementation of a Custom Mod that I was using to test this:
 

NAME=Disable Friendly Fire (forward-compatible)
AUTHOR=bokauk
DESCRIPTION=New ToolBoks Custom Mod example to demonstrate forward-compatible mods by using variable names.

Compatible with XCOM Enemy Unknown versions:
 - All versions?


LOCATION=XComGame\CookedPCConsole\XComGame.upk>XGUnit.AddTargetsInSight
OFFSET=153
INSTALL_HEX=
{
	We want to modify the following code
		from:	arrVisibleTargetsToUse.Length == 
		to:		arrVisibleTargetsToUse.Length <
		
	In this instance, this could be achieved just by changing the byte for the symbol, but for the purposes of demonstrating how this works, I'll include the variable name as well (the bytecode for symbols (==, <, etc) do not change between versions).
	
		
	Original bytecode looks like:
	9A 36 00 80 B3 00 00
	
	Broken down:
	9A 		36 			00 						80 B3 00 00
	==		.Length		LocalVariableToken		arrVisibleTargetsToUse
	
	The bytes for variables change when the UPK file is updated after a new version of the game has been released.
	
	Instead of hard-coding the bytes for the variable, we can type the name of the variable inside of parentheses.
	
	ToolBoks will then automatically convert the variable name into the correct bytes for the version of the game that the user has.
	
	Hex	Symbol	Description
	9A	==		Friendly fire enabled
	96	<		Friendly fire disabled
}
96

{ .Length, LocalVariableToken }
36 00

(arrVisibleTargetsToUse) { Variable names inside of parentheses will be converted into hex by ToolBoks }

{
	Without comments, it looks like:
	96 36 00 (arrVisibleTargetsToUse)
	
	For this particular version of the game, ToolBoks will convert this to:
	96 36 00 80 B3 00 00
	
	In a later version of the game, ToolBoks might convert it to something like:
	96 36 00 79 B9 00 00
	
	
	This will make the mod forward compatible, as long as that specific function is not updated.
}

UNINSTALL_HEX=

The "LOCATION" parameter would now allow you to specify where to modify by using the File>Class.Function notation (sometimes File>Class.State.Function), which would be the same between game versions.

It would also be possible to modify the EXE ( LOCATION=Binaries\Win32\XComGame.exe ) to disable hashcheck, (read from loose DefaultGameCore.ini?) and also modify INI and INT files, allowing for all game files to be modified.

With the above CM, I got as far as returning the correct function position in the UPK from parsing the LOCATION, but have not yet worked on the variables inside of parentheses, so still a bit of work to do.
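For the part that is not yet implemented, the parenthesised-name substitution might look something like the Python sketch below. This is not how ToolBoks actually does it; `expand_hex` is a hypothetical helper, the `refs` lookup is assumed to come from a parsed objectlist, and references are assumed to be written as 4-byte little-endian values.

```python
import re
import struct

def expand_hex(template: str, refs: dict) -> bytes:
    """Turn '96 36 00 (name)' into bytes, looking names up in `refs`."""
    text = re.sub(r"\{[^{}]*\}", " ", template)  # drop { comments }
    out = bytearray()
    for token in re.findall(r"\([^()]+\)|[0-9A-Fa-f]{2}", text):
        if token.startswith("("):
            # a variable name: emit its reference as 4 little-endian bytes
            out += struct.pack("<i", refs[token[1:-1].strip()])
        else:
            out.append(int(token, 16))
    return bytes(out)

refs = {"arrVisibleTargetsToUse": 0xB380}  # value taken from the example bytes 80 B3 00 00
template = "96 { < } 36 00 { .Length, LocalVariableToken } (arrVisibleTargetsToUse)"
print(expand_hex(template, refs).hex(" ").upper())
```

A bare name like this only works while it is unique within the function being patched; a fuller version would probably need some way to qualify the name.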

As you mentioned, the next step up from this would be to write a compiler so you would only need to type:

arrVisibleTargetsToUse.Length <

instead of:

96 36 00 (arrVisibleTargetsToUse)

Although, both would be future compatible, which I think is the key issue for now.

Regarding point 2 in your post referring to resizing functions; it is my understanding that this would shift the entire contents of the file after that point. This would mean that most offsets (including function jumps) would have to be corrected. I've toyed with the idea of looking into it more, but for me, this is the current holy grail of modding XCom and probably too ambitious for me to take on at the moment. Please correct me if I have misunderstood the shifted offsets issue, or you know a way whereby this wouldn't be a problem :smile:

 

EDIT: Silly me, jump offsets are relative, so that part of it wouldn't be an issue :)


Edited by bokauk, 20 November 2013 - 11:14 AM.


#5
Drakous79

    Drak

  • Supporter
  • 843 posts

Great ideas girls :) Such tool could be very helpful.

 

Using a 1-to-1 string-to-reference mapping to update names sounds doable. Sorry to say I won't be of much help, because this will need more than HTML/CSS.



#6
Amineri

    Resident poster

  • Premium Member
  • 3,927 posts

(In reply to TheOldOne822, post #2, on adding larger Flash files, textures and new art assets:)

 

Flash files are stored in different upks, and I'm not sure if they are embedded inside of regular unreal-type objects or what, but with a bit more work that might be possible.

 

All textures for the entire game (at least the hi-res versions) are stored in the Textures.tfc file. Just look for the single biggest file in the CookedPCConsole folder. I don't know how the mapping to load textures works yet. Content (i.e. art assets) appears to get an objectlist entry, but I don't even know how to 'decompile' content, although that's what Gildor's umodel program does.

 

New art assets just might be possible with a program that allows a complete rebuild of the objectlist. Basically would have to add the ability to add new classes to the code, which is a step up from just trying to increase an existing function's filesize.

 

These are all remote future possibilities if this can all get worked out.

 

(In reply to wghost81, post #3, on replacing old indexes with new ones after a patch:)

 

Even without the code the information allows some new tricks by modifying a variable's objectlist entry (in UE Explorer this is the "Table Buffer").

 

By changing the first word of the entry you can change the type of the variable. The possible values I've figured out so far are:

boolean : -376 = 88 FE FF FF
integer : -390 = 7A FE FF FF
float   : -386 = 7E FE FF FF
dynamic array : -375 = 89 FE FF FF
static array : same as base type (length defined in word 5 of variable buffer)
enum : -377 = 87 FE FF FF
class : -394 = 76 FE FF FF
struct : -399 = 71 FE FF FF
string : -398 = 72 FE FF FF
vector : -399 = 71 FE FF FF // is a struct, apparently
function : -387 = 7D FE FF FF

root class object : 00 00 00 00 
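The decimal values and byte strings in this table are two views of the same signed 32-bit little-endian word. A short Python sketch for converting between them (the helper names `type_word` and `read_type_word` are made up here):

```python
import struct

def type_word(code: int) -> str:
    """Render a signed 32-bit type code as it appears in the file (little-endian)."""
    return struct.pack("<i", code).hex(" ").upper()

def read_type_word(hex_bytes: str) -> int:
    """Inverse: turn the four bytes seen in a hex editor back into the signed value."""
    return struct.unpack("<i", bytes.fromhex(hex_bytes))[0]

print(type_word(-387))                # function -> 7D FE FF FF
print(read_type_word("88 FE FF FF"))  # boolean  -> -376
```

This makes it easy to check a table entry, or to work out the bytes to type into the Table Buffer when changing a variable's type.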

All variables are by default size 1 arrays, so you can turn them into larger-sized static arrays by changing word 5 of the variable buffer. 

 

 

 

(In reply to bokauk, post #4, on the ToolBoks Custom Mod example and the question about resizing functions:)

 

This looks really good. I really like the idea of being able to directly declare variables and have them mapped to references when the mod is applied.

 

But I'm thinking that the variable names will have to include their owners in order to be unique.

 

For example the class variable m_kUnit is defined in 14 classes in EW's XComGame.upk, and every one of them has a different reference value.

 

For instance in XGCharacter : m_kUnit is D1 B5 00 00

But in XGAIBehavior : m_kUnit is 8F 9A 00 00

 

There is a similar issue regarding the actual compilation of the primitive symbols into bytecode.

 

For example, the '==' operator has several possible bytecodes:

  • F2 is '==' operator for boolean variables
  • 9A is '==' operator for integer variables
  • B4 is '==' operator for float variables
  • D9 is '==' operator for vector variables
  • 8E is '==' operator for (unknown)
  • 7A is '==' operator for string variables
  • FE is '==' operator for class variables

The compiler does a fair bit of work checking the types of the variables at compile time (which is why it can throw up type mismatch errors).

 

Also, there is no '==' operator for byte/enum values. Instead the compiler automatically converts bytes to ints, then compares using the 9A operator.
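A semi-compiler would need something like the lookup below to choose the operator byte. This Python sketch only uses the opcodes listed above (the unknown 8E case is left out), the type names are arbitrary labels, and it ignores whatever conversion token the real compiler emits when it turns a byte into an int.

```python
EQ_OPCODES = {
    "bool": 0xF2, "int": 0x9A, "float": 0xB4,
    "vector": 0xD9, "string": 0x7A, "class": 0xFE,
}

def eq_opcode(var_type: str) -> int:
    """Pick the '==' bytecode for a variable type."""
    if var_type in ("byte", "enum"):
        var_type = "int"  # per the post: bytes/enums are compared as ints
    try:
        return EQ_OPCODES[var_type]
    except KeyError:
        raise ValueError("no known '==' opcode for type %r" % var_type)

print(hex(eq_opcode("float")), hex(eq_opcode("enum")))
```

Raising an error for unlisted types mirrors the type mismatch errors the real compiler throws, rather than silently emitting a wrong operator.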

 

I'm not saying that any of these issues are insurmountable, but they would definitely have to be addressed.

 

------------------

 

As you say, the jump offsets within a function block are all relative to the beginning of the function block. What would have to be corrected is the file position word (word 9 in the post #1 listing, counting from 0) of every subsequent objectlist entry, to reflect the updated file position of the object.

 

If another function were made smaller by the same amount, then only the file position references between the two functions in the objectlist would have to be updated.
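The bookkeeping for a resize could be sketched in Python as follows. It assumes objects are stored back to back after the tables and that nothing else in the file records absolute positions, which has not been verified; the entries are plain dictionaries with invented sizes and positions.

```python
def shift_file_positions(entries, resized_index, delta):
    """Grow or shrink one object by `delta` bytes and move everything stored after it."""
    resized = entries[resized_index]
    old_pos = resized["file_pos"]
    resized["size"] += delta
    for entry in entries:
        if entry["file_pos"] > old_pos:
            entry["file_pos"] += delta

entries = [
    {"name": "A", "size": 100, "file_pos": 1000},
    {"name": "B", "size": 50, "file_pos": 1100},
    {"name": "C", "size": 20, "file_pos": 1150},
]
shift_file_positions(entries, 0, 16)  # make A 16 bytes larger
print(entries)
```

Calling it a second time with a negative `delta` on a later object restores the positions of everything after that object, which is the balancing trick described above.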



#7
Bertilsson

    Old hand

  • Members
  • 616 posts
Quoting Amineri's header notes:

  • bytes 1D-20 = start of namelist table (0x2B9 for XComGame.upk, 0x81 for XComStrategyGame.upk)
  • bytes 21-24 = unknown (0xDBA4 for XComGame.upk, 0x57AC for XComStrategyGame) -- might be namelist length?
  • bytes 25-28 = start of objectlist table (0x12054 for XComGame.upk, 0x708DC for XComStrategyGame.upk)
  • bytes 29-2C = unknown (0x7A3 for XComGame.upk, 0x73C for XComStrategyGame.upk)

The second of these words (bytes 21-24) is the number of objects, which is exactly 56228 (0xDBA4) according to the UE Explorer package tab for XComGame.

(According to the same tab, the number of namelist entries in XComGame is 33384 (0x8268).)

 

So based on this, would it be correct to calculate the end of the object list by using this formula (Word2 = number of objects, Word3 = start of objectlist table)?

Word3 + (Word2 * 17 * 4)
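Plugging in the XComGame numbers from this thread, as a quick Python check. This assumes every entry is exactly 17 words; if some entries carry additional words, the result is only a lower bound on where the list ends.

```python
ENTRY_BYTES = 17 * 4  # 17 words of 4 bytes each

def objectlist_end(start: int, count: int) -> int:
    """File position just past the last objectlist entry, for fixed-size entries."""
    return start + count * ENTRY_BYTES

# start of objectlist table and number of objects for XComGame.upk
print(hex(objectlist_end(0x12054, 0xDBA4)))  # 0x3b77e4
```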

 

Edit: I just added the type of the "==" operators in the wiki http://wiki.tesnexus...es_XCOM_Modding

At least for me that will make picking the correct operator much quicker in the future since I will not have to reverse engineer it every time I forget which one is used for what :)


Edited by Bertilsson, 20 November 2013 - 09:09 PM.


#8
Amineri

    Resident poster

  • Premium Member
  • 3,927 posts

Right ... the convention seems to be size, then location.

 

In the objectlist entries the size of the object likewise comes directly before its file position (words 8 and 9 in the post #1 listing, counting from 0).

 

If so then the word preceding the bytes 1D-20 position should be the number of namelist entries, so:

  • bytes 19-1C : number of namelist entries (0x8268 for XComGame and 0x35A3 for XComStrategyGame)
  • bytes 1D-20 : start of namelist table (0x2B9 for XComGame, 0x81 for XComStrategyGame)
  • bytes 21-24 : number of objectlist entries (0xDBA4 for XComGame, 0x57AC for XComStrategyGame)
  • bytes 25-28 : start of objectlist table (0x12054 for XComGame, 0x708DC for XComStrategyGame)

This should make parsing these tables pretty straightforward.



#9
Bertilsson

    Old hand

  • Members
  • 616 posts

Am I completely wrong in thinking that it is actually realistic to create a very limited compiler for the object code of functions?

 

The type of operator to use can be deduced by looking at the datatypes of the objects the operator is used upon...

The virtual memory sizes could be looked up and/or deduced from the object list.

 

It wouldn't have to be perfect, just covering the most commonly used operators and "upk-local" references would go a very long way...



#10
Amineri

    Resident poster

  • Premium Member
  • 3,927 posts

I think it's possible.

 

Such a compiler would have a lot of the functionality of UE Explorer -- compilation would only be possible within the context of a given upk. That upk would have to be parsed in order to read all of the possible names allowed. Having that information in hand you could also read the types of the variables to allow for correct selection of operators.

 

Not easy ... but possible.

 

EDIT: I'm mucking about with Java right now, trying to write some experimental scripts. The nice thing about coding such a thing in Java is that it would be quite easy for XCom OS X users to use as well.


Edited by Amineri, 20 November 2013 - 10:10 PM.




