New tool available that recalculates the jump offsets automatically

Bertilsson · September 28, 2013

The new Jump Offset Repair tool is now available here:

http://hem.bredband.net/bertrich/XCOM/JumpRepairTool.htm

It is currently very beta, looks like crap and has very limited error checking, but it should get the job done.

1. Paste modified/broken byte code in the left box
2. Paste modified/broken View Tokens in the middle box
3. Press the start button ~~(need for start button to be removed shortly)~~ (I now think it is more intuitive for some if it remains)
4. Use the drop-down selectors to assign new targets for broken jump offsets (any change also applies downward while valid)
5. Repeat step 4 until no broken offsets exist
6. Receive new code with valid offsets and a corrected header in the right box.

Features planned but not yet implemented:

Edit: In progress: General overhaul of the interface
Curly brackets and indentation
Edit: Done: ~~Change end of function-reference in header to FF FF and force user to come back with new view tokens if 53-token missing~~

Any feedback, good or bad, is welcome.

Edit: I am especially interested in feedback about bugs, and in suggestions for logical rules that limit the number of valid jump targets for each type of token as much as possible.

Currently the following validation rules have been implemented:

Goto (06)
- Allowed to jump to any target except self
- Is the only token allowed to jump backwards
foreach (2F and 58)
- Must target 30 IP-tokens (which leaves very few possible choices which is good)
If not (07)
- Must jump forward
Case (0A)
- Must jump forward
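The four rules above can be expressed as a small filter over candidate target lines. This is only a sketch: it assumes each decompiled line is modelled as a hypothetical `(memory_address, first_byte)` pair. The function name `valid_targets` is made up for illustration and is not taken from the tool.

```python
# Hypothetical line model: (memory_address, first_byte_of_line).
GOTO = 0x06
IF_NOT = 0x07
CASE = 0x0A
FOREACH_TOKENS = {0x2F, 0x58}
ITERATOR_POP = 0x30


def valid_targets(lines, i):
    """Addresses that the jump on line i may target under the rules listed above.

    Returns an empty list for lines starting with a token these rules do not cover.
    """
    addr, token = lines[i]
    if token == GOTO:
        # Goto: any target except itself; the only token allowed to jump backwards.
        return [a for a, _ in lines if a != addr]
    if token in FOREACH_TOKENS:
        # foreach: forward only, and only onto Iterator Pop (30) lines.
        return [a for a, t in lines if a > addr and t == ITERATOR_POP]
    if token in (IF_NOT, CASE):
        # If not / Case: forward only.
        return [a for a, _ in lines if a > addr]
    return []


# Line starts taken from the GetAlienBases example later in this thread.
lines = [(0x00, 0x58), (0x2D, 0x07), (0x52, 0x55), (0x68, 0x31), (0x69, 0x30), (0x6A, 0x04)]
print([hex(a) for a in valid_targets(lines, 0)])  # ['0x69']
```

Because goto is the only backward jump, the forward-only check alone already removes most candidates for every other token.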

The Case tokens can probably be limited quite a lot, but I am unsure exactly how much.

Can anyone confirm if the following statements are true or false?

Case tokens must always target another case, except for the very last case inside a switch statement
Case tokens are not allowed to jump past another case token
Case tokens are not allowed to jump into or past another switch statement
Default case 0A FF FF is mandatory
The Switch statement does not include any memory offset values at all and can completely be ignored by the tool.

I'm fairly sure that 1, 2, 3 (and 5) are facts, which will limit every case except the last one to a single valid target. However, I suspect that 4 is false, which makes the rule quite a bit more tedious to enforce and makes the number of possible targets for the last case very large.

Edited September 29, 2013 by Bertilsson

Amineri · September 29, 2013

Can anyone confirm if the following statements are true or false?
1. Case tokens must always target another case, except for the very last case inside a switch statement
2. Case tokens are not allowed to jump past another case token
3. Case tokens are not allowed to jump into or past another switch statement
4. Default case 0A FF FF is mandatory
5. The Switch statement does not include any memory offset values at all and can completely be ignored by the tool.
I'm fairly sure that 1, 2, 3 (and 5) are facts which will limit all case except the last one to a single valid target, but I suspect that 4 is false, which makes enforcement of the rule quite a bit more tedious to implement and the number of possible targets for last case very large.

As best I recall, all 5 of these rules are always followed. The default case is mandatory EXCEPTING only the very special case that every single case statement has a return, in which case it might work to omit it. In practice I use the default case every single time, because it isn't worth it playing with fire that way.

I also tend to use a break statement (just a 06 ## ## goto statement) after the default case, as this allows the UE Explorer decompile tool to properly terminate the switch statement in terms of indenting/curly braces.

Not using the break statement is perfectly valid (and the vanilla code often does not have it), which causes UE Explorer to put the remaining code indented as if it is a part of the default case, even though it isn't.

------------------

Regarding the foreach lines, there are basically 2 cases:

1) The array is a 'simple' array, such as:

    foreach m_arrObjectives(kObjective,)

This one works by only having to update the jump token located at the end of the hex line:

foreach m_arrObjectives(kObjective,)

58 01 13 42 00 00 00 2E 42 00 00 00 4A 69 00

In this case the jump offset is 0x0069.
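The offset is stored little-endian, so "69 00" reads as 0x0069. The sketch below shows how the last two bytes of such a line could be read and rewritten. The helper names are hypothetical. It assumes the hex line is pasted exactly as shown above, with space-separated bytes.

```python
FOREACH_TOKENS = {0x2F, 0x58}


def read_foreach_offset(hex_line: str) -> int:
    """Return the jump offset stored in the last two bytes of a foreach line."""
    data = bytes.fromhex(hex_line)
    if data[0] not in FOREACH_TOKENS:
        raise ValueError("line does not start with a foreach token")
    # Offsets are little-endian: "69 00" means 0x0069.
    return int.from_bytes(data[-2:], "little")


def write_foreach_offset(hex_line: str, new_offset: int) -> str:
    """Return the same line with its trailing jump offset replaced."""
    data = bytes.fromhex(hex_line)
    patched = data[:-2] + new_offset.to_bytes(2, "little")
    return " ".join(f"{b:02X}" for b in patched)


line = "58 01 13 42 00 00 00 2E 42 00 00 00 4A 69 00"
print(hex(read_foreach_offset(line)))       # 0x69
print(write_foreach_offset(line, 0x0075))  # 58 01 13 42 00 00 00 2E 42 00 00 00 4A 75 00
```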

2) The array is a "complex" array composed via a context reference:

foreach GEOSCAPE().m_arrMissions(kMission,)

These are much trickier to adjust manually. The final jump offset token is still set as above, but there is an additional relative offset within the context construction that also has to be updated. I'll give a full example from XGStrategyAI.GetAlienBases (a vanilla function that returns all current alien base missions; it is unused in vanilla, and even in my mod :wink:).

    foreach GEOSCAPE().m_arrMissions(kMission,)
    {
        // End:0x68
        if(kMission.m_iMissionType == 6)
        {
            arrBases.AddItem(kMission);
        }        
    }    
    return arrBases;

Following is the entire loop including the UE Explorer token view breakdown:

(000/000) [58 19 1B 69 0E 00 00 00 00 00 00 16 53 00 94 2F 00 00 00 01 94 2F 00 00 00 F7 42 00 00 00 4A 69 00]
	AI(45/33) -> C(31/23) -> VF(10/10) -> EFP(1/1) -> IV(9/5) -> LV(9/5) -> NP(1/1)
	foreach GEOSCAPE().m_arrMissions(kMission,)

(02D/021) [07 68 00 9A 19 00 F7 42 00 00 09 00 35 37 00 00 00 01 35 37 00 00 2C 06 16]
	JIN(37/25) -> NF(34/22) -> C(30/18) -> LV(9/5) -> IV(9/5) -> ICB(2/2) -> EFP(1/1)
	if(kMission.m_iMissionType == 6)

(052/03A) [55 00 F8 42 00 00 0A 00 00 F7 42 00 00 16]
	DAAI(22/14) -> LV(9/5) -> LV(9/5) -> EFP(1/1)
	arrBases.AddItem(kMission)

(068/048) [31]
	IN(1/1)
	
(069/049) [30]
	IP(1/1)
	
(06A/04A) [04 00 F8 42 00 00]
	R(10/6) -> LV(9/5)
	return arrBases

Note that the final 2-byte pair in the foreach line is indeed the 0x069 address offset of the Iterator Pop (IP) token.

Here is a breakdown of the foreach line, including a breakdown of the GEOSCAPE(). context reference:

58 -- foreach token
19 -- context token
1B 69 0E 00 00 00 00 00 00 16 -- virtual function GEOSCAPE()
53 00 -- size of next context
94 2F 00 00 -- return address of next context
00 -- unknown -- always 00
01 -- class variable token
94 2F 00 00 -- address of arrMissions
00 -- local variable token
F7 42 00 00 -- address of kMission
00 -- unknown, but always 0 for foreach statement
4A -- no parameter token (unsure what second parameter in foreach statement could be)
69 00 -- offset of final iterator pop (IP) token

In particular, in the special case of foreach loops over contextualized arrays, the size of the next context must include the entire foreach loop. In this case the size of the next context is 0x53. Like any offset, this context size is always computed in memory bytes.

This size comes from:

9 bytes for class variable arrMissions

9 bytes for local variable kMission

2 bytes for 00 4A construct near end of foreach

2 bytes for 69 00 foreach jump offset

37 bytes for if(kMission.m_iMissionType == 6) line

22 bytes for arrBases.AddItem(kMission) line

1 byte for iterator next token

1 byte for iterator pop token

This adds up to 83 = 0x53 memory bytes.
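The arithmetic above can be checked in a few lines of Python. The stored value is the 2-byte little-endian form of the total, which is why it appears as "53 00" in the foreach line. The sizes are copied from the list above. Nothing else is assumed.

```python
# Memory-byte sizes (decimal), copied from the list above.
context_parts = [
    ("class variable arrMissions", 9),
    ("local variable kMission", 9),
    ("00 4A construct near end of foreach", 2),
    ("69 00 foreach jump offset", 2),
    ("if(kMission.m_iMissionType == 6) line", 37),
    ("arrBases.AddItem(kMission) line", 22),
    ("iterator next token", 1),
    ("iterator pop token", 1),
]

context_size = sum(size for _, size in context_parts)
size_bytes = context_size.to_bytes(2, "little")

print(context_size, hex(context_size))           # 83 0x53
print(" ".join(f"{b:02X}" for b in size_bytes))  # 53 00
```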

Failure to update this context reference offset WILL cause the program to crash when the loop is executed.

Memory sizes (in decimal) for each of the following lines can be extracted from the token view -- for example,

	JIN(37/25) -> NF(34/22) -> C(30/18) -> LV(9/5) -> IV(9/5) -> ICB(2/2) -> EFP(1/1)
	if(kMission.m_iMissionType == 6)

the 37 shows the memory byte size for the entire line (the 25 is the file byte size).
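If a tool reads these numbers automatically, a regular expression over the token view line is enough. This is a sketch that assumes every entry looks like `LABEL(memory/file)`, and that the first entry covers the whole line, as with the JIN(37/25) example. The function names are hypothetical.

```python
import re

# Matches entries such as "JIN(37/25)": label, memory size, file size.
TOKEN_RE = re.compile(r"([A-Z]+)\((\d+)/(\d+)\)")


def token_sizes(token_view_line: str):
    """Return [(label, memory_bytes, file_bytes), ...] in the order UE Explorer lists them."""
    return [(label, int(mem), int(file_size))
            for label, mem, file_size in TOKEN_RE.findall(token_view_line)]


def line_memory_size(token_view_line: str) -> int:
    """Memory size of the whole line, i.e. the first (outermost) entry's memory size."""
    sizes = token_sizes(token_view_line)
    if not sizes:
        raise ValueError("no size entries found")
    return sizes[0][1]


view = "JIN(37/25) -> NF(34/22) -> C(30/18) -> LV(9/5) -> IV(9/5) -> ICB(2/2) -> EFP(1/1)"
print(line_memory_size(view))  # 37
print(token_sizes(view)[3])    # ('LV', 9, 5)
```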

Unfortunately, correctly automating the "rest of foreach line" part isn't so easy. The UE Explorer token view breaks that line down as:

(000/000) [58 19 1B 69 0E 00 00 00 00 00 00 16 53 00 94 2F 00 00 00 01 94 2F 00 00 00 F7 42 00 00 00 4A 69 00]
	AI(45/33) -> C(31/23) -> VF(10/10) -> EFP(1/1) -> IV(9/5) -> LV(9/5) -> NP(1/1)
	foreach GEOSCAPE().m_arrMissions(kMission,)

It shows the 9/5 and 9/5 for the arrMissions class variable (the IV label) and the kMission local variable (LV label), and gives the 0x4A no-parameter token (NP label), but it does not list the additional byte preceding the 0x4A token, nor the extra 2 bytes for the jump offset at the end.

This kind of starts delving into the murky world of relative offsets used in context constructions and boolean constructions (booleans have to use optimization tokens to be able to skip past code), which is a whole can of worms that you probably don't want to mess with at this point.

However, I wanted to point out that the combination of context+foreach loop requires extra special care to make work correctly.

Bertilsson · September 29, 2013

As best I recall, all 5 of these rules are always followed. The default case is mandatory EXCEPTING only the very special case that every single case statement has a return, in which case it might work to omit it. In practice I use the default case every single time, because it isn't worth it playing with fire that way.

I also tend to use a break statement (just a 06 ## ## goto statement) after the default case, as this allows the UE Explorer decompile tool to properly terminate the switch statement in terms of indenting/curly braces.

Not using the break statement is perfectly valid (and the vanilla code often does not have it), which causes UE Explorer to put the remaining code indented as if it is a part of the default case, even though it isn't.

Thank you!

I'll just take the lazy route and implement the following logic:

For any non-default case token
- Identify the closest (forward) token line where the first byte is 0A and make it the only valid target
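A minimal sketch of this lazy rule, assuming lines are modelled as hypothetical `(memory_address, line_bytes)` pairs. The switch bytes in the example are made up for illustration, apart from the 0A FF FF default case and the 06 ## ## break goto.

```python
CASE = 0x0A
DEFAULT_CASE = bytes([0x0A, 0xFF, 0xFF])


def case_target(lines, i):
    """Lazy rule: a non-default case targets the closest following line that starts with 0A.

    Returns None for the default case (0A FF FF has no jump offset to repair)
    or when no later 0A line exists.
    """
    _, data = lines[i]
    if data[0] != CASE:
        raise ValueError("not a case line")
    if data[:3] == DEFAULT_CASE:
        return None
    for later_addr, later_data in lines[i + 1:]:
        if later_data[0] == CASE:
            return later_addr
    return None


# Hypothetical switch: addresses and case bytes are invented for the example.
switch_lines = [
    (0x10, bytes.fromhex("0A 1C 00 24 01")),  # case
    (0x15, bytes.fromhex("06 40 00")),        # break (goto)
    (0x1C, bytes.fromhex("0A 28 00 24 02")),  # case
    (0x21, bytes.fromhex("06 40 00")),        # break (goto)
    (0x28, bytes.fromhex("0A FF FF")),        # default
]
print(hex(case_target(switch_lines, 0)))  # 0x1c
print(hex(case_target(switch_lines, 2)))  # 0x28
print(case_target(switch_lines, 4))       # None
```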

Regarding the context tokens, such as those for ternary operators and the complex foreach example, they will most likely remain out of scope for the following reasons:

The tool is completely dependent on view token data, which will probably be missing or corrupt when UE E attempts to decompile incorrect context references in the first place.
From my perspective this is more in the nature of "the inner workings of complex statements" rather than something related to the supported jump tokens.

But I do have some other ideas about making a separate tool that would be more in the nature of a "workbench", where the user would get assistance in both building "atomic statements" and combining them into more complex structures, where figuring out the context offsets would probably be a key factor.

I will start a new topic if I ever get past the point of only thinking about it :smile:

Edited September 29, 2013 by Bertilsson

Bertilsson · September 29, 2013

A little screenshot from the latest version.

http://hem.bredband.net/bertrich/XCOM/Temp/RepairTool076.png

New features added

Indentation (Stroustrup style, since it was easy to implement and I actually like it)
Automatic target selection when only one option exists (for example every case token)
Goto references in object code (hmm... just noticed in screenshot that I need to fix it for drop down also)

Features still missing

Some user input validation rules
Color coding to highlight what is broken or not

Edit: Features that I'm thinking about but am unsure if anyone would really find useful:

Add an optional base offset box
- That auto-creates a toolboks mod if value is not 0
- Add an additional optional textarea for Original unmodded byte code
  - That results in a smaller toolboks mod with only changes

Edited September 29, 2013 by Bertilsson

Bertilsson · September 29, 2013

Regarding the context tokens such as for ternary operators and the complex foreach example it will most likely remain out of scope due to the following reasons:
~~The tool is completely dependent on view tokens data which will probably be missing or corrupt when UE E attempts to decompile incorrect context references in the first place.~~
~~From my perspective this is more in the nature of "the inner workings of complex statements" rather than something related to the supported jump tokens.~~

On second thought and after some testing:

Incorrect context offsets inside complex foreach statements do not mess up UE E view token data.

Taking the address of IP-token-line minus memory address of foreach-line plus everything after EFP in foreach-line plus 3 seems tedious but very doable.
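One way to turn this into code is shown below. It assumes the context region runs from just after the 00 byte that follows the context return value, up to and including the IP token. That is the reading that reproduces the 0x53 from the GetAlienBases breakdown. The function and parameter names are hypothetical.

```python
def foreach_context_size(foreach_addr, foreach_mem_size, tail_mem_size, ip_addr, ip_mem_size=1):
    """Context size, in memory bytes, for a foreach line that starts with 58 19.

    foreach_addr, foreach_mem_size: address and memory size of the foreach line (e.g. AI(45/33)).
    tail_mem_size: memory bytes after the context header in the foreach line
                   (IV + LV + NP from the token view, plus 3 for the unlisted 00 byte
                   and the 2-byte jump offset).
    ip_addr: address of the closing iterator pop (IP) line; the IP token itself is counted.
    """
    context_start = foreach_addr + foreach_mem_size - tail_mem_size
    return ip_addr + ip_mem_size - context_start


tail = 9 + 9 + 1 + 3  # IV + LV + NP + (00 byte and 69 00)
print(hex(foreach_context_size(0x000, 45, tail, 0x069)))  # 0x53
```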

Figuring out if it is needed can be done by looking for 19 in the second byte of the hex string and figuring out which storage byte to change could be done by searching for the next following 16 byte (and praying that 16 is not part of the context name)... or perhaps even a bit more clever by checking that VF(10/x) corresponds to 58 19 x*bytes 16.

But what happens when there is an array in an array, or something more complex like foreach Universe(1).GEOSCAPE().m_arrMissions(kMission,) ?

Amineri · September 30, 2013

The tool is looking really great. This will be of great help in more quickly re-coding stuff.

But what happens when there is an array in an array, or something more complex like foreach Universe(1).GEOSCAPE().m_arrMissions(kMission,) ?

I have exactly the same question myself. I've never seen such a construction used in the vanilla code, and have never had to use such a construction.

This case would start with 2 context 0x19 tokens, and would have 2 relative offset values, but I don't know if only the 2nd one or if both the relative offsets would have to include the entire foreach loop.

In practice I think you can simply ignore this case as it's so extremely specialized as to not need an automated tool to handle.

Figuring out if it is needed can be done by looking for 19 in the second byte of the hex string and figuring out which storage byte to change could be done by searching for the next following 16 byte (and praying that 16 is not part of the context name)... or perhaps even a bit more clever by checking that VF(10/x) corresponds to 58 19 x*bytes 16.

In general a context statement (with a single context) looks like:

19 -- context token

< > -- reference to first element -- this can be a function or a variable

## ## -- 2 bytes for size of next element

## ## ## ## -- return value of next element

00 -- unsure but always seems to be 00

< > -- reference to second element

Valid formats for either first or second element are:

00 ## ## ## ## -- local variable

01 ## ## ## ## -- class variable

1B ## ## ## ## 00 00 00 00 <parameters> 16 -- virtual function

1C ## ## ## ## <parameters> 16 -- final function

Final functions are pretty rare (they are defined using the keyword final when declaring the function). Final functions are not allowed to be re-defined in a child class.

The "return value" can be various things:

For functions it's the reference to the function's actual return value (can be found using UE Explorer). Functions with no return value use 00 00 00 00 here.

For variables the return value is the same as the reference to the variable itself.

The key here is that searching for a 0x16 token isn't going to get you to the end of the first element of the context reference. I think some sort of more complex logic would be needed. The possibility of parameters can make the matter a little tricky (particularly since a parameter could itself be a function reference in the worst case).

It's pretty uncommon for the first element of a context to be a local variable, although syntactically it's allowed.

Basically you'd look for the 19 context token -- to work with a foreach statement the 19 token has to immediately follow the 58 foreach token.

Next you'd have to branch off the next token value, based on it being 00, 01, 1B, or 1C. In practice almost all such references would be 01 or 1B.

If the token is a 00 or 01, then the size reference would always be 4 bytes forward.

If the token is a 1B or 1C, then most of the time you can skip forward until the first 16 token is found.
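The branching described above could look roughly like the sketch below. It locates the 2-byte context size in a 58 19 foreach line and uses the naive "first 16" scan for function elements. The function name is hypothetical. The scan will give the wrong answer for nested calls such as Foo(Bar(5), 4).Snafu, or for a parameter that happens to contain a 16 byte.

```python
FOREACH, CONTEXT = 0x58, 0x19
LOCAL_VAR, CLASS_VAR = 0x00, 0x01
VIRTUAL_FN, FINAL_FN = 0x1B, 0x1C
END_OF_PARAMS = 0x16


def context_size_index(data: bytes):
    """Index of the 2-byte context size in a foreach line starting 58 19, else None."""
    if len(data) < 3 or data[0] != FOREACH or data[1] != CONTEXT:
        return None  # not a contextualized foreach; no context size to patch
    token = data[2]  # first element of the context
    if token in (LOCAL_VAR, CLASS_VAR):
        return 2 + 1 + 4  # token byte + 4-byte variable reference
    if token == VIRTUAL_FN:
        params_start = 2 + 1 + 8  # 1B ## ## ## ## 00 00 00 00
    elif token == FINAL_FN:
        params_start = 2 + 1 + 4  # 1C ## ## ## ##
    else:
        raise ValueError(f"unhandled first context element token {token:02X}")
    # Naive: take the first 16 after the fixed-size part (see caveats above).
    return data.index(END_OF_PARAMS, params_start) + 1


line = bytes.fromhex("58 19 1B 69 0E 00 00 00 00 00 00 16 53 00 94 2F 00 00 00 01 94 2F 00 00 00 F7 42 00 00 00 4A 69 00")
k = context_size_index(line)
print(k, hex(int.from_bytes(line[k:k + 2], "little")))  # 12 0x53
```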

The exception would be a very complex construction such as: Foo(Bar(5), 4).Snafu

In this case the first context is a function which has another function as one of its parameters. Again, this is an extremely complex and unlikely-in-practice construction that your tool (which should automate the easy cases) shouldn't have to handle, IMHO.

----------------------------

Again I just want to say fantastic work on this. ^_^

Bertilsson · September 30, 2013

Short term

Whenever a complex foreach section is present, I will add a warning that it may require manual context size adjustment if the inner size of the foreach section was modified.

Long term

I will very likely attempt to make it automatic.

One thing still has me puzzled though

I've noticed that foreach sections may contain multiple IP 30-tokens (inner ones without even being preceded by the IN 31-token).

Any way that could be abused in a positive way by the tool to get around the issue with the context size?

Amineri · September 30, 2013

Oh, right.

If there is a return statement within the foreach loop, the return statement has to be preceded by an iterator pop token in order to clean up the foreach loop (I don't understand all of the details). Since it's not proceeding to the next iteration of the loop there's no iterator next token.

I think this basically means that the only valid offset location in the original foreach line is an iterator next + iterator pop token pair.
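If that holds, the foreach rule from the first post can be tightened. An IP line only counts as a target when the line directly before it is an IN line. The sketch below uses the same hypothetical `(memory_address, first_byte)` line model as before. The example loop and its addresses are invented.

```python
ITERATOR_NEXT, ITERATOR_POP = 0x31, 0x30


def foreach_targets(lines, i):
    """Forward IP lines directly preceded by an IN line, for the foreach on line i."""
    addr = lines[i][0]
    return [
        lines[k][0]
        for k in range(1, len(lines))
        if lines[k][0] > addr
        and lines[k][1] == ITERATOR_POP
        and lines[k - 1][1] == ITERATOR_NEXT
    ]


# Hypothetical loop with an early return: the inner IP (0x40) precedes a return
# rather than following an IN token, so only 0x51 remains as a candidate.
loop = [(0x00, 0x58), (0x2D, 0x07), (0x40, 0x30), (0x41, 0x04), (0x50, 0x31), (0x51, 0x30), (0x52, 0x04)]
print([hex(a) for a in foreach_targets(loop, 0)])  # ['0x51']
```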

I really don't understand why the context size has to include the entire foreach loop. You'd think that the context size would only describe the context construction, independent of the foreach loop. I can only guess that it's some oddity of the Unreal Engine. It is a bit of a pain, though.

As for getting around the issue with the context size ... I don't think it would work. Basically the context relative offset has to point to the same point as the function-absolute offset at the end of the foreach line. It's just that the offset at the end of the foreach line is relative to the beginning of the function, while the context offset is relative to the current position (well, the position after the return value that immediately follows the context size/offset).

---------

Oh, and I should point out that UE Explorer will happily decompile (showing it as correct) any context statement with an incorrect context size, which can make such things quite tricky to find/fix when debugging. The foreach statements are just particularly tricky because any change to any lines within the foreach loop requires a change to the context size, which breaks the 'independence' of the individual lines.

Edited September 30, 2013 by Amineri

Bertilsson · October 1, 2013

I have now "field tested" the tool while working with a few functions in the increased squad size mod and my personal review is:

Pros

I am very satisfied that I am no longer forced to add things from top to bottom in small increments to avoid the risk of losing track of the jump offsets and having to recheck everything from time to time.
Now I just add/remove everything in one big edit in HxD and import it in UE Explorer and then the tool makes it truly easy to define new jump offsets and to validate, identify and repair any already existing ones.
Redefining an if --> else if --> else statement nested into 6 sections by only re-targeting the first one and letting the tool fix the rest was very satisfying :smile:

Cons

The tool does not translate goto statements into breaks, else sections or anything else fancy like UE Explorer does.
- It just shows the goto's the way they are (which on the other hand is good for anyone who wants to understand how those things are created).
- I will most likely make some improvements in this area
  - Especially fix the bug where the drop-down selectors still present the original goto targets for object code lines.
It is very counter intuitive to have the target drop down selectors on the left side of the code.
- I was aware of this when I made the design choice to put the selectors to the left.
- The idea was that it would be less disturbing than having to wrap code lines, possibly messing up indentation or forcing the user to scroll around to find selectors when code lines are very long.
- I now regret that design choice and plan to fix it.
  - Selectors should be on the right side of the code and code lines should wrap without breaking indentation.
  - Alternatively the code lines should be truncated and possible to click on or hover above with the mouse to reveal the full code line.
  - But in either case the code should be on the left side of the selectors and there should be no need to scroll horizontally ever.
The object code is a bit difficult to read without any color aid like in notepad++ or UE Explorer and no extra line feeds between code sections
- I will most likely add some basic keyword high-lighting, extra line feeds and vertical lines between curly brackets.

So there is still a lot of room for improvement, but the overall experience is already very, very good compared to using the old tool or only UE Explorer's token view + HxD.

Edited October 1, 2013 by Bertilsson

dubiousintent · October 1, 2013

Even if I never have a need to use it, this looks to be a very useful tool for the Unreal Modding Community in general. Please consider adding an entry to the Unreal Wiki or similar site so it gets a wider audience.

-Dubious-
