I have fantastic news, unfortunate news, and good news.
Fantastic news: I know exactly what I need to do. I found exactly what I'm looking for and have proven that it will give me the results I want.
Unfortunate news: it's going to be tedious as all fuck to implement fully.
Good news: because of how this was implemented I can actually talk about it here in a spoiler-free fashion. So... without further ado!
At the end of my last post, I had left off at finding what is effectively a master list of talk sprites in cpac_3d.bin. At specific memory addresses there are specific hex codes that point to the individual sprites; replacing one with another will replace every single instance of that sprite in the game. Unfortunately, that won't suffice for the scope of my project, because what I need to do is replace only specific instances of the sprites in question.
I turned my sights elsewhere, toward the massive list of files with names of the following format:
For every "st" (I assume, "stage") there are dozens of "game###"s as well as a "root" file (e.g. st01/st01_root.xml.lz), and a handful of "demo###"s. Almost all game and demo files have those sub-files corresponding to each language.
The most important thing to know about these files is that they are compressed (signified by .lz). You will have absolutely no luck making sense of them as they are. Thankfully, looking into existing tutorials on ROMHacking such as this one, I was able to make some progress. CrystalTile2 can extract the individual files, BatchLZ77 can decompress them, and then CrystalTile2 can re-open them (note: the decompressed file isn't a recognized file type, but if you ask CrystalTile2 to open it anyway it will display the data correctly). Any edits made to the decompressed file can then be saved, and BatchLZ77 can re-compress it, after which it can be re-inserted into the ROM.
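For anyone curious what the decompression step is doing under the hood, here is a minimal Python sketch. It assumes the .lz files use the common Nintendo "LZ77 type 0x10" layout (a 0x10 marker byte, a 3-byte little-endian size, then flag bytes followed by literals and back-references). That is an assumption and has not been confirmed for this ROM, and `decompress_lz10` is a made-up name for illustration. BatchLZ77 is still the tool actually used here.

```python
def decompress_lz10(data: bytes) -> bytes:
    """Decompress a Nintendo-style LZ77 (type 0x10) stream.

    ASSUMPTION: the game's .lz files use this layout; not verified.
    """
    if len(data) < 4 or data[0] != 0x10:
        raise ValueError("not an LZ77 type 0x10 stream")
    size = int.from_bytes(data[1:4], "little")  # decompressed length
    out = bytearray()
    pos = 4
    while len(out) < size:
        flags = data[pos]  # one flag bit per following block, MSB first
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) >= size:
                break
            if flags & (1 << bit):
                # Back-reference: 4-bit length, 12-bit displacement.
                b1, b2 = data[pos], data[pos + 1]
                pos += 2
                length = (b1 >> 4) + 3
                disp = (((b1 & 0x0F) << 8) | b2) + 1
                if disp > len(out):
                    raise ValueError("back-reference before start of output")
                for _ in range(length):
                    if len(out) >= size:
                        break
                    out.append(out[-disp])
            else:
                # Literal byte, copied as-is.
                out.append(data[pos])
                pos += 1
    return bytes(out)
```

Byte-by-byte copying in the back-reference loop is deliberate: a reference is allowed to overlap the bytes it is currently writing, which is how a long run of one repeated byte gets encoded.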
Is this a headache? Yes. But does it work? Also yes. BatchLZ77 and CrystalTile2 so far are the only tools I've needed to use for this entire project.
From here on out, I won't refer to the files by their compressed names ("[...].xml.lz"). Once decompressed they are simple .xml files, and they refer to each other as ".xml". These files contain the instructions for how the game plays out: they are the scenes and gameplay. If I refer to "filename.xml" I am referring to the file "filename.xml.lz" after it has been extracted and decompressed. Just keep in mind it will need to be compressed again and re-inserted into the ROM to actually implement any changes.
I've done a lot of testing with these files. Pro tip: the st01/st01_root.xml file is what tells the game what to pull when you click "New Game." The first line of legible code in it (after what appears in plaintext as a chunk of gibberish) calls st01/st01_game000_Expand.xml, which is the first scene in the game, but if you replace that with literally any other .xml file in the game it will pull that one up instead. This makes it really easy to test things for specific scenes, and also determine what each scene is. This is how I've been able to figure out as much as I have.
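The root-file trick above can be sketched in Python roughly like this. It assumes the scene path sits in the decompressed root file as plain ASCII text, and that the rest of the file (including the gibberish chunk) can be left untouched. `swap_scene_reference` and the `require_same_length` switch are hypothetical names, and whether a path of a different length is safe is not something this sketch can tell you.

```python
def swap_scene_reference(data: bytes, old_path: str, new_path: str,
                         require_same_length: bool = False) -> bytes:
    """Replace the first occurrence of old_path with new_path.

    ASSUMPTION: paths are stored as plain ASCII inside the decompressed
    root file. All other bytes are passed through unchanged.
    """
    old = old_path.encode("ascii")
    new = new_path.encode("ascii")
    if require_same_length and len(old) != len(new):
        # Optional conservative guard so no other bytes shift position.
        raise ValueError("replacement path has a different length")
    index = data.find(old)
    if index == -1:
        raise ValueError(f"{old_path!r} not found in file")
    return data[:index] + new + data[index + len(old):]
```

Usage would look something like reading the decompressed st01/st01_root.xml in binary mode, calling `swap_scene_reference(data, "st01/st01_game000_Expand.xml", other_scene)`, writing the result back out, and then re-compressing and re-inserting as usual.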
I'll go ahead and lay out what I've determined about the contents of these .xml files:
"st##" refers to the locations in the game. "st01" is the junkyard, "st02" is the supervisor's office basement, "st05" is Lynne's apartment. The game is logically divided by these distinct locations.
"game###" xml files refer to specific scenes at each location. In most cases "game000" is the first scene that plays out at that location, and so on.
"demo###" xml files refer to animated cutscenes that play out at a particular location. In most cases if a cutscene is triggered in one location but takes place at another (for example, if the main scene is happening at st09 but someone triggers a flashback that takes place at st01), the demo file will live in the location of the cutscene as opposed to the greater scene (in the above example, the demo file will be in st01).
To clarify the last point somewhat: "demo" files are generally called within "game" files as assets. The "game" files are the real meat of the game.
"game###" files also call upon some form of sub-file or variable with names like "m##_####". After trial and error I was able to determine that these "m"s are what call lines of text and their associated talk sprites. I assume "m" stands for "message."
"m##_####"s are also mostly numbered in reference to the stage in which the scene takes place, so a game file in st01 might have text referred to as m01_0020. The only exception to this is if a demo file has text, in which case the m##_####'s will be numbered in reference to the game file that calls the demo file. I know this sounds overly complicated, but it just means that a cutscene originally called from a scene in st14 will have dialogue named "m14_####."
And, the most important thing, the thing which took way too long to figure out, the thing which is the solution to all of my problems: THE M##_#### VARIABLES ARE DEFINED UNIQUELY FOR THE SPECIFIC LOCALIZATION FILES FOR EACH LANGUAGE. INCLUDING THE TALK SPRITES.
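If you want to see which message IDs a given file mentions, a quick Python sketch like the one below would do it. It assumes the IDs appear as readable text in the form m##_#### (two digits, underscore, four digits), as in the examples above. `find_message_ids` is a made-up helper name.

```python
import re

# m + two-digit stage number + underscore + four-digit line number.
MESSAGE_ID = re.compile(r"\bm(\d{2})_(\d{4})\b")


def find_message_ids(text: str) -> list[tuple[str, int, int]]:
    """Return (id, stage_number, line_number) for each m##_#### found.

    ASSUMPTION: IDs are stored as plain text inside the file.
    """
    return [
        (m.group(0), int(m.group(1)), int(m.group(2)))
        for m in MESSAGE_ID.finditer(text)
    ]
```

The stage number is handy for spotting the demo-file exception: a demo file living in st01 whose IDs all report stage 14 was presumably called from a scene in st14.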
This means you can mess with the st01_game000_Expand.xml file as much as you want, but you will never be able to change anything about the talk sprites that come up with the dialogue. You can swap around the existing ones and you can even mess with animations, but you cannot change anything about the existing dialogue. (I did have a bit of fun with the animations though.) My testing seems to indicate that these files have the overall information for how each scene plays, but they call the individual localization files for the dialogue, and the sprites are separately hard-coded for each line of dialogue in each language.
Thus, if you want to make any changes to specific talk sprites for specific lines of dialogue, you must do it individually for whatever language(s) you want to see the change in.
(Sorry other languages, I am way too lazy to do this for any language except English for now. I might eventually bother doing it for French, but even that is not a guarantee. Hopefully I'll document this well enough that anyone who wants to try their hand at this can replicate it for other languages.)
There is a bright side to this! I was hoping to implement this swap as something you can toggle on/off, so down the line, instead of replacing the .en files, I can replace one of the other languages instead to create an option. "English" vs "English (NG+)", if you will. You can change languages at any time in Ghost Trick without losing progress, so if I can figure out how, I'll just make the "language select" screen be a "mode select" instead.
So... the juice. The .en.xml files have pointers indicating what sprite(s) to use that appear in a format like:
FF ## 00 0D FF*
In which the ## is what indicates the sprite to use.
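Here is a rough Python sketch of hunting for that byte pattern in a decompressed .en.xml file and patching a single occurrence. It takes the FF ## 00 0D FF shape at face value, even though that shape is only approximate, so treat every hit as a candidate to check by hand. `find_sprite_pointers` and `set_sprite` are hypothetical names.

```python
import re

# FF, any one byte (the sprite ID), then 00 0D FF.
# A lookahead is used so that a closing FF can also open the next match.
SPRITE_POINTER = re.compile(rb"\xFF(?=(.)\x00\x0D\xFF)", re.DOTALL)


def find_sprite_pointers(data: bytes) -> list[tuple[int, int]]:
    """Return (offset_of_leading_FF, sprite_id) for each candidate match.

    ASSUMPTION: the pattern is exactly FF ## 00 0D FF.
    """
    return [(m.start(), m.group(1)[0]) for m in SPRITE_POINTER.finditer(data)]


def set_sprite(data: bytes, offset: int, new_id: int) -> bytes:
    """Return a copy of data with the sprite ID at one pointer changed."""
    if not 0 <= new_id <= 0xFF:
        raise ValueError("sprite ID must fit in one byte")
    if data[offset:offset + 1] != b"\xFF" or data[offset + 2:offset + 5] != b"\x00\x0D\xFF":
        raise ValueError("no sprite pointer pattern at this offset")
    patched = bytearray(data)
    patched[offset + 1] = new_id  # only the ## byte changes
    return bytes(patched)
```

Because only one byte is overwritten in place, the file length and every other offset stay the same, which is what makes per-line, per-language swaps practical.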
(*This isn't precisely accurate. I think some of the surrounding numbers are super