7 tips to reverse engineer JavaScript
Recently I found myself deep inside Apple's MusicKitJS production code, trying to isolate the user authentication flow for Apple Music.
Over the past few months, I've made MoovinGroovin, a web service that creates playlists from the songs you listened to while working out with Strava turned on.
MoovinGroovin is integrated with Spotify, and I got a request from a user to add support for Apple Music.
As I looked into the integration with Apple Music, I found that to access a user's listening history, I needed a "Music User Token". This is an authentication token generated from an OAuth flow. Unfortunately, the only public way to generate these is through the `authenticate()` method of Apple's MusicKitJS SDK.
This meant I would have to handle authentication with Apple Music on the frontend, while all other integrations were handled by the backend using passportJS.
And so, I decided to extract the auth flow out of MusicKitJS, and wrap it into a separate passportJS strategy (apple-music-passport).
This is where the journey begins...
1. Use beautifiers to clean up minified code.
2. Understand how minifiers compress the execution (control) flow into `&&`, `||`, `,`, `;`, and `(x = y)`.
3. Recognize async constructs.
4. Recognize class constructs.
5. Use VSCode's `rename symbol` to rename variables without affecting other variables with the same name.
6. Use property names or class methods to understand the context.
7. Use VSCode's type inference to understand the context.
Most beautifiers are not very powerful. They will add whitespace, but that's it. You will still need to deal with statements chained with `,`, control flow compressed by `&&` or `||`, ugly classes and asyncs, and cryptic variable names. But you will quickly learn that, unless you're dealing with event-driven flow, you can just stick with where the debugger takes you and ignore most of the cryptic code.
There was one tool (can't find it) which attempted to assign human-readable names to the minified variables. At first this seemed cool, but the truth is that it will easily mislead you if the random names happen to make some sense. Instead, rolling with the minified variable names and renaming what YOU understand is the way to go.
Understand how minifiers compress the execution (control) flow into `&&`, `||`, `,`, `;`, and `(x = y)`
As said above, you will still need to deal with cryptic statements like this:
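A statement of this kind might look as follows. This is a made-up example for illustration only (the function name `greet` and its body are assumptions, not code from MusicKitJS); it packs a default-value check, an assignment, and a return value into a single line.

```javascript
// Hypothetical minified function, written only to illustrate the style.
function greet(r) {
  return void 0 === r && (r = ""), "Hello " + r;
}
```

Calling `greet()` without an argument falls back to the empty string, while `greet("Ann")` uses the value that was passed in.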
Let's break it down:
`void 0` as `undefined`
`void 0` is `undefined`. So this checks if `undefined === r`. Simple as that.
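A quick check in plain JavaScript shows the equivalence. The `void` operator evaluates its operand and always yields `undefined`, and `void 0` is simply shorter to write. The variable names below are only illustrative.

```javascript
var r; // declared but never assigned, so it holds undefined

var voidIsUndefined = void 0 === undefined; // void always yields undefined
var rIsUndefined = void 0 === r;            // same check as r === undefined
```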
`(x = y)`
This assigns the value (`""`) to the variable (`r`) and returns the assigned value. Be conscious of this especially when you find it inside a boolean evaluation (`&&` or `||`).
Consider the example below, where only the second line will be printed:
Logically, this will be evaluated as:
Which is:
So while the second line will print, the first one will not.
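The steps above can be sketched like this. The snippet is an illustration written for this walkthrough, not the original example: the `printed` array and the `log` helper are assumptions standing in for console output so that the result can be inspected.

```javascript
var printed = [];
function log(message) {
  printed.push(message);
  return true;
}
var r;

// An assignment expression returns the assigned value...
(r = "") && log("first line");     // "" is falsy, so log is skipped
(r = "abc") && log("second line"); // "abc" is truthy, so log runs

// ...so logically the two lines are evaluated as:
//   "" && log("first line");
//   "abc" && log("second line");
// which, in terms of truthiness, is:
//   false && log("first line");
//   true && log("second line");
```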
`&&` and `||`
The code above used `&&` to execute the `console.log`. The right-hand side of an expression like `abc && console.log(...)` will be executed if and only if `abc` is truthy.
In other words, if we have
Then `console.log('will not print');` will never be reached.
The same, but inverted, applies to `||`:
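As a small sketch of both operators (the `calls` array and the `record` helper are hypothetical, used in place of `console.log` so the behaviour can be checked):

```javascript
var calls = [];
function record(message) {
  calls.push(message);
  return message;
}

// && runs the right-hand side only when the left-hand side is truthy.
false && record("will not print");
true && record("printed via &&");

// || is the opposite: the right-hand side runs only when the left is falsy.
true || record("will not print either");
false || record("printed via ||");
```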
What does this mean for us when reverse-engineering minified JS code? Often, you can substitute
with more-readable
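For instance, the two hypothetical functions below behave the same, as long as the value of the boolean expression itself is not used. The names `abc` and `doSomething` are placeholders.

```javascript
var calls = [];
function doSomething(tag) {
  calls.push(tag);
}

// Minified style: control flow hidden in boolean operators.
function minifiedStyle(abc) {
  abc && doSomething("and");
  abc || doSomething("or");
}

// More readable equivalent using if statements.
function readableStyle(abc) {
  if (abc) doSomething("and");
  if (!abc) doSomething("or");
}
```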
So far, we understand that
Really means
We see, though, that in the original code, it's actually followed by a comma:
For our reverse-engineering purposes, it just means that each expression (separated by a comma) will be evaluated and the value of the last one will be returned.
In other words, think of a chain of comma-separated expressions as a mini-function. And so, we can think of the code above as:
Overall, we can now read
as
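Putting the pieces together, a hypothetical comma-chained statement and its readable form could look like this. Both functions are illustrative assumptions (they are not taken from MusicKitJS) and are expected to return the same values.

```javascript
// Minified style: default value, assignment and result chained with commas.
function minified(r) {
  var t;
  return void 0 === r && (r = ""), (t = r.trim()), t.length;
}

// Readable form: each comma-separated expression becomes its own statement,
// and the last expression becomes the returned value.
function readable(r) {
  if (r === undefined) {
    r = "";
  }
  var t = r.trim();
  return t.length;
}
```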
Depending on the kind of code that you reverse-engineer, you may come into contact with an async-heavy codebase. MusicKitJS was an example of this, as it handled requests to the Apple Music API, so all methods that made requests were `async`.
You may find the async functions transpiled into an awaiter function and a generator function. Example:
Sometimes the `__awaiter` and `__generator` names might not be there, and you will just see this pattern:
The important part here is that if we have something like this:
We can read most of the body inside `case N` as regular code, and the second value of the returned arrays (e.g. `/* DEF */`) as the awaited code.
In other words, the above would translate to
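To make the pattern concrete, here is a runnable sketch. Everything in it is an assumption made for illustration: `runStateMachine` is a heavily simplified stand-in for the `__awaiter`/`__generator` pair (it only understands opcode `4`, "await this value", and opcode `2`, "return this value"), and `loadNameTranspiled`, `loadNameReadable` and `fetchUser` are hypothetical names.

```javascript
// Simplified stand-in for __awaiter + __generator. NOT the real helpers:
// it handles only [4, value] (await) and [2, value] (return).
function runStateMachine(body) {
  var state = {
    label: 0,
    value: undefined,
    sent: function () { return state.value; }
  };
  return new Promise(function (resolve, reject) {
    function step() {
      var op;
      try { op = body(state); } catch (e) { return reject(e); }
      if (op[0] === 2) return resolve(op[1]);
      state.label++; // move on to the next case once the awaited value arrives
      Promise.resolve(op[1]).then(function (v) {
        state.value = v;
        step();
      }, reject);
    }
    step();
  });
}

// Transpiled-looking version: a switch over _a.label.
function loadNameTranspiled(fetchUser) {
  return runStateMachine(function (_a) {
    switch (_a.label) {
      case 0: return [4 /*yield*/, fetchUser()];
      case 1:
        var user = _a.sent();
        return [2 /*return*/, user.name.toUpperCase()];
    }
  });
}

// How to read it: each [4, X] is "await X", and [2, Y] is "return Y".
async function loadNameReadable(fetchUser) {
  const user = await fetchUser();
  return user.name.toUpperCase();
}
```

Both functions return a promise that resolves to the same value when given the same `fetchUser`.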
Similarly to the previous point, depending on the underlying codebase, you may come across a lot of class definitions.
Consider this example:
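A hypothetical transpiled class, written here only to illustrate the shape of such code, might look like the following. The `__extends` function below is a simplified stand-in for the TypeScript helper of the same name, and `Base`, `Player`, `label` and `mute` are made-up names.

```javascript
// Simplified stand-in for TypeScript's __extends helper (an assumption:
// the real helper is more defensive and supports older engines).
var __extends = function (d, b) {
  Object.setPrototypeOf(d, b); // inherit static members
  function Tmp() { this.constructor = d; }
  Tmp.prototype = b.prototype;
  d.prototype = new Tmp();     // inherit instance members
};

var Base = (function () {
  function e(name) { this.name = name; }
  return e;
})();

var Player = (function (x) {
  __extends(e, x);
  function e(name, volume) {
    var t = x.call(this, name) || this;
    t.volume = volume;
    return t;
  }
  Object.defineProperty(e.prototype, "label", {
    get: function () { return this.name + "@" + this.volume; },
    enumerable: true,
    configurable: true
  });
  e.prototype.mute = function () { this.volume = 0; return this; };
  return e;
})(Base);
```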
Quite packed, isn't it? If you're familiar with the older syntax for class definition, it might not be anything new. Either way, let's break it down:
function(...) {...}
The constructor is the function that is called to construct the instance object.
You will find these defined as plain functions (but always with the `function` keyword).
In the above, this is the
which we can read as
`__extends` and `x.call(this, ...) || this;`
However, when you see that:
- the constructor definition is nested inside another function with some argument,
- that unknown argument is called inside the constructor, and
- that same unknown argument is also passed to some function along with our class,

then you can read that as
`x.prototype.xyz = {...}` or `Object.defineProperty(x.prototype, 'xyz', {...})`
These are self-explanatory, but let's go over them too.
is a getter method that can be read as
Similarly, assignments to the prototype can be plain properties or methods. And so
is the same as
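Read together, the constructor, inheritance and prototype pieces described above map onto a modern class definition. The sketch below uses made-up names (`Base`, `Player`, `label`, `mute`); the comments show the kind of transpiled construct each line would typically correspond to.

```javascript
class Base {
  constructor(name) {
    this.name = name;
  }
}

class Player extends Base {        // transpiled: __extends(Player, Base)
  constructor(name, volume) {      // transpiled: function Player(name, volume) {...}
    super(name);                   // transpiled: _super.call(this, name) || this
    this.volume = volume;
  }

  get label() {                    // transpiled: Object.defineProperty(Player.prototype, "label", { get: ... })
    return this.name + "@" + this.volume;
  }

  mute() {                         // transpiled: Player.prototype.mute = function () {...}
    this.volume = 0;
    return this;
  }
}
```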
Use VSCode's `rename symbol` to rename variables without affecting other variables with the same name.
When reverse-engineering minified JS code, it is crucial that you write comments and rename variables to "save" the knowledge you've gained while parsing through the code.
When you read
and you realize "Aha, `r` is the username!", it is very tempting to rename all instances of `r` to `username`. However, the variable `r` may also be used in different functions to mean different things.
Consider this code, where `r` is used twice to mean two different things:
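A minimal made-up example of this situation (the function names and bodies are assumptions chosen for illustration):

```javascript
// In greetUser, `r` holds a username.
function greetUser(r) {
  return "Hello, " + r;
}

// In shouldRetry, `r` holds a retry count.
function shouldRetry(r) {
  return r < 3;
}
```

A plain find-and-replace of `r` with `username` would wrongly touch the second function as well.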
Identifying all `r`s that mean one thing would be mind-numbing. Luckily, VSCode has a `rename symbol` feature, which can identify which variables reference the one we care about, and rename only them:
Right click on the variable
Set new name:
After:
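As a sketch of the expected result, assume a hypothetical file with two functions that both take a parameter named `r`. Renaming the symbol inside the first function (Rename Symbol, `F2` by default) should leave the second one untouched:

```javascript
// `r` was renamed to `username` here, and only here.
function greetUser(username) {
  return "Hello, " + username;
}

// This `r` is a different symbol that merely shares the name, so it stays.
function shouldRetry(r) {
  return r < 3;
}
```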
Let's go back to the previous point where we had
When you are trying to figure out the code, you can see we have a quick win here. We don't know what `r` was originally, but we see that it is probably an attribute or an element, based on the properties that were accessed.
If you rename these cryptic variables to human-readable formats as you go along, you will quickly build up an approximate understanding of what's going on.
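As a hypothetical illustration of this kind of inference (the function `d` and the properties it reads are assumptions, not code from MusicKitJS):

```javascript
// Minified: nothing tells us directly what `r` is...
function d(r) {
  return r.tagName.toLowerCase() + (r.id ? "#" + r.id : "");
}

// ...but tagName and id are properties of DOM elements, so `r` is very likely
// an element, and the function builds a selector-like description of it.
function describeElement(element) {
  return element.tagName.toLowerCase() + (element.id ? "#" + element.id : "");
}
```

The property names alone were enough to pick a meaningful name for both the parameter and the function.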
Similarly to point 6, we can use VSCode's type inference to help us decipher the variable names.
This is most applicable in the case of classes, which have the type `typeof ClassName`. This tells us that the variable is the class constructor. It looks something like this:
From the type hint above we know we can rename `xyz` to `DomSupport`.
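A hypothetical piece of code where this applies is sketched below. Hovering over `xyz` in VSCode is expected to show a type along the lines of `typeof DomSupport`; the exact hover text is an assumption, and the class body is made up for illustration.

```javascript
// The class is returned from an IIFE and stored under a meaningless name.
var xyz = (function () {
  function DomSupport() {
    this.checked = false;
  }
  DomSupport.prototype.check = function () {
    this.checked = true;
    return this;
  };
  return DomSupport;
})();

// The constructor keeps its own name, which is also visible at runtime.
var originalName = xyz.name;
```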
That's all I had. These should take you a long way. Do you know of other tips? Ping me or add them in the comments!
There are plenty of beautifiers: just google for a beautifier / prettifier / deminifier / unminifier and you will find them. VSCode extensions work just as well.
Remember that JS supports short-circuit evaluation. This means that the right-hand side of `&&` is evaluated only if the left-hand side is truthy.
One more thing here: be aware of operator precedence.
This is the comma operator.
Either way, these are `async/await` constructs from TypeScript.
Similarly to `__awaiter` and `__generator`, `__extends` is also a TypeScript helper function. And similarly, the variable name `__extends` might not be retained.
`Object.defineProperty` can be used to define a getter: