Rarara - Reversing (Secuinside CTF)
This is a writeup of the challenge rarara from the secuinside 2014 pre-qual ctf.
I participated in this challenge together with Yoav Ben Shalom, Matan Mates, and Itamar Marom. This was the probably the hardest challenge in the competition and only one team had managed to solve it a few hours before the competition ended when my team had to leave early.
We had figured out the solution but ran out of time before implementing it. Had we stayed and finished this level and another we likely would have been in the top ten teams, but oh well!
The challenge
We were given a binary. You can download it from here. Running it from the command line yields
While the easy solution might be to install 64 bit windows, examining the assembly reveals that the code is actually 32 bit binary. I’m not sure why this check is done, but it’s easy enough to patch it (a single jz -> jnz) and allow the binary to run on 32 bit systems.
Next we are presented with a prompt asking for input. Trying a few random inputs, we get the message “Wrong!”. It’s time to open up IDA and see what this program does!
Anti-reversing
Once we’ve identified the important function, we can see a few things. First, the length of the input is checked and compared to 13.
Then each of the letters is checked to make sure it is alphanumeric
And then we reach this block at which point IDA’s analysis fails us:
This block is repeated in many different locations throughout the code.
What is this block doing? When we call $+5 it puts our address on the stack. 3E+var3E is 0x3E - 0x3E = 0, so the add instruction adds 5 to the address just placed on the stack. Finally, we return to that address, or just after the retf instruction.
In other words, this whole block is just a compilicated nop that screws with IDA’s autoanalysis. By patching the binary to replace all instances of this block (plus some slight variations) with nops, we turn this:
into this:
So straight and organized! Now we can start working.
Understanding how the password is checked
After making sure that all of the characters are alphanumeric, the program places the letters into a global array in jumps of 0x14 while keeping track of the sum. The sum is later compared to 0x3E2 giving us our third constraint on the password (the other two being length and alphanumericness).
The global array is then processed in a loop. This was the third challenge in the competition involving CRC’s, so we quickly recognized the loop as calculating the crc64 of the global array (of length 0x100) with the letters of our password placed in the indexes as described above. The crc is then compared to a constant value.
If it is correct, then the value “Correct” is placed in the buffer where “Wrong” was held before so “Correct” will be printed at the end of the program.
So, to summarize we need to find a password of length 13 comprised of alphanumeric characters whose sum is 0x3E2 and whose CRC64 when placed in a set array at specific indexes is equal to 0x15AD90B88ABA1847
Breaking CRC
A previous challenge involved appending data to the end of a string in order to obtain arbitrary CRCs. That’s a less helpful ability for this challenge since we’re not changing 8 consecutive bytes in the crc (so the same algorithm doesn’t apply) and there’s no way of ensuring we will only have alphanumeric characters.
However, there is an interesting property of CRC that we can use - CRC is a linear function. This means that CRC(x) ^ CRC(y) ^ CRC(z) = CRC(x^y^z). Since CRC(0) = 0 in this implementation (usually there is a xor with 0xxffff… that prevents this), we get CRC(x^y) = CRC(x) ^ CRC(y).
Using this knowledge, we can check how each bit of input affects the final CRC. We start by filling the array with a throwaway value - we filled the array with 0x41 (‘A’) as our base value. We then change a single bit and look at the new CRC. Since CRC(x ^ y) = CRC(x) ^ CRC(y) we get CRC(array with bit changed) = CRC(array) ^ CRC(effect of changing that bit)
In other words, we can look at what happens when we change a single bit and then figure out which bits we want to change and which we want to keep the same.
We do this by building up a matrix where each column is the effect of changing a single bit. By multiplying each row with the vector of whether or not we actually change each bit we should get the corresponding bit in the final CRC.
The problem then becomes one of solving a simple linear equation. Since we have 91 input bits (13 letters * 7 unknown bits since the upper bit is always 0) and 64 rows in our matrix (since the CRC is 64 bits long) we still need to iterate over 91 - 64 = 27 bits and check if the solution is valid (alphanumeric and proper sum).
We split our solution into two parts. First we generated a ranked matrix in python. We then found all of the solutions in C++ since our original python solution was too slow. You can see the code below.
Optimizations
The original C++ code was still very slow. It took around an hour to go over all of the possible values for 27 bits. The speed was significantly boosted by checking after every 7 bits if the letter was valid. This effectively allowed us to short circuit solutions that were known to be invalid and improved the speed to around 30 seconds.
The other major optimization was moving from vector<bool>
to bitset
s. Because bitsets have built in operator&
and count
functions that work effeciently the speed of the inner
function was vastly improved (see below).
Using bitsets the speed of the program was reduced to 4 seconds.
But we can do even better :)
We know that the xor of the lower bits of all of the equations is zero, because the sum of the letters is even. Therefore, we can add a row to our matrix and we have one less bit that we have to brute force. Final speed is around 2 seconds.
Update: Even better
Michael Huzman pointed out that the builtin popcnt
instruction should vastly improve the speed of the inner
function. By adding the flag -march=corei7
gcc will automatically use this instruction for bitset’s count
function. This again halves the speed and now it runs in 1.2 seconds!
Disassembly before and after:
Before:
0000000000401e80 <_Z5innerRKSt6bitsetILm91EES2_>:
401e80: 48 89 5c 24 e8 mov QWORD PTR [rsp-0x18],rbx
401e85: 48 89 6c 24 f0 mov QWORD PTR [rsp-0x10],rbp
401e8a: 48 89 fb mov rbx,rdi
401e8d: 4c 89 64 24 f8 mov QWORD PTR [rsp-0x8],r12
401e92: 48 83 ec 18 sub rsp,0x18
401e96: 48 8b 3f mov rdi,QWORD PTR [rdi]
401e99: 48 23 3e and rdi,QWORD PTR [rsi]
401e9c: 48 89 f5 mov rbp,rsi
401e9f: e8 bc ec ff ff call 400b60 <__popcountdi2@plt>
401ea4: 48 8b 7b 08 mov rdi,QWORD PTR [rbx+0x8]
401ea8: 48 23 7d 08 and rdi,QWORD PTR [rbp+0x8]
401eac: 41 89 c4 mov r12d,eax
401eaf: e8 ac ec ff ff call 400b60 <__popcountdi2@plt>
401eb4: 49 8d 04 04 lea rax,[r12+rax*1]
401eb8: 48 8b 1c 24 mov rbx,QWORD PTR [rsp]
401ebc: 48 8b 6c 24 08 mov rbp,QWORD PTR [rsp+0x8]
401ec1: 4c 8b 64 24 10 mov r12,QWORD PTR [rsp+0x10]
401ec6: 48 83 c4 18 add rsp,0x18
401eca: 83 e0 01 and eax,0x1
401ecd: c3 ret
401ece: 66 90 xchg ax,ax
After:
0000000000401df0 <_Z5innerRKSt6bitsetILm91EES2_>:
401df0: 48 8b 17 mov rdx,QWORD PTR [rdi]
401df3: 48 8b 47 08 mov rax,QWORD PTR [rdi+0x8]
401df7: 48 23 16 and rdx,QWORD PTR [rsi]
401dfa: 48 23 46 08 and rax,QWORD PTR [rsi+0x8]
401dfe: f3 48 0f b8 d2 popcnt rdx,rdx
401e03: f3 48 0f b8 c0 popcnt rax,rax
401e08: 48 01 d0 add rax,rdx
401e0b: 83 e0 01 and eax,0x1
401e0e: c3 ret
401e0f: 90 nop
Additionally, Yoav Ben Shalom noticed that when we get 3 bad last letters we can skip all 2^6 iterations that will yield the same letters. The gist below has been updated and now the code runs in 0.7 seconds!
The answer
Running the code yields 14 solutions that cause the program to output “Correct”. A quick check against the program confirms that they all work. Here’s the full list:
B0CFbYG14gbvA
TSG2MZ0bt3JJN
Xnu073LG6YC7q
tDDR3OA6Q7YDv
sJUA3vYTH5BI1
r2F1l2z2SyK33
4cUZ89Rti11h2
2x0LSbXAIwB84
i05SJ5E9pnl7C
R3V3s1nGkRG0G <-- Looks like t3xt
9X8LdMPGZL4Dg
0o4CaRv15p20k
V62T3zKIuDZI3
Jl2gKZf008oK6
Because we didn’t finish this challenge during the competition I can’t be completely sure but it looks like R3V3s1nGkRG0G was the solution.
You can see our code below, first the python and then the c++