7UR7L3 Learns Cryptography: Cryptohack

“Cryptographic Failures” is ranked at #2 in the 2021 Open Web Application Security Project’s (OWASP) Top 10 Web App Security Risks. The 2017 list includes cryptography-related Common Weakness Enumerations (CWEs), among others, in the category “sensitive data exposure” (ranked #3 at the time). In the words of the OWASP authors, “A02:2021-Cryptographic Failures shifts up one position to #2, previously known as Sensitive Data Exposure, which was (a) broad symptom rather than a root cause. The renewed focus here is on failures related to cryptography which often leads to sensitive data exposure or system compromise.”
One of the reasons that cryptographic failures ranks so high on the 2021 OWASP Top 10 list is the very high “incidence” rate of cryptographic failures in the data that OWASP analyzed. Each of the items on the OWASP Top 10 list is a group of CWEs. OWASP calculated the incidence rate (that is, the percentage of web apps which had at least one instance of a CWE) for each CWE based on data from hundreds of thousands of web apps and found that the average CWE that falls under “Cryptographic Failures” had an incidence rate of 4.49%, and a maximum incidence rate of 46.44% (yikes!).
But why are cryptographic failures so prevalent in web apps today? I don’t have the full answer to that question, but one huge contributing factor is probably that cryptography is hard, and even “good” cryptography is easily undermined by subtle (and not-so-subtle) mistakes in using it. On the one hand, this problem can be reduced by following coding best practices as they apply to cryptography. For example, don’t try to implement your own cryptographic functions. Instead, use well-established cryptographic libraries that have been vetted and tested by cryptographers. This still isn’t enough, however, because many cryptographic failures come down to how cryptographic functions are used, not which library is used. This is where one of my current favorite (and free) online training platforms comes into play.
Cryptohack introduces the basics of cryptography and provides hands-on exercises in breaking insecure ciphers, or exploiting insecure use of secure ciphers in a capture-the-flag (CTF)-like platform. While some challenges can be artificial, for the sake of emphasizing a point, many of the challenges on Cryptohack can be mapped to real-world use cases. For example, many of the symmetric cryptography challenges can be accessed using a Flask-based web API that could model a web app that attempts to keep secrets or communications safe by employing cryptography.
If you want to get some hands-on experience with breaking cryptography, or just build a better understanding of cryptography, then I highly recommend giving Cryptohack a try. Fair warning: while some challenges are straightforward, others are very difficult. Thankfully, there is an active Discord community where other users will be willing to help point you in the right direction. Cryptohack will be most enjoyable if you enjoy math and have some basic Python skills. One way that the Cryptohack community differs from, say, TryHackMe, Hack the Box, etc., is that they strongly prefer that writeups not be published publicly. Instead, solutions are published privately and shared once you have solved a challenge. Otherwise, I would have several Cryptohack writeups published here on 7UR7L3Blog because the challenges are a lot of fun. With that said, here are some quick tips on getting started.
Tip 1 : Some specific Python knowledge goes a long way
You don’t have to use Python to solve the challenges on Cryptohack, but you will need to write some code for many of them. Python seems to be the most straightforward language to do this in, unless you already know a different language much better. You also don’t need to be a Python expert to solve the problems, but here are a few Python tips for solving Cryptohack challenges.
Beware of large integers
Unlike many other popular languages, Python supports arbitrary-precision integers. The advantage is that you don’t have to worry about integer “overflow” (a.k.a. “wrap around”), but the downside is that very large integers take a lot of memory and time to compute (say 1000000 raised to the power p, where p is the several-hundred-digit exponent defined in the code that follows, which is an actual exponent that came up in a challenge). This can very easily make your Python scripts take much longer than they need to. This is especially true when you calculate a large power modulo some number (which happens a lot in cryptography), since you first compute a giant integer and then throw away most of that information when you apply the modulus operator. Instead, use the three-argument form of the built-in pow() function, which performs the exponentiation and the modulo reduction together efficiently. Say we want to compute the number above modulo 123401298.
p = 101524035174539890485408575671085261788758965189060164484385690801466167356667036677932998889725476582421738788500738738503134356158197247473850273565349249573867251280253564698939768700489401960767007716413932851838937641880157263936985954881657889497583485535527613578457628399173971810541670838543309159139
a = 1000000
m = 123401298
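# The next line builds the full power before reducing it,
# so it is impractically slow for an exponent this large
i = a**p % m
# The next line is just as slow: two-argument pow() also builds the full power
i = pow(a, p) % m
# But this line is quite fast: three-argument pow() reduces modulo m as it goes
i = pow(a, p, m)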
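In the first and second cases, Python first tries to compute a**p in full and only then reduces it modulo m. For an exponent this large, that intermediate result has far too many digits to ever store, so the computation will not finish in any practical amount of time. The third case uses an algorithm that reduces modulo m as it goes, so it never builds a**p and goes much more directly to the final answer.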
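To make the idea behind the fast case concrete, here is a minimal sketch of square-and-multiply modular exponentiation. The function name modexp is my own, and this is not how the built-in pow() is implemented internally (the built-in is more optimized); it only illustrates why no huge intermediate number is needed.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply (illustrative sketch only)."""
    if exponent < 0:
        raise ValueError("this sketch only handles non-negative exponents")
    result = 1 % modulus          # also gives 0 when modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current lowest bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus   # square, reducing immediately
        exponent >>= 1            # move on to the next bit
    return result


# A small exponent, so the naive computation is still feasible for comparison.
a, e, m = 1000000, 65537, 123401298
assert modexp(a, e, m) == pow(a, e, m) == a**e % m
```

Every intermediate value stays below m squared, and the loop runs once per bit of the exponent, which is on the order of a thousand iterations for an exponent with a few hundred digits.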
Be careful with the encoding
Many of the challenges involve interacting with an API that returns all data as a string of hex characters representing the bytes of the plaintext/ciphertext. You need to keep track of which format your data is currently in (a hex string or raw bytes); otherwise, your solution scripts won’t work. The bytes.fromhex() and .hex() methods will get you most of the way there.
# convert from a string of hex characters to a 'bytes' object
ciphertext = bytes.fromhex("c92b7734070205bdf6c0087a751466ec13ae15e6f1bcdd3f3a535ec0f4bbae66")
# convert from a 'bytes' object to a string of hex characters
hex_chars = ciphertext.hex()
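Here is a short sketch of the most common way this goes wrong. The JSON body and the field name "ciphertext" are assumptions for illustration, not the exact format of any particular challenge.

```python
import json

# Hypothetical API response body; the "ciphertext" field name is an assumption.
response_text = (
    '{"ciphertext": '
    '"c92b7734070205bdf6c0087a751466ec13ae15e6f1bcdd3f3a535ec0f4bbae66"}'
)

hex_string = json.loads(response_text)["ciphertext"]  # str of hex characters
raw = bytes.fromhex(hex_string)                       # bytes a cipher operates on

# Two hex characters describe one byte.
assert len(hex_string) == 2 * len(raw)

# Common mistake: .encode() converts the hex *characters* themselves to bytes,
# not the values they represent, so the result is twice as long and wrong.
wrong = hex_string.encode()
assert len(wrong) == len(hex_string) and wrong != raw

# Convert back to a hex string before sending data to the API again.
assert raw.hex() == hex_string
```

A habit that helps is putting the format in the variable name (for example ct_hex versus ct_bytes), so a mix-up is visible at a glance.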
Running the Flask app can be helpful
Many of the challenges allow you to interact with an API in real time and provide the source code for the API, which is based on Flask. This allows solutions to be automated, and it also lets you copy and paste the source code and run a local instance of the Flask-based API. This is not cheating, since all “secrets” are hidden from the user. Instead, with some additional print statements or use of the logging module, a local copy lets you perform experiments with a known set of secrets, whether that is the key or the flag. You can install Flask using pip. See the official Flask quick start guide for more: https://flask.palletsprojects.com/en/2.0.x/quickstart/.
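As a rough sketch of what "experimenting with known secrets" can look like, the snippet below instruments a handler with the logging module. Everything in it is made up for illustration: the handler name, the key, the flag, and the toy XOR "cipher", which stands in for whatever real cipher a challenge uses so that the sketch needs only the standard library. In a challenge's source, the handler would sit behind a Flask route.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("local_challenge")

# Known stand-ins for the values the real server keeps hidden (both made up).
KEY = bytes.fromhex("00112233445566778899aabbccddeeff")
FLAG = b"FLAG{known_local_value}"


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in for the real cipher: repeating-key XOR, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_handler(plaintext_hex: str) -> dict:
    """Hypothetical handler that appends the flag and encrypts the result."""
    plaintext = bytes.fromhex(plaintext_hex)
    message = plaintext + FLAG
    log.debug("input to cipher: %r", message)       # extra visibility, local only
    ciphertext = toy_encrypt(KEY, message)
    log.debug("ciphertext: %s", ciphertext.hex())
    return {"ciphertext": ciphertext.hex()}


result = encrypt_handler("41414141")
```

Because the key and flag are known locally, you can check each intermediate value against what you expect before pointing the same script at the real API.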
Tip 2 : Secure ciphers can’t be directly broken
If you are literally a mathematical genius, then maybe you can find a way to break a cipher that has never been broken. For the rest of us, our solutions have to assume that we can’t directly break a cipher which is accepted as secure. We will have to rely upon some mistake in using the secure cipher. First of all, you can learn which ciphers can/can’t be broken with a Google search, or by checking Wikipedia. Once you know whether the cipher that the challenge uses is breakable or not, you can eliminate many possible solutions. One thing that is nice about Cryptohack, for those of us that are still learning, is that the problem statement usually points to what mistake was made. It’s then up to us to figure out how to exploit the mistake.
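As one generic illustration of a usage mistake (not taken from any specific challenge), consider XOR with a random keystream as long as the message. Used once, it cannot be broken; used twice, the keystream cancels out. The messages and names below are made up for the sketch.

```python
import secrets


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(a ^ b for a, b in zip(x, y))


known_plaintext = b"attack at dawn, bring the maps!!"   # attacker knows this
secret_plaintext = b"the flag is hidden in the vault."  # attacker wants this

# A fresh random keystream is fine for ONE message...
keystream = secrets.token_bytes(max(len(known_plaintext), len(secret_plaintext)))

# ...but here the (hypothetical) sender reuses it for a second message.
c1 = xor_bytes(known_plaintext, keystream)
c2 = xor_bytes(secret_plaintext, keystream)

# The keystream cancels: c1 ^ c2 == p1 ^ p2, so knowing p1 reveals p2
# without ever learning the keystream.
recovered = xor_bytes(xor_bytes(c1, c2), known_plaintext)
assert recovered == secret_plaintext
```

Nothing about XOR itself was "broken" here; the attack works only because of how the keystream was used.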
Tip 3 : Solutions to problems should run quickly
While the solutions to most problems may be conceptually difficult, they are usually computationally easy. By that I mean that your solution code should be able to run fairly quickly and without using large brute-force loops. So, your scripts may take a few minutes to finish, but shouldn’t take hours. If they are taking a long time, this may be an indicator that you should revisit the scripts (see Tip 1), or that maybe you missed something that simplifies the problem.
Tip 4 : Take advantage of your resources
There is some value to solving a problem completely on your own; however, once you have had the chance to wrestle with the “hard part” of a problem for a little bit (maybe 30 minutes or less is enough) without any new progress, it may be a good idea to consider some of the resources available to you. One thing that I love about Cryptohack is the active Discord community where users are glad to lend a hand. Again, the Cryptohack community does not like having solutions in the clear. The accepted etiquette is to ask in the #challenge-hints channel if you can direct message someone about a specific challenge. Someone who knows how to solve the challenge will get back to you and help you make some progress. It may seem unnecessary to be so secretive with the solutions, but when we see solutions to any type of math problem, it is easy to feel like we understand the solution better than we really do. It really is helpful to arrive at the solution yourself, even if someone points you in the right direction (even multiple times) along the way. You will achieve a much deeper understanding this way.
Some other helpful resources are:
- Wikipedia (despite what some teachers may say, this tends to be a very accurate resource, especially for math)
- CyberChef: some of the challenges on Cryptohack can be solved using no Python code and just CyberChef. Give it a try!
- Again, use the Discord channel! Also a great place to meet some cool people.
- Dry-erase boards, chalkboards, or paper: a picture capturing exactly what happens to data during the encryption process can go a very long way toward finding a solution to most problems. It doesn’t matter how you draw it as long as it makes sense to you, and helps you keep track of things.