[0x2 GraphQL Penetration Testing] Automated detection and fingerprinting
GraphQL is an open source query language and execution engine for APIs. In this post series we will cover some basic concepts that set GraphQL apart and provide some tips and tricks to help penetration testers and security consultants make sense of the security risks that may be present in GraphQL endpoints. Here’s a summary of the posts published in this series so far:
Welcome back! We’re going to dive into some of the automated tooling. In the previous post we looked at how GraphQL’s type system works, briefly discussed the different operations, and worked through an example of how to “sniff” types from a GraphQL instance even when introspection is turned off. In this post we will take a break from crafting queries ourselves and spend some time getting to know the automated tools dedicated to detecting and fingerprinting GraphQL endpoints. Having worked through a couple of tools in my GraphQL journey, I would say that much of the automation functionality is still a bit immature and unstable (prone to errors and false positives); the ones I’ve picked here are among the most reliable I could find. Not to fret, I chalk this up to GraphQL still being a relatively young technology.
Research setup
So if you’re gonna learn you a GraphQL, you need some way to find real GraphQL endpoints. At some point you’re going to grow bored of the cushy sandboxed examples from playgrounds and other demo instances of GraphQL. To remedy this you’re gonna need to dig out some live GraphQL endpoints and start trying some weird stuff with them. My method was to dig around the web for bug bounty domains; this way if I do anything dangerous I’m not getting arrested, and I might even get paid, or at the very least walk away with a T-shirt and a couple of cool blog posts. So we have a list of domains here:
https://github.com/arkadiyt/bounty-targets-data/blob/main/data/domains.txt
Here’s what this list looks like, should you want to take a peek without leaving the post:
abmc.gov
app.acorns.com
apps.apple.com
client.acorns.com
graphql.acorns.com
help.acorns.com
signup.acorns.com
start.1password.com
bugbounty-ctf.1password.com
...
All safe stuff, pure bug bounty goodness. What we want to do here is run a scan over these domains to check whether any of them have GraphQL endpoints running. To do the detection I grabbed a cheap-tier AWS EC2 instance and set up Graphinder, a fingerprinting tool that does a little subdomain enumeration for you while detecting GraphQL endpoints.
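Before diving into the tooling, it’s worth seeing what “detecting GraphQL” actually boils down to. Here’s a minimal sketch of the kind of probe such tools send; the URL is just a placeholder, and real tools also iterate over a list of common paths like /graphql and /api/graphql:

# Send the cheapest possible GraphQL query; even endpoints with
# introspection disabled will answer __typename.
curl -s -X POST "https://example.com/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query":"query { __typename }"}'

# A live GraphQL endpoint typically answers with something like:
# {"data":{"__typename":"Query"}}

Graphinder automates this kind of probing across paths and subdomains at scale. Let’s look at how to install and use it in the next section.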
Detection with Graphinder
We need to set up Graphinder to get going. This is pretty straightforward; it even comes in a neat pip package:
keithmakan@Tools % python3 -m pip install graphinder
Collecting graphinder
  Downloading graphinder-1.11.6-py3-none-any.whl (21 kB)
Collecting aiohttp[speedups]<4.0.0,>=3.8.1
  Downloading aiohttp-3.8.5-cp311-cp311-macosx_11_0_arm64.whl (339 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 339.6/339.6 kB 14.6 MB/s eta 0:00:00
Collecting beautifulsoup4<5,>=4
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.0/143.0 kB 15.2 MB/s eta 0:00:00
... snip ...
Installing collected packages: Brotli, urllib3, soupsieve, pycparser, multidict, idna, frozenlist, charset-normalizer, certifi, attrs, async-timeout, yarl, requests, cffi, beautifulsoup4, aiosignal, pycares, aiohttp, aiodns, graphinder
Successfully installed Brotli-1.0.9 aiodns-3.0.0 aiohttp-3.8.5 aiosignal-1.3.1 async-timeout-4.0.2 attrs-23.1.0 beautifulsoup4-4.12.2 certifi-2023.7.22 cffi-1.15.1 charset-normalizer-3.2.0 frozenlist-1.4.0 graphinder-1.11.6 idna-3.4 multidict-6.0.4 pycares-4.3.0 pycparser-2.21 requests-2.31.0 soupsieve-2.4.1 urllib3-2.0.4 yarl-1.9.2
Okay, so Graphinder is ready to rock; we can now run through a list of domains and feed them to it. I prefer using the single-domain option when doing this so that I get a separate results file for each domain:
for domain in `cat ../domains.txt`; do echo "[*] $domain"; graphinder -d $domain -o ${domain}_graphinder.txt; done
[*] abmc.gov

  ____                 _     _           _
 / ___|_ __ __ _ _ __ | |__ (_)_ __   __| | ___ _ __
| |  _| '__/ _` | '_ \| '_ \| | '_ \ / _` |/ _ \ '__|
| |_| | | | (_| | |_) | | | | | | | | (_| |  __/ |
 \____|_|  \__,_| .__/|_| |_|_|_| |_|\__,_|\___|_|
                |_|

Maintainer  https://escape.tech
Blog        https://blog.escape.tech
DockerHub   https://hub.docker.com/r/escapetech/graphinder
Contribute  https://github.com/Escape-Technologies/graphinder

(c) 2021 - 2023 Escape Technologies - Version: 1.11.6

07:27:42,0780 - INF - graphinder - downloading subfinder...
Okay, so that’s cracking on. It’s a little annoying that it potentially downloads subfinder every time you run it, but it’s not a big deal; Graphinder is still relatively fast. Once it’s done, you should have a folder that looks a little like this:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ ls -al
total 192
drwxrwxr-x 2 ubuntu ubuntu 4096 Jul 25 18:57 .
drwxrwxr-x 5 ubuntu ubuntu 4096 Jul 26 18:53 ..
-rw-rw-r-- 1 ubuntu ubuntu   27 Jul  4 15:20 airbnb.now.sh.results.json
-rw-rw-r-- 1 ubuntu ubuntu   35 Jul  4 15:26 api.catalysis-hub.org.results.json
-rw-rw-r-- 1 ubuntu ubuntu   34 Jul  4 15:34 api.deutschebahn.com.results.json
-rw-rw-r-- 1 ubuntu ubuntu   32 Jul  4 15:20 api.digitransit.fi.results.json
-rw-rw-r-- 1 ubuntu ubuntu   32 Jul  4 15:24 api.ean-search.org.results.json
...
And when we grep the results, we can pull out some GraphQL endpoints to target:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ grep * -Rnie http
bitquery.io.results.json:3: "http://db.bitquery.io/graphql",
bitquery.io.results.json:4: "https://bitquery.io/graphql",
bitquery.io.results.json:5: "http://clickhouse.bitquery.io/graphql",
bitquery.io.results.json:6: "http://news.bitquery.io/graphql",
bitquery.io.results.json:7: "http://test.bitquery.io/graphql",
bitquery.io.results.json:8: "http://dexkit.graphql.bitquery.io/graphql",
bitquery.io.results.json:9: "http://www.bitquery.io/graphql",
bitquery.io.results.json:10: "https://streaming.bitquery.io/graphql"
github.com.results.json:3: "https://www.githubstatus.com/graphql",
github.com.results.json:4: "http://shop.github.com/graphql"
graphql.bitquery.io.results.json:3: "http://dexkit.graphql.bitquery.io/graphql"
help.shopify.com.results.json:3: "http://beta.help.shopify.com/graphql"
melody.sh.results.json:3: "http://api.melody.sh/graphql"
www.yelp.com.results.json:3: "http://status.developer.yelp.com/graphql"
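Grep works fine, but since the results are JSON you can also let jq walk the files. A quick sketch (jq’s recursive descent operator means we don’t need to know the exact layout of each results file):

# Pull every URL string out of the result files, whatever their exact
# structure, then dedupe:
jq -r '.. | strings | select(startswith("http"))' *.results.json | sort -u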
Pretty easy stuff. Let’s see if we can turn these over to a fingerprinting tool and muse on the GraphQL engines running behind all these public endpoints. This requires a little bash scripting as before: we’re going to need to change the http schemes to https because, in my experience, you just get kicked when you try interacting over plain http. For this we need a little bash magic (namely pattern substitution) to force all the schemes to https.
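Here’s a minimal sketch of that substitution in isolation; endpoints.txt is a hypothetical file holding the extracted URLs, one per line:

# ${url/http:/https:} is bash pattern substitution: it swaps the first
# match of "http:" for "https:". URLs already on https don't contain
# the plain "http:" substring, so they pass through untouched.
while read -r url; do
  echo "${url/http:/https:}"
done < endpoints.txt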
Fingerprinting with GraphW00f
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ for endpoint in `grep * -Rnie http | awk -F: '{ print $1 }' | uniq`; do echo "[*] $endpoint"; python3 ../../GraphW00f/graphw00f/main.py -w graphql.txt -d -f -t "https://${endpoint%.results.json}" -o ${endpoint%.results.json}_graphw00f.csv; done
[*] bitquery.io.results.json

                +-------------------+
                |     graphw00f     |
                +-------------------+
                  ***            ***
                **                  **
              **                      **
    +--------------+            +--------------+
    |    Node X    |            |    Node Y    |
    +--------------+            +--------------+
                  ***            ***
                     **        **
                       **    **
                    +------------+
                    |   Node Z   |
                    +------------+

                graphw00f - v1.1.10
        The fingerprinting tool for GraphQL
         Dolev Farhi <dolev@lethalbit.com>

[*] Checking https://bitquery.io/v4/explorer
[*] Checking https://bitquery.io/v2/graph
[*] Checking https://bitquery.io/graphql/schema.json
[*] Checking https://bitquery.io/v2/api/graphql
...
[!] Found GraphQL at https://bitquery.io/api/graphql
[*] Attempting to fingerprint...
[*] Discovered GraphQL Engine: (Apollo)
[!] Attack Surface Matrix: https://github.com/nicholasaleks/graphql-threat-matrix/blob/master/implementations/apollo.md
[!] Technologies: JavaScript, Node.js, TypeScript
[!] Homepage: https://www.apollographql.com
[*] Completed.
Unpacking the complicated for loop above, then. In the first line we’re grepping for http/https endpoints, then awking that output to pull out just the results filename (which encodes the domain):
for endpoint in `grep * -Rnie http | awk -F: '{ print $1 }' | uniq`;
running graphw00f on the endpoint:
python3 ../../GraphW00f/graphw00f/main.py -w graphql.txt -d -f -t "https://${endpoint%.results.json}" -o ${endpoint%.results.json}_graphw00f.csv; done
The ${endpoint%.results.json} expression pulls out just the endpoint URL: ${var%pattern} is bash suffix removal, chopping whatever matches the pattern off the end of the variable. Regarding the graphw00f options: -d means detect, -f means fingerprint, -t is the URL we want to target, and -o lets us specify an output file, which will be in CSV format.
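A quick demo of that suffix removal in isolation:

# ${var%pattern} strips the shortest suffix matching pattern
endpoint="bitquery.io.results.json"
echo "${endpoint%.results.json}"   # prints: bitquery.io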
Okay, so that’s pretty nifty. You’ll also notice the -w graphql.txt being passed to graphw00f; this is me using the SecLists graphql wordlist, which you can find here: https://github.com/danielmiessler/SecLists/blob/master/Discovery/Web-Content/graphql.txt . I find it has a bit more meat to it than the default graphw00f wordlist.
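To grab it on a headless box, something like this should do; the raw URL is just the standard raw.githubusercontent.com form of the repo link above:

# fetch the SecLists graphql wordlist into the current directory
curl -sO https://raw.githubusercontent.com/danielmiessler/SecLists/master/Discovery/Web-Content/graphql.txt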
The graphw00f CSV output, I’ll be honest, could use some work, but here’s what it looks like:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ cat *.csv
url,detected_engine,timestamp
bitquery.io,Apollo,2023-07-27
url,detected_engine,timestamp
help.shopify.com,Ruby GraphQL,2023-07-27
For some reason it only detected GraphQL running on two of the domains I fed it, so take the detection with a pinch of salt; it looks like it has a bit of a false-negative rate we need to be wary of. In addition, it only quotes the domain in the CSV results rather than the full URL, and the timestamp is not granular enough. Still, we can at least see the GraphQL engine being used, which helps a lot. From this point on we would check out which exploitable features each GraphQL engine has. One can look these up in the GraphQL threat matrix, which I’m not going to reproduce; it’s hosted here: https://github.com/nicholasaleks/graphql-threat-matrix/ . Later in this post series I will unpack what some of the “threats” in the matrix mean.
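Since every per-endpoint CSV carries its own header row, a small cleanup pass is handy when you cat them together. A sketch that merges everything into one deduplicated file (merged.csv is just a name I picked):

# keep a single header, drop the repeated ones, and dedupe the rows
echo "url,detected_engine,timestamp" > merged.csv
grep -hv '^url,detected_engine' *.csv | sort -u >> merged.csv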
Okay, so now we know how to detect and fingerprint GraphQL endpoints with some of the most prominent tools available. There are a few other cool things we can do that Graphinder and graphw00f can’t help us with; for those we will turn to GraphQLMap.
Dumping Schema
Pulling the schema via a manual curl query, Burp’s Repeater, or other proxies can be a bit annoying and doesn’t lend itself to automation so easily. If you’re trying to scan a large number of domains bug-bounty style, you’re going to need a way to issue a single command from a bash terminal, docker script, or other shell environment and end up with a neat set of files containing your schema, ready for further analysis.
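For reference, here’s roughly what the manual curl version looks like; this is a trimmed-down introspection query (real schema dumps request far more detail: args, enums, directives and so on) and the URL is a placeholder:

# POST a minimal introspection query and save the response to a file
curl -s -X POST "https://example.com/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { types { name fields { name } } } }"}' \
  -o schema.json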
Let’s take a look at how to dump the schema with GraphQLMap. It’s pretty straightforward: all we need to do is feed GraphQLMap a URL we know is running GraphQL and then issue the dump_via_* commands as below:
ubuntu@ip-172-31-13-77:~/Scanning/results/new_domains$ graphqlmap -u https://xxxx/api/graphql
   _____                 _      ____  _
  / ____|               | |    / __ \| |
 | |  __ _ __ __ _ _ __ | |__ | |  | | |     _ __ ___   __ _ _ __
 | | |_ | '__/ _` | '_ \| '_ \| |  | | |    | '_ ` _ \ / _` | '_ \
 | |__| | | | (_| | |_) | | | | |__| | |____| | | | | | (_| | |_) |
  \_____|_|  \__,_| .__/|_| |_|\___\_\______|_| |_| |_|\__,_| .__/
                  | |                                       | |
                  |_|                                       |_|

Author: @pentest_swissky Version: 1.1
GraphQLmap > dump_via_introspection
============= [SCHEMA] ===============
e.g: name[Type]: arg (Type!)

00: AccessLevel
    integerValue[]:
    stringValue[]:

02: Achievement
    avatarUrl[]:
    createdAt[Time]:
    description[]:
    id[AchievementsAchievementID]:
    name[String]:
    namespace[]:
    updatedAt[Time]:
    userAchievements[]: after (String!), before (String!), first (Int!), last (Int!),
...
Beyond GraphQLMap, a few other tools provide a way to pull schemas via introspection. Having tried a number of them, the one with the fewest errors and most consistent performance (according to my contrived battery of tests) was InQL Scanner. InQL is neat because it also comes as a Burp extension; we won’t spend too much time unpacking the Burp extension here, but setting it up for terminal usage is super easy, we just need to pull down the pip package like so:
keithmakan@GraphQLHunter % pip install inql
Collecting inql
  Using cached inql-4.0.5-py3-none-any.whl (53 kB)
Installing collected packages: inql
Successfully installed inql-4.0.5
Should be ready to rock now. Before we throw this at a URL with introspection actually enabled, here’s a quick look at the options InQL gives us:
keithmakan@inql % inql --help
usage: inql [-h] [-t TARGET] [-f SCHEMA_JSON_FILE] [-k KEY] [-p PROXY]
            [--header HEADERS HEADERS] [-d] [--no-generate-html]
            [--no-generate-schema] [--no-generate-queries] [--generate-cycles]
            [--cycles-timeout CYCLES_TIMEOUT] [--cycles-streaming]
            [--generate-tsv] [--insecure] [-o OUTPUT_DIRECTORY]

InQL Scanner

options:
  -h, --help            show this help message and exit
  -t TARGET             Remote GraphQL Endpoint (https://<Target_IP>/graphql)
  -f SCHEMA_JSON_FILE   Schema file in JSON format
  -k KEY                API Authentication Key
  -p PROXY              IP of a web proxy to go through (http://127.0.0.1:8080)
  --header HEADERS HEADERS
  -d                    Replace known GraphQL arguments types with placeholder values (useful for Burp Suite)
  --no-generate-html    Generate HTML Documentation
  --no-generate-schema  Generate JSON Schema Documentation
  --no-generate-queries Generate Queries
  --generate-cycles     Generate Cycles Report
  --cycles-timeout CYCLES_TIMEOUT
                        Cycles Report Timeout (in seconds)
  --cycles-streaming    Some graph are too complex to generate cycles in reasonable time, stream to stdout
  --generate-tsv        Generate TSV representation of query templates. It may be useful to quickly search for vulnerable I/O.
  --insecure            Accept any SSL/TLS certificate
  -o OUTPUT_DIRECTORY   Output Directory
Targeting https://opencollective.com/graphql we get:
keithmakan@Keiths-MacBook-Pro inql % inql -t https://opencollective.com/graphql
[+] Writing Introspection Schema JSON
[+] DONE
[+] Writing HTML Documentation
[+] DONE
[+] Writing query Templates
Writing loggedInAccount query
Writing me query
Writing accounts query
Writing activities query
Writing expenses query
Writing orders query
Writing tagStats query
Writing transactions query
Writing updates query
Writing paypalPlan query
Writing virtualCardRequests query
InQL dumps the schema data to a folder named after the domain you targeted; it even writes out the documentation as an HTML file (albeit a very messy one). Here’s what the schema output looks like:
keithmakan@Keiths-MacBook-Pro opencollective.com % ls -al
total 6616
drwxr-xr-x  6 keithmakan  staff      192 Aug  6 12:11 .
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 ..
-rw-r--r--  1 keithmakan  staff   573383 Aug  6 12:08 doc-2023-08-06-1691316523.html
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 mutation
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 query
-rw-r--r--  1 keithmakan  staff  2813278 Aug  6 12:11 schema-2023-08-06-1691316523.json

keithmakan@Keiths-MacBook-Pro opencollective.com % cat schema-2023-08-06-1691316523.json | head -n 20
{
  "data": {
    "__schema": {
      "directives": [
        {
          "args": [
            {
              "defaultValue": null,
              "description": "Included when true.",
              "name": "if",
              "type": {
                "kind": "NON_NULL",
                "name": null,
                "ofType": {
                  "kind": "SCALAR",
                  "name": "Boolean",
                  "ofType": null
                }
              }
            }
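Since the dump is a standard introspection result (the {"data":{"__schema":...}} shape shown above), jq makes quick work of it; for instance, listing every type name in the schema:

# enumerate the type names in the dumped schema
jq -r '.data.__schema.types[].name' schema-2023-08-06-1691316523.json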
Schweet, we have our schema and we can craft queries using it. Besides some neat output formatting, InQL also lets us detect circular queries; we’ll look at exploiting those in later posts. For now, this is where we end. Happy hacking, and stay tuned for more!