[0x2 GraphQL Penetration Testing] Automated detection and fingerprinting
GraphQL is an open source query language and execution engine for APIs. In this post series we will cover some basic concepts that set GraphQL apart and provide some tips and tricks to help penetration testers and security consultants make sense of the security risks that may be present in GraphQL endpoints. Here’s a summary of the posts published in this series so far:
Welcome back! We’re going to dive into some of the automated tooling. In the previous post we looked at how GraphQL’s type system works, briefly discussed the different operations, and worked through an example of how to “sniff” types from a GraphQL instance even when introspection is turned off. In this post we will take a break from crafting queries ourselves and spend some time getting to know the automated tools dedicated to detecting and fingerprinting GraphQL endpoints. Having worked through a couple of tools in my GraphQL journey, I would say that much of the automation functionality is still a bit immature and unstable (prone to errors and false positives); the ones I’ve picked here are among the most reliable I could find. Not to fret, I chalk this up to GraphQL still being a relatively young technology.
Research setup
So if you’re gonna learn you a GraphQL, you need some way to find real GraphQL endpoints. At some point you’re going to grow bored of the cushy sandboxed examples from playgrounds and other demo instances of GraphQL. To remedy this you’re gonna need to dig out some live GraphQL endpoints and start trying some weird stuff with them. My method was to dig around the web for bug bounty domains; this way if I do anything dangerous I’m not getting arrested, and I might even get paid, or at the very least walk away with a T-shirt and a couple of cool blog posts. So we have a list of domains here:
https://github.com/arkadiyt/bounty-targets-data/blob/main/data/domains.txt
Here’s what this list looks like, should you want to take a peek without leaving the post:
abmc.gov
app.acorns.com
apps.apple.com
client.acorns.com
graphql.acorns.com
help.acorns.com
signup.acorns.com
start.1password.com
bugbounty-ctf.1password.com
...
All safe stuff, pure bug bounty goodness. What we want to do here is run a scan over these domains to check whether any of them have GraphQL endpoints running. To do the detection I grabbed a cheap-tier AWS EC2 instance and set up Graphinder, a fingerprinting tool that does a little subdomain enumeration for you while detecting GraphQL endpoints.
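Before diving into the tooling, it’s worth seeing what “detecting GraphQL” actually boils down to. Here’s a minimal sketch of the kind of probe such tools send; the URL is just a placeholder, and real tools also iterate over a list of common paths like /graphql and /api/graphql:

# Send the cheapest possible GraphQL query; even endpoints with
# introspection disabled will answer __typename.
curl -s -X POST "https://example.com/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query":"query { __typename }"}'

# A live GraphQL endpoint typically answers with something like:
# {"data":{"__typename":"Query"}}

Graphinder automates this kind of probing across paths and subdomains at scale. Let’s look at how to install and use it in the next section.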
Detection with Graphinder
We need to set up Graphinder to get going. This is pretty straightforward; it even comes in a neat pip package:
keithmakan@Tools % python3 -m pip install graphinder
Collecting graphinder
  Downloading graphinder-1.11.6-py3-none-any.whl (21 kB)
Collecting aiohttp[speedups]<4.0.0,>=3.8.1
  Downloading aiohttp-3.8.5-cp311-cp311-macosx_11_0_arm64.whl (339 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 339.6/339.6 kB 14.6 MB/s eta 0:00:00
Collecting beautifulsoup4<5,>=4
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 143.0/143.0 kB 15.2 MB/s eta 0:00:00
... snip ...
Installing collected packages: Brotli, urllib3, soupsieve, pycparser, multidict, idna, frozenlist, charset-normalizer, certifi, attrs, async-timeout, yarl, requests, cffi, beautifulsoup4, aiosignal, pycares, aiohttp, aiodns, graphinder
Successfully installed Brotli-1.0.9 aiodns-3.0.0 aiohttp-3.8.5 aiosignal-1.3.1 async-timeout-4.0.2 attrs-23.1.0 beautifulsoup4-4.12.2 certifi-2023.7.22 cffi-1.15.1 charset-normalizer-3.2.0 frozenlist-1.4.0 graphinder-1.11.6 idna-3.4 multidict-6.0.4 pycares-4.3.0 pycparser-2.21 requests-2.31.0 soupsieve-2.4.1 urllib3-2.0.4 yarl-1.9.2
Okay, so Graphinder is ready to rock; we can now run through a list of domains and feed them to it. I prefer using the single-domain option when doing this so that I get a separate results file for each domain:
for domain in `cat ../domains.txt`; do echo "[*] $domain"; graphinder -d $domain -o ${domain}_graphinder.txt; done
[*] abmc.gov

  ____                 _     _           _
 / ___|_ __ __ _ _ __ | |__ (_)_ __   __| | ___ _ __
| |  _| '__/ _` | '_ \| '_ \| | '_ \ / _` |/ _ \ '__|
| |_| | | | (_| | |_) | | | | | | | | (_| |  __/ |
 \____|_|  \__,_| .__/|_| |_|_|_| |_|\__,_|\___|_|
                |_|

Maintainer  https://escape.tech
Blog        https://blog.escape.tech
DockerHub   https://hub.docker.com/r/escapetech/graphinder
Contribute  https://github.com/Escape-Technologies/graphinder

(c) 2021 - 2023 Escape Technologies - Version: 1.11.6

07:27:42,0780 - INF - graphinder - downloading subfinder...
Okay, so that’s cracking on. It’s a little annoying that it potentially downloads subfinder every time you run it, but it’s not a big deal; Graphinder is still relatively fast. Once it’s done, you should have a folder that looks a little like this:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ ls -al
total 192
drwxrwxr-x 2 ubuntu ubuntu 4096 Jul 25 18:57 .
drwxrwxr-x 5 ubuntu ubuntu 4096 Jul 26 18:53 ..
-rw-rw-r-- 1 ubuntu ubuntu   27 Jul  4 15:20 airbnb.now.sh.results.json
-rw-rw-r-- 1 ubuntu ubuntu   35 Jul  4 15:26 api.catalysis-hub.org.results.json
-rw-rw-r-- 1 ubuntu ubuntu   34 Jul  4 15:34 api.deutschebahn.com.results.json
-rw-rw-r-- 1 ubuntu ubuntu   32 Jul  4 15:20 api.digitransit.fi.results.json
-rw-rw-r-- 1 ubuntu ubuntu   32 Jul  4 15:24 api.ean-search.org.results.json
...
And when we grep the results, we can pull out some GraphQL endpoints to target:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ grep * -Rnie http
bitquery.io.results.json:3: "http://db.bitquery.io/graphql",
bitquery.io.results.json:4: "https://bitquery.io/graphql",
bitquery.io.results.json:5: "http://clickhouse.bitquery.io/graphql",
bitquery.io.results.json:6: "http://news.bitquery.io/graphql",
bitquery.io.results.json:7: "http://test.bitquery.io/graphql",
bitquery.io.results.json:8: "http://dexkit.graphql.bitquery.io/graphql",
bitquery.io.results.json:9: "http://www.bitquery.io/graphql",
bitquery.io.results.json:10: "https://streaming.bitquery.io/graphql"
github.com.results.json:3: "https://www.githubstatus.com/graphql",
github.com.results.json:4: "http://shop.github.com/graphql"
graphql.bitquery.io.results.json:3: "http://dexkit.graphql.bitquery.io/graphql"
help.shopify.com.results.json:3: "http://beta.help.shopify.com/graphql"
melody.sh.results.json:3: "http://api.melody.sh/graphql"
www.yelp.com.results.json:3: "http://status.developer.yelp.com/graphql"
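Grep works fine, but since the results are JSON you can also let jq walk the files. A quick sketch (jq’s recursive descent operator means we don’t need to know the exact layout of each results file):

# Pull every URL string out of the result files, whatever their exact
# structure, then dedupe:
jq -r '.. | strings | select(startswith("http"))' *.results.json | sort -u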
Pretty easy stuff. Let’s see if we can turn these over to a fingerprinting tool and muse on the GraphQL engines running behind all these public endpoints. This requires a little bash scripting as before: we’re going to need to change the http schemes to https because, in my experience, you just get kicked when you try interacting over plain http. For this we need a little bash magic (namely pattern substitution) to force all the schemes to https.
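Here’s a minimal sketch of that substitution in isolation; endpoints.txt is a hypothetical file holding the extracted URLs, one per line:

# ${url/http:/https:} is bash pattern substitution: it swaps the first
# match of "http:" for "https:". URLs already on https don't contain
# the plain "http:" substring, so they pass through untouched.
while read -r url; do
  echo "${url/http:/https:}"
done < endpoints.txt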
Fingerprinting with GraphW00f
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ for endpoint in `grep * -Rnie http | awk -F: '{ print $1 }' | uniq`; do echo "[*] $endpoint"; python3 ../../GraphW00f/graphw00f/main.py -w graphql.txt -d -f -t "https://${endpoint%.results.json}" -o ${endpoint%.results.json}_graphw00f.csv; done
[*] bitquery.io.results.json

                +-------------------+
                |     graphw00f     |
                +-------------------+
                  ***            ***
                **                  **
              **                      **
    +--------------+            +--------------+
    |    Node X    |            |    Node Y    |
    +--------------+            +--------------+
                  ***            ***
                     **        **
                       **    **
                    +------------+
                    |   Node Z   |
                    +------------+

                graphw00f - v1.1.10
        The fingerprinting tool for GraphQL
         Dolev Farhi <dolev@lethalbit.com>

[*] Checking https://bitquery.io/v4/explorer
[*] Checking https://bitquery.io/v2/graph
[*] Checking https://bitquery.io/graphql/schema.json
[*] Checking https://bitquery.io/v2/api/graphql
...
[!] Found GraphQL at https://bitquery.io/api/graphql
[*] Attempting to fingerprint...
[*] Discovered GraphQL Engine: (Apollo)
[!] Attack Surface Matrix: https://github.com/nicholasaleks/graphql-threat-matrix/blob/master/implementations/apollo.md
[!] Technologies: JavaScript, Node.js, TypeScript
[!] Homepage: https://www.apollographql.com
[*] Completed.
Unpacking the complicated for loop above, then. In the first line we’re grepping for http/https endpoints, then awking that output to pull out just the results filename (which encodes the domain):
for endpoint in `grep * -Rnie http | awk -F: '{ print $1 }' | uniq`;
running graphw00f on the endpoint:
python3 ../../GraphW00f/graphw00f/main.py -w graphql.txt -d -f -t "https://${endpoint%.results.json}" -o ${endpoint%.results.json}_graphw00f.csv; done
The ${endpoint%.results.json} expression pulls out just the endpoint URL: ${var%pattern} is bash suffix removal, chopping whatever matches the pattern off the end of the variable. Regarding the graphw00f options: -d means detect, -f means fingerprint, -t is the URL we want to target, and -o lets us specify an output file, which will be in CSV format.
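A quick demo of that suffix removal in isolation:

# ${var%pattern} strips the shortest suffix matching pattern
endpoint="bitquery.io.results.json"
echo "${endpoint%.results.json}"   # prints: bitquery.io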
Okay, so that’s pretty nifty. You’ll also notice the -w graphql.txt being passed to graphw00f; this is me using the SecLists graphql wordlist, which you can find here: https://github.com/danielmiessler/SecLists/blob/master/Discovery/Web-Content/graphql.txt . I find it has a bit more meat to it than the default graphw00f wordlist.
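To grab it on a headless box, something like this should do; the raw URL is just the standard raw.githubusercontent.com form of the repo link above:

# fetch the SecLists graphql wordlist into the current directory
curl -sO https://raw.githubusercontent.com/danielmiessler/SecLists/master/Discovery/Web-Content/graphql.txt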
The graphw00f CSV output, I’ll be honest, could use some work, but here’s what it looks like:
ubuntu@ip-172-31-13-77:~/Scanning/results/0$ cat *.csv
url,detected_engine,timestamp
bitquery.io,Apollo,2023-07-27
url,detected_engine,timestamp
help.shopify.com,Ruby GraphQL,2023-07-27
For some reason it only detected GraphQL running on two of the domains I fed it, so take the detection with a pinch of salt; it looks like it has a bit of a false-negative rate we need to be wary of. In addition, it only quotes the domain in the CSV results rather than the full URL, and the timestamp is not granular enough. Still, we can at least see the GraphQL engine being used, which helps a lot. From this point on we would check out which exploitable features each GraphQL engine has. One can look these up in the GraphQL threat matrix, which I’m not going to reproduce; it’s hosted here: https://github.com/nicholasaleks/graphql-threat-matrix/ . Later in this post series I will unpack what some of the “threats” in the matrix mean.
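Since every per-endpoint CSV carries its own header row, a small cleanup pass is handy when you cat them together. A sketch that merges everything into one deduplicated file (merged.csv is just a name I picked):

# keep a single header, drop the repeated ones, and dedupe the rows
echo "url,detected_engine,timestamp" > merged.csv
grep -hv '^url,detected_engine' *.csv | sort -u >> merged.csv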
Okay, so now we know how to detect and fingerprint GraphQL endpoints with some of the most prominent tools available. There are a few other cool things we can do that Graphinder and graphw00f can’t help us with; for those we will turn to GraphQLMap.
Dumping Schema
Pulling the schema via a manual curl query, Burp’s Repeater, or other proxies can be a bit annoying and doesn’t lend itself to automation so easily. If you’re trying to scan a large number of domains bug-bounty style, you’re going to need a way to issue a single command from a bash terminal, docker script, or other shell environment and end up with a neat set of files containing your schema, ready for further analysis.
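For reference, here’s roughly what the manual curl version looks like; this is a trimmed-down introspection query (real schema dumps request far more detail: args, enums, directives and so on) and the URL is a placeholder:

# POST a minimal introspection query and save the response to a file
curl -s -X POST "https://example.com/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { types { name fields { name } } } }"}' \
  -o schema.json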
Let’s take a look at how to dump the schema with GraphQLMap. It’s pretty straightforward: all we need to do is feed GraphQLMap a URL we know is running GraphQL and then issue the dump_via_* commands as below:
ubuntu@ip-172-31-13-77:~/Scanning/results/new_domains$ graphqlmap -u https://xxxx/api/graphql
   _____                 _      ____  _
  / ____|               | |    / __ \| |
 | |  __ _ __ __ _ _ __ | |__ | |  | | |     _ __ ___   __ _ _ __
 | | |_ | '__/ _` | '_ \| '_ \| |  | | |    | '_ ` _ \ / _` | '_ \
 | |__| | | | (_| | |_) | | | | |__| | |____| | | | | | (_| | |_) |
  \_____|_|  \__,_| .__/|_| |_|\___\_\______|_| |_| |_|\__,_| .__/
                  | |                                       | |
                  |_|                                       |_|

Author: @pentest_swissky Version: 1.1
GraphQLmap > dump_via_introspection
============= [SCHEMA] ===============
e.g: name[Type]: arg (Type!)

00: AccessLevel
    integerValue[]:
    stringValue[]:

02: Achievement
    avatarUrl[]:
    createdAt[Time]:
    description[]:
    id[AchievementsAchievementID]:
    name[String]:
    namespace[]:
    updatedAt[Time]:
    userAchievements[]: after (String!), before (String!), first (Int!), last (Int!),
...
Beyond GraphQLMap, a few other tools provide a way to pull schemas via introspection. Having tried a number of them, the one with the fewest errors and most consistent performance (according to my contrived battery of tests) was InQL Scanner. InQL is neat because it also comes as a Burp extension; we won’t spend too much time unpacking the Burp extension here, but setting it up for terminal usage is super easy, we just need to pull down the pip package like so:
keithmakan@GraphQLHunter % pip install inql
Collecting inql
  Using cached inql-4.0.5-py3-none-any.whl (53 kB)
Installing collected packages: inql
Successfully installed inql-4.0.5
Should be ready to rock now. Before we throw this at a URL with introspection actually enabled, here’s a quick look at the options InQL gives us:
keithmakan@inql % inql --help
usage: inql [-h] [-t TARGET] [-f SCHEMA_JSON_FILE] [-k KEY] [-p PROXY]
            [--header HEADERS HEADERS] [-d] [--no-generate-html]
            [--no-generate-schema] [--no-generate-queries] [--generate-cycles]
            [--cycles-timeout CYCLES_TIMEOUT] [--cycles-streaming]
            [--generate-tsv] [--insecure] [-o OUTPUT_DIRECTORY]

InQL Scanner

options:
  -h, --help            show this help message and exit
  -t TARGET             Remote GraphQL Endpoint (https://<Target_IP>/graphql)
  -f SCHEMA_JSON_FILE   Schema file in JSON format
  -k KEY                API Authentication Key
  -p PROXY              IP of a web proxy to go through (http://127.0.0.1:8080)
  --header HEADERS HEADERS
  -d                    Replace known GraphQL arguments types with placeholder values (useful for Burp Suite)
  --no-generate-html    Generate HTML Documentation
  --no-generate-schema  Generate JSON Schema Documentation
  --no-generate-queries Generate Queries
  --generate-cycles     Generate Cycles Report
  --cycles-timeout CYCLES_TIMEOUT
                        Cycles Report Timeout (in seconds)
  --cycles-streaming    Some graph are too complex to generate cycles in reasonable time, stream to stdout
  --generate-tsv        Generate TSV representation of query templates. It may be useful to quickly search for vulnerable I/O.
  --insecure            Accept any SSL/TLS certificate
  -o OUTPUT_DIRECTORY   Output Directory
Targeting https://opencollective.com/graphql we get:
keithmakan@Keiths-MacBook-Pro inql % inql -t https://opencollective.com/graphql
[+] Writing Introspection Schema JSON
[+] DONE
[+] Writing HTML Documentation
[+] DONE
[+] Writing query Templates
Writing loggedInAccount query
Writing me query
Writing accounts query
Writing activities query
Writing expenses query
Writing orders query
Writing tagStats query
Writing transactions query
Writing updates query
Writing paypalPlan query
Writing virtualCardRequests query
InQL dumps the schema data to a folder named after the domain you targeted; it even writes out the documentation as an HTML file (albeit a very messy one). Here’s what the schema output looks like:
keithmakan@Keiths-MacBook-Pro opencollective.com % ls -al
total 6616
drwxr-xr-x  6 keithmakan  staff      192 Aug  6 12:11 .
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 ..
-rw-r--r--  1 keithmakan  staff   573383 Aug  6 12:08 doc-2023-08-06-1691316523.html
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 mutation
drwxr-xr-x  3 keithmakan  staff       96 Aug  6 12:08 query
-rw-r--r--  1 keithmakan  staff  2813278 Aug  6 12:11 schema-2023-08-06-1691316523.json

keithmakan@Keiths-MacBook-Pro opencollective.com % cat schema-2023-08-06-1691316523.json | head -n 20
{
  "data": {
    "__schema": {
      "directives": [
        {
          "args": [
            {
              "defaultValue": null,
              "description": "Included when true.",
              "name": "if",
              "type": {
                "kind": "NON_NULL",
                "name": null,
                "ofType": {
                  "kind": "SCALAR",
                  "name": "Boolean",
                  "ofType": null
                }
              }
            }
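Since the dump is a standard introspection result (the {"data":{"__schema":...}} shape shown above), jq makes quick work of it; for instance, listing every type name in the schema:

# enumerate the type names in the dumped schema
jq -r '.data.__schema.types[].name' schema-2023-08-06-1691316523.json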
Schweet, we have our schema and we can craft queries using it. Besides some neat output formatting, InQL also lets us detect circular queries; we’ll look at exploiting those in later posts. For now, this is where we end. Happy hacking, and stay tuned for more!