Minecraft Mod Statistics

About a year ago, I was poking around the CloudFlare dashboard for my website, specifically the Security section of the Analytics page. To my surprise, it reported that it had blocked 78 thousand “bad browser” threats in the last 24 hours (almost one a second). Now, I don’t have very much on my website. My blog doesn’t get much traffic, and the only other thing that gets any is my fediverse instance. And that volume of inbound traffic is nowhere near what I would expect for my small instance, which probably doesn’t federate with more than a couple hundred others.

I found the Firewall section of the dashboard, which shows the details of individual blocked requests. Almost all of the blocked requests were to a subdomain I previously used as an update server for my Minecraft mods. Forge, a Minecraft mod loader, provides a mechanism by which mods can specify a URL that Forge can use to get a JSON object describing the latest versions of the mod, in order to notify the player if an update is available.

A few years ago, I built a small tool to generate JSON files in Forge’s update format using Git repo tags from the GitHub API. This was running on my server, but some time in the couple of years since I stopped actively building mods for Minecraft, I shut it down. In the time since then, CloudFlare has decided that all the traffic to the update server is a threat and should therefore be blocked. CloudFlare keeps a little bit of information about each blocked request going back quite a while, so this provides a surprising amount of information about the usage of my mods.

If you’re not interested in the process, and just want to see the data: jump ahead.

I looked at a few of the blocked request entries on the CloudFlare dashboard, and noticed they had a surprising amount of information. Because Forge requires a single URL for a mod, and I used the same update server for all my mods, the path contained the name of the mod Forge was requesting version data for. CloudFlare also stores the origin IP of the request, from which the country the mod was launched from can be roughly derived [1]. Forge uses Java’s built-in HTTP support, which sends requests with a User-Agent header that includes the version of Java it’s running under (e.g., Java/1.8.0_252). And, of course, CloudFlare also stores the date and time of the request.
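As a rough illustration of how much that User-Agent string carries, here is a minimal sketch that pulls the release and update numbers out of it. The parseJavaUserAgent helper is hypothetical and not part of the scripts in this post; it assumes the header always has the Java/<version> shape shown above.

```javascript
// Hypothetical helper: extracts the Java version from a User-Agent header
// of the form "Java/<version>", as in the example "Java/1.8.0_252".
function parseJavaUserAgent(ua) {
	const match = /^Java\/(\d+)\.(\d+)\.(\d+)(?:_(\d+))?/.exec(ua);
	if (!match) return null; // not a Java user agent
	const [, major, minor, , update] = match;
	return {
		version: match[0].slice("Java/".length),
		// Before Java 9, versions were written 1.x.y, where x is the release.
		release: major === "1" ? Number(minor) : Number(major),
		update: update === undefined ? null : Number(update),
	};
}

console.log(parseJavaUserAgent("Java/1.8.0_252"));
console.log(parseJavaUserAgent("Mozilla/5.0"));
```

Returning null for anything that doesn't match makes it easy to discard non-Java traffic in the same step.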

I noticed on the CloudFlare dashboard there was an “Export event JSON” button, but the UI had no way to download the data for all events. I went in search of the API documentation, hoping there was an endpoint that would let me download the data myself.

Luckily, there was. But unfortunately, the API was being deprecated and completely went away in October of 2020[2]. So, my script sadly no longer works. But last spring, when I collected the data, it was still available.

A few details about how the API endpoint used to work: in addition to the zone identifier, there were a few useful query parameters. The host parameter saved me from having to filter out any blocked requests going to subdomains other than that of my update server. I also set the limit parameter to its maximum value of 1000 results, to minimize the time needed to download all the data. Finally, the cursor parameter was used to paginate backwards through the events. Providing no value for it simply returned the most recent events.

Armed with this knowledge, I wrote a simple Node.js script (because JavaScript makes dealing with JSON slightly easier). I used the node-fetch package instead of the built-in http.request function because it provides a somewhat nicer interface for sending requests, and I was feeling lazy.

// node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require("node-fetch");
const fs = require("fs").promises;
const path = require("path");

const ZONE_ID = "";
const API_TOKEN = "";
const HOST = "";
const TIMEOUT = 30;

async function getLogs(index, cursor) {
	const url = new URL(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/security/events`);
	url.searchParams.set("host", HOST);
	url.searchParams.set("limit", 1000);
	if (cursor) {
		url.searchParams.set("cursor", cursor);
	}
	console.log(`Request #${index}: ${url.href}`);
	const result = await fetch(url.href, {
		headers: {
			"Authorization": `Bearer ${API_TOKEN}`
		}
	});
	const json = await result.json();
}

This function sets up a request and gets the result object from CloudFlare; nothing too interesting. The JSON from a single request looked like this (with the actual results elided):

{
  "result": [...],
  "result_info": {
    "cursors": {
      "after": "cnlBvAlcpOkKXDrAS2z-clbjEgZZmomS4HxkMdN3Vswxccy66MTSDHsa1XFRetbfapnaxYhGJn7Skir9znE",
      "before": "-9Xf1eykd8chYK8A6S2mr2OPR1mTcYEejlgYJHC_HZYVHLhAkKlZSQOJRVUUU5SgFjH3zx0585ZUDtRKkiU3"
    },
    "scanned_range": {
      "since": "2020-07-07 01:48:51",
      "until": "2020-07-07 02:09:07"
    }
  },
  "success": true,
  "errors": [],
  "messages": []
}

Next, there are a couple of possibilities when handling the response. If the request failed (only as reported by CF; I didn’t bother with actual HTTP error handling), the script waits 30 seconds, in the hope that the issue will have resolved itself by then, before retrying the same request. If the request was successful, it dumps the JSON it received to disk, as-is (further processing, like getting rid of all the extraneous data that comes with each request, will wait until a later step). Then, it recurses, calling the getLogs function again with the next index and the value of the before cursor returned in the current response. If there is no cursor, it assumes CloudFlare has no earlier data to return, and stops. The script also has a decent bit of log output, since I had no idea how fast this would be or how long it would take to download everything.

if (json.success) {
	const text = JSON.stringify(json);
	try {
		await fs.writeFile(path.join(__dirname, "output", `${index}.json`), text);

		if (json.result_info.cursors.before) {
			console.log(`Got results from ${json.result_info.scanned_range.since} until ${json.result_info.scanned_range.until}`);
			getLogs(index + 1, json.result_info.cursors.before);
		} else {
			console.log("'before' cursor not present. Done.");
		}
	} catch (err) {
		console.error('Error writing output. Stopping.')
		console.error(err);
	}
} else {
	console.warn(`Request ${index} failed:`, json.errors);
	console.warn(`Retrying in ${TIMEOUT} seconds...`);
	// setTimeout takes the callback first, then the delay in milliseconds.
	setTimeout(() => {
		getLogs(index, cursor);
	}, TIMEOUT * 1000);
}
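The same save, follow the cursor, retry on failure logic can also be written as a loop instead of recursion. The following is only a sketch under assumptions: downloadAll, fetchPage, and onPage are hypothetical names, and fetchPage stands in for the request code in getLogs so the example can run against fake pages without touching the network.

```javascript
// Sketch only: fetchPage is a hypothetical stand-in for the request logic in
// getLogs. It takes a cursor and resolves to the parsed response body.
async function downloadAll(fetchPage, onPage, retryDelayMs = 30000) {
	let index = 0;
	let cursor; // undefined asks for the most recent events
	while (true) {
		const json = await fetchPage(cursor);
		if (!json.success) {
			// Same policy as the script: wait, then retry the same request.
			await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
			continue;
		}
		await onPage(index, json);
		cursor = json.result_info.cursors.before;
		if (!cursor) return index + 1; // number of pages saved
		index++;
	}
}

// Fake two-page API whose first response is a simulated failure.
const pages = {
	start: { success: true, result: [1, 2], result_info: { cursors: { before: "a" } } },
	a: { success: true, result: [3], result_info: { cursors: {} } },
};
let failedOnce = false;
const fakeFetch = async (cursor) => {
	if (!failedOnce) {
		failedOnce = true;
		return { success: false, errors: ["simulated failure"] };
	}
	return pages[cursor ?? "start"];
};

const saved = [];
const finished = downloadAll(fakeFetch, async (i, json) => {
	saved.push([i, json.result.length]);
}, 10);
finished.then((count) => console.log(`Saved ${count} pages`, saved));
```

A loop keeps the retry and the next-page step in one place, and awaiting each page means a failed write rejects the returned promise rather than being lost.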

To actually start, if there are arguments given, it uses them as the starting index and cursor (in case it failed, and I had to manually restart it). Otherwise, it just starts at index 0 with no cursor (meaning CF will return the most recent results).

if (process.argv.length == 2) {
	getLogs(0);
} else if (process.argv.length == 4) {
	getLogs(parseInt(process.argv[2], 10), process.argv[3]);
} else {
	console.error("Expected 0 or 2 arguments.");
}

I kicked off the script late one evening, with no idea how long it was going to take. It was moving at a pretty reasonable clip, sending about 1 request every second or two. I knew that CloudFlare’s API has a rate limit of 1200 requests per 5 minutes, which was why I added what little error-handling code is there: hopefully the script would keep moving along if it was rate limited. About 1 request per second is roughly 300 requests in five minutes, though, which is nowhere near the rate limit. I’m not entirely sure why it was that slow: sending an HTTP request isn’t that slow, and an individual request was only returning about 600KB of data, so bandwidth shouldn’t have been a problem. I suspect the bottleneck may have been parsing JSON, but it wasn’t slow enough that I actually bothered trying to profile or optimize anything. Anyway, I only expected it to end up downloading about a month’s worth of data, and at a thousand events per request, it wouldn’t take too long.

Given there were only about 78 thousand “threats” stopped that day, I expected a month of data to run out after request 2,340 or so. But it hit that mark, and just kept going, downloading data from thousands and thousands more events. It continued running for about 30 minutes, showing no sign of slowing down. By this point, it was late enough that I wanted to go to sleep soon: I had work the next day and didn’t want to babysit this script all night. So, I did some back-of-the-napkin math: it had been running for about 30 minutes and had downloaded about 1.8 gigabytes of data. Assuming it continued at that rate (actually, slightly faster, just to be on the safe side) for the next 9 hours, it would produce at most 36 gigabytes of data. I was reasonably confident in this, as there was no way for it to speed up appreciably (at least, if the bottleneck was JSON parsing as I suspected). I actually stayed up another half an hour or so, by which point it had downloaded 3.3 gigabytes of data, for about 5 million requests. So, I went to sleep, knowing that even if it downloaded vastly more data than I expected, I wasn’t going to run out of disk space.
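For anyone who wants to check the napkin math, here it is spelled out. Two figures are assumptions rather than numbers from the text: a 30-day month, and rounding the observed 1.8 gigabytes per half hour up to 2, which is the padding that lands exactly on 36.

```javascript
// Expected number of API requests for one month of events.
const eventsPerDay = 78000;
const daysInMonth = 30; // assumption: a 30-day month
const eventsPerRequest = 1000; // the "limit" parameter
const expectedRequests = (eventsPerDay * daysInMonth) / eventsPerRequest;

// Worst-case disk usage if the script ran for another 9 hours.
const observedGbPerHalfHour = 1.8;
const paddedGbPerHalfHour = 2; // assumption: rounded up to be on the safe side
const halfHoursRemaining = 9 * 2;
const unpaddedGb = observedGbPerHalfHour * halfHoursRemaining;
const worstCaseGb = paddedGbPerHalfHour * halfHoursRemaining;

console.log(`Expected requests: ${expectedRequests}`);
console.log(`Projected at the observed rate: ${unpaddedGb.toFixed(1)} GB`);
console.log(`Padded worst case: ${worstCaseGb} GB`);
```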

As it turned out, it didn’t actually run that much longer. After another 30 minutes or so, having downloaded 5.67 gigabytes of data and a total of 9.54 million requests, it finally emitted the “'before' cursor not present. Done.” message and stopped. Which is how I found it the next morning, alongside a nearly 5.7 gigabyte folder containing 9,551 JSON files.

This is a great deal more than the one month of data I expected it to output. It actually pulled data for requests going back to 00:00:00 UTC on April 1, 2020. That’s more than three months’ worth of data, despite the CloudFlare dashboard showing only information going back 1 month at most. Somewhat interestingly, the CF API continued returning cursors for earlier data. But when requests were made with those cursors, no data was returned. The API sent back empty responses with no data but earlier and earlier cursors, ultimately going back to August 1, 2019.

So, I was left with a folder of 9,543 JSON files. Since this is not a format that’s at all conducive to analysis, I wrote another small script the next day to take this folder full of JSON files and turn it into a single data set.

const fs = require("fs");
const path = require("path");

const MAX = 9542;

const stream = fs.createWriteStream(path.join(__dirname, "results.json"));

stream.write("[\n", "utf-8");

for (let i = 0; i <= MAX; i++) {
	const buffer = fs.readFileSync(path.join(__dirname, "output", `${i}.json`));
	const json = JSON.parse(buffer);

	console.log(`Writing results for index ${i}`);
	for (const entry of json.result) {
		stream.write(JSON.stringify(entry) + ",\n", "utf-8");
	}
}

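stream.on("finish", () => {
	console.log("Finished writing.");
});

// "finish" only fires once end() has been called, so write the closing
// bracket and end the stream here rather than inside the handler.
stream.end("]\n", "utf-8");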

For each JSON file output by the API-consuming script, I parsed its contents, and then output each individual firewall event returned from the API on its own line of a new combined results JSON file. Each event is kept on its own line to make analyzing it easier, because I’m able to open a read stream into the file and get individual events just by reading lines one by one. This is a lot easier than trying to load the entire 3.7 gigabyte resulting file into memory and parse it all in one go[3].

Now, to actually do something with the data. Each of the entries looks something like this:

{
	"ray_id": "5aee04ff8ac7f8ab",
	"kind": "firewall",
	"source": "bic",
	"action": "drop",
	"rule_id": "bic",
	"ip": "222.150.231.47",
	"ip_class": "noRecord",
	"country": "JP",
	"colo": "NRT",
	"host": "update.shadowfacts.net",
	"method": "GET",
	"proto": "HTTP/1.1",
	"scheme": "https",
	"ua": "Java/1.8.0_51",
	"uri": "/shadowmc",
	"matches": [
		{
			"rule_id": "bic",
			"source": "bic",
			"action":"drop"
		}
	],
	"occurred_at": "2020-07-07T02:08:46Z"
}

There’s a bunch of interesting information in there. To start with, I knew I wanted to count the unique user agents (i.e., Java versions), the paths (the mods being used), and the countries and IP addresses the requests originated from.

To start off, there are a bunch of Map objects (which are a better key-value store than plain JS objects) which will store the aggregated statistics for all the entries. There’s also a helper function that either increments an existing value in a map or sets it to 1.

const fs = require("fs");
const path = require("path");
const readline = require("readline");

const userAgents = new Map();
const paths = new Map();
const countries = new Map();
const ips = new Map();

function incrementStat(map, key) {
	if (map.has(key)) {
		map.set(key, map.get(key) + 1);
	} else {
		map.set(key, 1);
	}
}

To read the data, I just create a read stream and go through it line by line. If a line doesn’t start with an opening curly brace, it’s either the first or the last line and can therefore be skipped. Additionally, after parsing the line as JSON (minus the trailing comma), if the user agent string doesn’t start with “Java”, the item is skipped. There aren’t many, but there are a few spurious requests, likely from bots scraping every domain they find, testing paths like /wp-admin and /wp-config.php.old, hoping for a vulnerable installation. For requests that do have a Java user agent, the incrementStat function above is called for each of the various tracked statistics.

(async () => {
	const stream = fs.createReadStream(path.join(__dirname, "results.json"));

	const rl = readline.createInterface({
		input: stream
	});

	for await (const line of rl) {
		if (!line.startsWith("{")) continue;
		const withoutComma = line.substring(0, line.length - 1);
		const item = JSON.parse(withoutComma);

		if (!item.ua.startsWith("Java")) continue;

		incrementStat(userAgents, item.ua);
		incrementStat(paths, item.uri);
		incrementStat(countries, item.country);
		incrementStat(ips, item.ip);
	}
})();
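One wrinkle when saving these statistics: JSON.stringify on a Map produces "{}", so each Map has to be converted to a plain structure first. A minimal sketch of one way to do that follows. The toSortedEntries helper and the sample keys are made up for illustration, not taken from the original script.

```javascript
// Hypothetical helper: converts a Map of key -> count into an array of
// { key, count } objects sorted from most to least frequent.
function toSortedEntries(map) {
	return [...map.entries()]
		.map(([key, count]) => ({ key, count }))
		.sort((a, b) => b.count - a.count);
}

// Made-up sample data standing in for one of the statistics maps.
const exampleStats = new Map([["a", 3], ["b", 1], ["c", 2]]);

console.log(JSON.stringify(exampleStats)); // a Map serializes as an empty object
console.log(JSON.stringify(toSortedEntries(exampleStats), null, "\t"));
```

The resulting string could then be handed to fs.writeFileSync, one file per statistic.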

And with that, I can dump the individual stats to separate JSON files as well as calculate some more things based on the aggregated information. So, without further ado:

The Results

Before you look at the numbers, take everything here with a hefty helping of salt. While the data is mostly in line with what I’d expect, some results were vastly different from anything I would have imagined.

First off: breaking down the sessions by IP address. There were a total of 9.5 million requests made from 1.25 million unique IP addresses, making for an average of 7.7 mod launches per IP address. A little bit low, but not far from what I expected. The median number of mod launches per IP address is 2, which indicates that the vast majority of the IP addresses were responsible for very few sessions each, with fewer IP addresses accounting for far more game launches.
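The gap between the mean and the median is what points to a skewed distribution. As a sketch of how both can be computed from a map of per-IP counts (the summarize helper and the sample numbers are made up for illustration):

```javascript
// Hypothetical helper: total, mean, and median of the counts in a Map.
// Assumes the map is non-empty.
function summarize(counts) {
	const values = [...counts.values()].sort((a, b) => a - b);
	const total = values.reduce((sum, n) => sum + n, 0);
	const mid = Math.floor(values.length / 2);
	const median = values.length % 2 === 1
		? values[mid]
		: (values[mid - 1] + values[mid]) / 2;
	return { total, mean: total / values.length, median };
}

// Made-up counts: four light users and one heavy one.
const sampleCounts = new Map([["a", 1], ["b", 1], ["c", 2], ["d", 2], ["e", 34]]);
console.log(summarize(sampleCounts)); // { total: 40, mean: 8, median: 2 }
```

A single heavy address drags the mean up to 8 while the median stays at 2, the same pattern as in the real data.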

[Chart: Number of unique IP addresses (y-axis, logarithmic scale from 10 to 1,000,000) with a given mod launch count (x-axis, 0 through 500).]

This is one of the most surprising results. Most individual IP addresses only made a request for a single one of my mods. This is not at all what I was expecting. Each of my mods depends on ShadowMC, a library mod I wrote. By themselves, the other mods can’t even function: the game won’t launch without ShadowMC. But just ShadowMC by itself doesn’t actually affect the gameplay in any way.

Another surprise was that a fair few individual IP addresses generated an astronomical number of requests. There were 11 IP addresses that were responsible for over 10,000 mod launches each in the past three months. The greatest of these was 27,267 mod launches. Even assuming all four mods were used, that’s about 6,817 game launches. Over a period of about 100 days, that’s 68 game launches per day. Initially, my only guess was that it was game launches coming from a huge number of people behind CGNAT. But when I looked up the ASNs for the worst offenders, things became slightly clearer.

  • OVH: 95,161 requests total
  • HETZNER-AS: 27,267 requests
  • COMCAST-7922: 25,339 requests
  • ZONENETWORKS-AU ZONENETWORKS.COM.AU - Hosting Provider AUSTRALIA, AU: 23,306 requests
  • TWC-10796-MIDWEST: 13,967 requests
  • WOW-INTERNET: 10,986 requests

Hetzner, OVH, and Zone Networks are all server hosting providers, so a huge chunk of the requests presumably came from Minecraft servers running in their data centers (though I’m surprised Forge runs version checks on dedicated servers, given that there’s no user interface for presenting the results to the player). The remaining ASNs all belong to ISPs, so my best guess for their unusually high level of traffic is that they’re using CGNAT.

Broken down by mods, the traffic is unsurprising. ShadowMC, being a library mod that all of my others depend on, was the most launched at 9.5 million hits (far and away the vast majority of the requests). From there, Ye Olde Tanks had 25.8 thousand launches and Underwater Utilities had 15.7 thousand, which is roughly in line with their relative popularity. Finally, Crafting Slabs came in with a whopping 23 launches over the past three months. This wasn’t all that surprising, as Crafting Slabs was never very popular and it was only updated through Minecraft 1.11, whereas the rest were updated to 1.12.

I had a number of other mods with over a million downloads that aren’t represented here because they never used the update JSON mechanism. This likely accounts for the vast discrepancy between the request count for ShadowMC and the total request count for the other mods.

Next up: Java versions. Every single request was made with Java 8, which is unsurprising because Minecraft 1.12 (the last version for which I updated my mods, and the only version for which I ever enabled the update server) requires at least Java 8, and Forge for Minecraft 1.12 did not support Java versions newer than 8 (due to Project Jigsaw).

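  • 1.8.0_51: 5.4M requests (57%)
  • 1.8.0_241: 808k requests (8%)
  • 1.8.0_251: 732k requests (8%)
  • 1.8.0_242: 315k requests (3%)
  • 1.8.0_211: 297k requests (3%)
  • 1.8.0_252: 266k requests (3%)
  • 1.8.0_45: 226k requests (2%)
  • 1.8.0_231: 202k requests (2%)
  • 1.8.0_212: 184k requests (2%)
  • 1.8.0_221: 171k requests (2%)
  • 1.8.0_201: 143k requests (2%)
  • 1.8.0_191: 113k requests (1%)
  • 1.8.0_181: 94k requests (1%)
  • 1.8.0_222: 86k requests (1%)
  • 1.8.0_171: 80k requests (1%)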

The number of requests made with each Java version, along with the percentage of the total requests that version accounted for.

Far and away the most popular version was Java 8 update 51. I’m not certain, but I believe this may have been the version of Java that shipped with the Minecraft launcher. This chart is limited to only versions that account for 1% or more of the total requests, so it’s not visible, but 6,056 of the requests (0.063%) were made with versions of Java that identify themselves as being OpenJDK, instead of the regular Oracle JDK/JRE. Additionally, a whole 37 requests (0.00039%) were made with versions of Java that included RedHat in the version string.
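A breakdown like this can be derived from the user agent counts with a few lines. The sketch below uses a hypothetical sharesAbove helper and made-up counts. Rounding to a whole percent before applying the cutoff is an assumption, made because the chart includes entries such as 80k, which is slightly under 1% of 9.5 million.

```javascript
// Hypothetical helper: keeps the entries whose rounded percentage of the
// total is at least minPercent, sorted from most to least frequent.
function sharesAbove(counts, minPercent = 1) {
	let total = 0;
	for (const n of counts.values()) total += n;
	return [...counts.entries()]
		.map(([key, count]) => ({
			key,
			count,
			percent: Math.round((count / total) * 100),
		}))
		.filter((entry) => entry.percent >= minPercent)
		.sort((a, b) => b.count - a.count);
}

// Made-up counts totalling 1,000.
const sampleAgents = new Map([["a", 900], ["b", 88], ["c", 8], ["d", 4]]);
for (const { key, count, percent } of sharesAbove(sampleAgents)) {
	console.log(`${key}: ${count} requests (${percent}%)`);
}
```

Here "d" (0.4%) rounds down to 0% and is dropped, while "c" (0.8%) rounds up to 1% and stays.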

Next, broken down by country. This isn’t perfectly accurate, since IP addresses aren’t terribly reliable for determining location. But at only country granularity, it’s acceptable.

[World map: number of requests by country.] Sudan: 181; Georgia: 2,644; Peru: 18,540; Burkina Faso: 8; Libya: 53; Belarus: 24,518; Pakistan: 4,892; Indonesia: 13,132; Yemen: 54; Madagascar: 105; Bolivia: 1,913; Serbia: 6,740; Kosovo: 84; Ivory Coast: 103; Algeria: 2,685; Switzerland: 41,696; Cameroon: 155; North Macedonia: 1,695; Botswana: 13; Kenya: 439; Jordan: 1,694; Mexico: 52,628; United Arab Emirates: 7,368; Belize: 209; Brazil: 180,644; Mali: 22; Democratic Republic of the Congo: 6; Italy: 108,117; Afghanistan: 101; Bangladesh: 1,859; Dominican Republic: 2,820; Guinea-Bissau: 21; Ghana: 67; Austria: 58,225; Sweden: 119,957; Turkey: 45,194; Uganda: 46; Mozambique: 31; New Zealand: 48,396; Cuba: 168; Venezuela: 3,622; Portugal: 49,457; Colombia: 15,942; Mauritania: 36; Angola: 100; Germany: 903,124; Thailand: 18,670; Papua New Guinea: 2; Iraq: 2,619; Croatia: 9,263; Greenland: 41; Niger: 8; Denmark: 92,557; Latvia: 10,696; Romania: 41,188; Zambia: 13; Myanmar: 400; Ethiopia: 68; Guatemala: 2,260; Suriname: 185; Czech Republic: 80,188; Albania: 676; Finland: 38,853; Syrian Arab Republic: 370; Kyrgyzstan: 1,808; Oman: 865; Panama: 1,784; Argentina: 47,207; United Kingdom: 633,863; Costa Rica: 5,142; Paraguay: 1,448; Ireland: 24,128; Nigeria: 332; Tunisia: 1,614; Poland: 178,749; Namibia: 414; South Africa: 30,565; Egypt: 6,467; Tanzania: 36; Saudi Arabia: 24,198; Vietnam: 15,957; Russian Federation: 467,667; Haiti: 11; Bosnia and Herzegovina: 2,463; India: 22,378; Canada: 569,968; El Salvador: 1,725; Guyana: 141; Belgium: 69,801; Equatorial Guinea: 7; Bulgaria: 8,218; Burundi: 1; Djibouti: 40; Azerbaijan: 1,316; Iran: 1,637; Malaysia: 20,303; Philippines: 23,844; Uruguay: 7,285; Estonia: 12,043; Armenia: 987; Senegal: 124; Togo: 4; Spain: 120,100; Gabon: 79; Hungary: 62,962; Tajikistan: 65; Cambodia: 826; South Korea: 31,277; Honduras: 1,037; Iceland: 4,469; Nicaragua: 384; Chile: 32,052; Morocco: 2,886; Slovakia: 22,549; Lithuania: 21,472; Zimbabwe: 48; Sri Lanka: 299; Israel: 24,202; State of Palestine: 932; Laos: 101; Greece: 11,156; Turkmenistan: 4; Ecuador: 5,877; Benin: 1; Slovenia: 10,455; Norway: 69,516; Moldova: 5,860; Ukraine: 111,071; Nepal: 967; United States of America: 3,308,154.

Regions labeled on the map without a request count: South Sudan, Abkhazia, South Ossetia, Azad Jammu and Kashmir, Sierra Leone, Somalia, Somaliland, Chad, Solomon Islands, Guinea, Crimea, Lesotho, Republic of Artsakh, Republic of the Congo, Rwanda, Malawi, Western Sahara, Sahrawi Arab Democratic Republic, Liberia, Central African Republic, Gaza Strip, West Bank, North Korea, Svalbard, Transnistria, Donetsk People's Republic, Luhansk People's Republic, Eritrea.