Victimology Analysis and Data Leak Sites

Updated: Nov 30, 2022

It's the most wonderful time of the year - cyber predictions are coming! In the next few weeks, many people and companies will share their predictions for next year. These predictions often draw on information from threat actors' data leak sites, also known as shame sites. In my opinion, relying on the data the actors leak can be dangerous and can lead us to misuse our resources.

This blog will discuss different types of data leakage by threat actors. We will do it by reviewing three models of actors that use data leakage as part of their operations. With these prototypes, we will derive two main takeaways for careful analysis of these actors:

How can we better analyze a threat actor that uses data leakage techniques without relying only on the actor's claims on its shame sites?
How can we build a better victimology profile from data leak sites without falling into biases?

To trust or not to trust, that is the question

Yesterday, I ran some polls on my Twitter account to see what my followers think about analyzing data leak sites.

My first question was - "Do you / your cyber threat intelligence team monitor name-and-shame websites (mainly on the dark web) to collect data about the group's victims?" 78.3% of the respondents said that they use these websites. It makes sense. Many of us, cyber threat intelligence researchers and infosec people, monitor every declaration of these threat actors. I do it myself as part of my research on the public aspects of cyber attacks. I also use it to monitor the activity of my highest-priority threat actors. The problem starts when we use these data leak sites, or shame sites, as our primary source. We will elaborate on this subject later in this blog.

Let's return to the questions. In the second question, I asked, "When you monitor these sites, do you examine the data the ransomware group leaked or only count a mention of a company as a victim of the group?" The majority of the respondents, 59.1%, analyze the files. Still, a large share of the respondents think that a mention on the shame site is enough to collect data about the threat actor. On the one hand, it's tough to examine all the files. On the other hand, how can we trust the actors?

But if we set the data examination aside, how can we validate the attack when we see a new publication by a threat actor? I asked this question and got the following answer - most of the respondents, 77.3%, analyze the data leak to validate the attack. When the target confirms the attack, as Cisco did with the Yanluowang attack only this summer, the analysis becomes more manageable. For example, when the targeted company is our vendor, we can assess the potential impact on us. Moreover, we can use the data for better adversary simulation, knowing what type of information the actors search for. Things get complicated when the targeted company doesn't confirm any attack against it. We will soon see that we can't trust publications by the threat actors, but we have to assess the threat level of these actors somehow. Let's dig into some problems in using data from the shame sites.

The 'Scavenger' group case

I prefer not to mention a specific group but to speak about trends in this post. Let's start with one of the oldest trends. Since there is no good definition of a hacker, anyone can claim to be one. Just open a Twitter or Telegram account, give yourself a cool name, like Anon Hexadecimal Army, and people can monitor you as an anonymous group. When Russia attacked Ukraine last winter, many groups popped up, aligned with one side, and started to do something. Only part of that activity was genuine hacking. One fascinating group added the word "Ransomware" to its name. For many people, that was enough to add it to the portfolio of ransomware groups.

In their reports, some vendors even reported cyber attacks by this self-proclaimed ransomware group in the last few months. How did they know about these attacks? Simply because the group published its claims in one unified channel, giving people from our industry an easy way to collect data about its 'attacks'. Since claiming isn't enough, the group tried to give some proof. In its channel, the group attributed many cyber attacks against well-known companies to itself by leaking allegedly exfiltrated information of the victims. Much like Lapsus$, this actor gained publicity by taking credit. The problem is that nobody saw these attacks. None of the victims ever confirmed a cyber attack by the group, or any cyber attack in that time range. Further, no one reported on the group's TTPs or saw them in the wild. In conclusion, the only people who saw the group in action were its own members.

Furthermore, anyone who checked the data they leaked and tried to validate it probably saw that, in most cases, the data had already been leaked in the past. A scavenger group reframes old data and takes ownership of cyberattacks that have already happened. Then why do the vendors keep reporting on their attacks? I can only imagine that they never tried to analyze the leaks.
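One simple way to check whether a 'new' leak is recycled is to compare file hashes against leaks we have already collected. The following is only a minimal sketch under my own assumptions: the function names, the sample files, and the idea of keeping a set of known hashes are hypothetical, and an exact-hash comparison will miss data that was repackaged or lightly edited.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of a file's content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def reposted_fraction(new_leak: dict[str, bytes], known_hashes: set[str]) -> float:
    """Share of files in a claimed leak whose content was already seen in older leaks.

    File names are ignored on purpose: a scavenger can rename files,
    but identical content still produces the same hash.
    """
    if not new_leak:
        return 0.0
    hits = sum(1 for content in new_leak.values() if sha256_hex(content) in known_hashes)
    return hits / len(new_leak)


# Hypothetical corpus of files from a leak we collected in the past.
old_leak = {
    "hr.csv": b"name,salary\nalice,10\n",
    "notes.txt": b"internal notes",
}
known_hashes = {sha256_hex(content) for content in old_leak.values()}

# Hypothetical 'new' leak published by a self-proclaimed group.
claimed_leak = {
    "employees.csv": b"name,salary\nalice,10\n",  # renamed copy of hr.csv
    "notes_2022.txt": b"internal notes",          # renamed copy of notes.txt
    "new.txt": b"something not seen before",
    "extra.txt": b"also new",
}

fraction = reposted_fraction(claimed_leak, known_hashes)
print(f"{fraction:.0%} of the claimed leak was already public")
```

A high fraction doesn't prove that the group is a scavenger, but it is a strong hint that the claim needs more validation before it goes into a report.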

From this case, we've learned that we can't rely on data leakage as a primary source; we need more parameters to determine that some attacks happened. When the only information we have about a threat actor is the actor's words, we can't trust them. They can be scavengers.

The 'Ransomware' group case

We have always had groups using this technique to gain publicity. It's the bread and butter of hacktivists. But what happens when a very famous and verified ransomware group leaks data? Do we still need to validate the leaked data?

In my opinion, yes. It's a different game from hacktivists. When we analyze victimology, we must include the motivation aspects. Nation-state-sponsored threat actors and ransomware groups attack their victims with different motivations. Accordingly, the victimology of these threat actors must be different. While an APT group attacks its targets based on geopolitical motivation, most ransomware actors' motivation is profit. The intent differs between a group required to get specific data from a targeted organization or sector and a group that attacks to make money and doesn't care about the sector. In most cases, we know that ransomware groups choose targets based on the opportunity to get money. Therefore, a sector-based analysis might not give us the right threat landscape for our organization.

Let's return to the shame sites. Following the last paragraph, when the motivation is not geopolitical, there are mainly two other options for these groups - profit, which covers most cases, and publicity. We will start with the profit motive. A ransomware operator encrypts files across the environment and wants to push the company to pay. This actor can use another form of leverage to ransom the organization. Since the actor has data, the group can use it to force the company to pay. For instance, these actors can publish the names of their victims on their shame sites, along with an hourglass: "Time is running out, you should pay us, or in addition to the disruptive attack, we will also publish your data". Classic double extortion methods. The organization failed to pay the ransom? The operators leak the data. Is the organization lingering and hesitant to pay? The operators leak some of the obtained data as a statement of intent.

The second motivation is publicity, fame, and reputation. Like any for-profit business, a ransomware group needs to prove it knows how to deliver its goods. A company won't pay an actor that is not trustworthy and is known for publishing stolen data after the payment. Gaining credibility is crucial, especially for a new ransomware group. Moreover, the group needs to establish credibility as a powerful group that attacked a company and is now willing to leak data. Young ransomware groups need these wins. Reputation is another reason; how do you make people fear you if not by taking ownership? The victim company might refuse to pay a group that has never leaked any data in the past. It only makes sense that some ransomware groups that decide to use public leverage against their victims would leak data. Of course, other groups don't share all the data about their victims. Some groups publish only chosen victims or don't present victims at all, for many reasons. While some ransomware groups are highly engaged with the infosec community, others decide to lurk in the shadows and avoid publication, and use that to stay off the infosec radar. Some don't leak the data; they sell it to a third party. How can we use the shame sites as a reliable source when even the actors manipulate the data?

Some groups took their publicity too far. Vitali Kremez once told me - when you have too much money, your currency is different. You start to care about other things, and fame is one of these things. Imagine that you have billions of dollars. Another few thousand won't change your checking account. When you have so much money, you start to cherish different things like cars, private jets, and fancy closets; you have a new currency. Hackers don't carry the liabilities that private people do; they hide behind their public handles. Being a criminal celebrity can be one of these new currencies. In the last few months, researchers noticed that the time between rebrands has grown shorter, but some actors go in the opposite direction. These actors make their group name 'notorious'. What can you do to get the community's full attention? Taking responsibility for more breaches is one option. We all saw some ransomware operators publish implausible numbers in the last few months. Using the data you have from one company to ransom its third parties is another powerful technique - every company stores other companies' data.

Recently, we saw actors that make it easier to get to their shame site - more people can easily report on them if they don't need to travel to the dungeons of the darknet. When it's so easy for people to frame information as something they found on the 'deep web', the hackers can use it as an opportunity. When the threat actors want to hide, they know how to do it. We saw threat actors opening a clear-web .xyz site lately - there is a reason for it. They want our attention, and they want us to find the websites and the data. The ransom note includes a link to the 'shame website' or the negotiation site.

If we, as infosec people, don't validate the data, the actors can claim they attacked more companies, even without attacking them. It's different from the scavenger case because the actors genuinely attack someone. Unlike the scavengers, these groups are known by their TTPs, and we have confirmed attacks in the past. However, without validating the data, we might wrongly assess the volume of attacks by these groups. We can't determine the actors' victimology map based only on the shame sites, even when we have artifacts that we lack for the thrill-seeker groups.

Curated Intelligence, a community I'm proud to be a member of, published the same message in a blog post named "The Difficulties and Dubiousness of Darkweb Data Leaks Sites". In the post, written by my friend @BushidoToken, Will added some use cases from ransomware groups, including groups that lied about their data - they published fewer victims than they attacked. I highly recommend reading it.

The 'Influence' group case

If you follow my blog, you probably saw my last post about me being a target of an influence-based operation by a nation-state threat actor. If you haven’t read it, check it out in the following link.

I won't elaborate a lot on this use case since it's pretty evident in my opinion, but it's worth mentioning. A group that executes the "Hack and Release" method does it because they want to influence. Only recently, the FBI published a report about some of these groups; I'm quoting the relevant aspects of the motivation:

The FBI assesses the purpose of these operations is to undermine public confidence in the security of the victim’s network and data, as well as embarrass victim companies and targeted countries. These hack-and-leak campaigns involve a combination of hacking/theft of data and information operations that impact victims via financial losses and reputational damage.

We can understand that these groups are the masters of mischievousness - they publish data because they want others to speak about the data. If these groups work on behalf of a government, we can't infer the regime's real victimology map from a single influence-based actor. Since the espionage groups don't leak their data, our sample will have an external validity problem.

Victimology and Data leaks

All these cyber group prototypes made me wonder how we can rely on these 'shame sites' to make these cyber security predictions and what we can learn about victimology. From the cases I've mentioned above, one thing is obvious. We have to analyze the data leak websites with kid gloves.

But how do we do it?

First, we can't rely on the actors' claims without valid proof. The proof we need is a combination of verified leaked data, confirmed TTPs of the group, and confirmation of at least some targets. We can get this information either from valid telemetry of a security vendor or from the company itself, but never from the hackers' statements alone.
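To make the combination concrete, the three pieces of proof can be written as a small checklist. This is a hedged sketch, not a finished methodology: the `ClaimEvidence` fields, the `assess_claim` function, and the three labels are names I made up for illustration, and the thresholds are assumptions you should tune to your own process.

```python
from dataclasses import dataclass


@dataclass
class ClaimEvidence:
    """Evidence collected about a claim published on a shame site (hypothetical model)."""
    leak_data_verified: bool   # we examined the leaked files and they look genuine and new
    ttps_confirmed: bool       # the group's TTPs were observed in vendor or victim telemetry
    confirmed_targets: int     # number of targets confirmed by a source other than the actor


def assess_claim(evidence: ClaimEvidence) -> str:
    """Classify a claim; the actor's own statement never counts as evidence."""
    checks = [
        evidence.leak_data_verified,
        evidence.ttps_confirmed,
        evidence.confirmed_targets >= 1,
    ]
    if all(checks):
        return "credible"
    if any(checks):
        return "partially supported"
    return "unverified"


# A group known only from its own channel, with nothing we could verify.
scavenger_like = ClaimEvidence(leak_data_verified=False, ttps_confirmed=False, confirmed_targets=0)
# A group with verified data, observed TTPs, and two confirmed victims.
established = ClaimEvidence(leak_data_verified=True, ttps_confirmed=True, confirmed_targets=2)

print(assess_claim(scavenger_like))  # unverified
print(assess_claim(established))     # credible
```

Only claims that reach the top label should feed volume statistics or predictions; everything else stays in the monitoring queue.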

I think the best way to filter out these noisy groups is to use threat actor modeling. I've been working on my threat actor model for a few years now and hope to publish it soon. The first step is gathering enough data about the actor to create a good profile. The profile should contain all the data mentioned above and can also include victimology aspects (see more in the following paragraphs). This way, we avoid wasting our scarce time monitoring groups that only gain publicity at our expense. For the client-side intelligence people who read this blog - assessing threat actors this way can help you choose the right vendor.

Second, we shouldn't use the 'shame sites' as a valid sample. Anyone who has studied statistics knows that correlation does not imply causation. In the spirit of the Bradford Hill criteria, we should rule out other likely explanations before we infer causality. I hope I have convinced you already that there are plenty of other explanations for publishing data that aren't limited to breaching a company. The hackers' claims alone don't indicate a valid breach - and we shouldn't treat them as if they do.

Moreover, we will never know about many of the companies the actors did attack. Thus, we can't use data we didn't validate to assess threat actors. Nor can we make predictions about the actors, since the sample doesn't contain all the data we need. We should monitor these websites; there are plenty of benefits to using the data in the profile creation process. But we should never use this source as the primary source.

The third aspect is victimology. We can't determine the victimology based on the shame sites; we need a better profile. Remember the two primary artifacts:

Actors that use data they found in one company to ransom another company - if we don't check the data and discover which organization the actor actually breached, we can make mistakes in the victimology assessment, such as attributing more sectors to the actor than it really targeted.
Actors that only leak part of their victims - if we base our assessments on the data leak sites alone, we will reach the wrong conclusions.

As we need a better profile, we should build the victimology analysis from more artifacts than the sector or the targeted country. Since we will never have the complete data, a victimology mapping that includes only one artifact won't help us. Conducting sector-based victimology analysis for the threat actors can be significant for some sectors, like the defense or oil and gas sectors. For other sectors, maybe we need different variables - remember that many ransomware groups attack their targets because of the opportunity and don't limit themselves to a specific sector. Use your analysis of the motivation to determine if you need to use different variables or to give a different weight to the sector. For example, the organization's size or revenue can help us better understand the profile of the threat actor's victims. Moreover, we can combine all these variables: targeted countries, targeted sectors, and business aspects of the targeted companies. We need to vary the weights in our victimology assessment.
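Here is one way the weighting idea could look in practice. Treat it as a rough sketch under my own assumptions: the victim records, the variables (`sector`, `country`, `size`), the weights, and the `exposure_score` function are all hypothetical, and the score is only a relative indicator, not a probability. Note that only validated victims are counted, in line with the points above.

```python
def share(victims: list[dict], key: str, value: str) -> float:
    """Fraction of victims whose attribute `key` equals `value`."""
    if not victims:
        return 0.0
    return sum(1 for victim in victims if victim[key] == value) / len(victims)


def exposure_score(victims: list[dict], org: dict, weights: dict[str, float]) -> float:
    """Weighted overlap between our organization and the actor's validated victims."""
    validated = [victim for victim in victims if victim["validated"]]
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * share(validated, key, org[key]) for key, w in weights.items())
    return weighted / total_weight


# Hypothetical victim list; the last entry is a shame-site claim we could not validate.
victims = [
    {"sector": "retail", "country": "US", "size": "large", "validated": True},
    {"sector": "retail", "country": "US", "size": "mid", "validated": True},
    {"sector": "health", "country": "US", "size": "large", "validated": True},
    {"sector": "energy", "country": "DE", "size": "large", "validated": True},
    {"sector": "retail", "country": "US", "size": "large", "validated": False},
]
our_org = {"sector": "retail", "country": "US", "size": "large"}

# Profit-driven actor: company size matters more than sector.
profit_weights = {"sector": 1.0, "country": 1.0, "size": 2.0}
# Geopolitically driven actor: sector dominates, size is ignored.
geopolitical_weights = {"sector": 3.0, "country": 1.0, "size": 0.0}

profit_score = exposure_score(victims, our_org, profit_weights)
geopolitical_score = exposure_score(victims, our_org, geopolitical_weights)
print(profit_score, geopolitical_score)
```

The same victim list produces different scores once the weights follow the actor's motivation, which is exactly why a single sector count isn't enough.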

Data leakage sites are essential to threat intelligence analysis like any other source. I hope I convinced you in this blog that we have to think differently about cyber threat actors and not use only the data the actors give us. The actors do it for a reason; let's make smart intelligence.