Pastebin is a popular website for storing and sharing text. Though it's mostly used for distributing legitimate data, it also seems to be frequently used as a public repository of stolen information, such as network configuration details and authentication records. Various hacker groups and individuals also use Pastebin to distribute their loot, a trend perhaps initially set in motion by LulzSec.
What's Popular on Pastebin
To get a taste for the kind of information available on Pastebin, take a look at its Trending Pastes page. It's not uncommon for the most popular pages to include gems such as:
- A listing of subnet addresses that belong to various corporations
- A dump of compromised Facebook accounts, complete with email addresses and passwords
- An internal user database of a compromised website, including email addresses, privileges and password hashes
- An export of a users table from a compromised database, including usernames and passwords
Why Do Hackers like Pastebin?
What is attracting the hacker community to Pastebin? And why do compromised records persist on the site? Trying to figure this out, I asked on Twitter why Pastebin, rather than some other site, became a popular platform for sharing stolen records. The responses I received highlighted the following attributes of Pastebin:
- It's easy to use
- It can handle large text files
- It doesn't proactively moderate postings
- Publishing there doesn't require registration
- Its heritage is rooted in IRC networks
Also, I received a pointer to an article by Matt Brian titled "Pastebin: How a popular code-sharing site became the ultimate hacker hangout." Among the many examples brought up in the article is the story of the data stolen from Sony Pictures being posted on Pastebin and receiving 155,000 views before it was removed due to a takedown notice from Sony. I'd like to better understand the role that Pastebin plays in such incidents.
Pastebin's Handling of Takedown Notices
To me, the most interesting aspect of Matt's article was the perspective that Jeroen Vader, the owner of Pastebin, shared on the use of the site to share stolen data. He said:
"Pastebin is a website that is used by millions of people every month, and some of those people will create pastes with sensitive information in it. We have a good abuse report system in place that is monitored through out the day."
Jeroen explained that the site responds to takedown notices and that "if a reported item contains private information it can be removed instantly."
Is that a reasonable stance? I can understand why the site doesn't want to take on the burden of moderating content. Yet, identifying and flagging the files that might contain sensitive data isn't very hard. As a starting point, Pastebin could merely look at the items at the top of its Trending Pastes page.
Automatically Finding Stolen Data on Pastebin
Pastebin could also automatically look for the signatures that indicate possible sensitive data. In fact, that's what Jaime Blasco seems to have done to create the now-defunct tool called PastebinLeaks, which automatically identified stolen data artifacts posted on Pastebin. The service was quite accurate, and its findings, published on Twitter, were disturbing.
A similar free service, still active on Twitter, is Dump Monitor by Jordan Wright. Another example of a free service that monitors Pastebin for stolen data is Have I Been Pwned? by Troy Hunt. Another is LeakedIn.
Keeping an eye on sites such as Pastebin has also become a common tactic for threat intelligence-gathering companies and projects.
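To make the idea of signature-based detection more concrete, here is a minimal Python sketch of how a paste might be flagged for review. It is not how PastebinLeaks, Dump Monitor or any other service mentioned here actually works; the regular expressions, the function names (count_signatures, looks_like_dump) and the thresholds are all assumptions chosen for illustration, and a real tool would need far more tuning to keep false positives down.

```python
import re

# Hypothetical signatures, chosen for illustration only.
# Email addresses anywhere in the text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hex strings whose length matches common hash formats (MD5, SHA-1, SHA-256).
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64})\b")
# Whole lines shaped like "email:password" (also ";" or "|" as separator).
CRED_PAIR_RE = re.compile(
    r"^[^\s:@]+@[^\s:@]+\.[^\s:@]+[:;|]\S+$", re.MULTILINE
)


def count_signatures(text):
    """Count how often each signature appears in the paste text."""
    return {
        "emails": len(EMAIL_RE.findall(text)),
        "hashes": len(HASH_RE.findall(text)),
        "credential_pairs": len(CRED_PAIR_RE.findall(text)),
    }


def looks_like_dump(text, min_emails=20, min_hashes=10, min_pairs=5):
    """Flag a paste for human review if any count crosses its threshold.

    The default thresholds are arbitrary assumptions, not tuned values.
    """
    counts = count_signatures(text)
    return (
        counts["emails"] >= min_emails
        or counts["hashes"] >= min_hashes
        or counts["credential_pairs"] >= min_pairs
    )


if __name__ == "__main__":
    sample = "\n".join(f"user{i}@example.com:pass{i}" for i in range(6))
    print(count_signatures(sample))
    print(looks_like_dump(sample))
```

A check like this would only surface candidates. Someone would still need to confirm that a flagged paste actually contains stolen data before acting on it.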
Wrapping it Up
To sum up, attackers seem to use Pastebin to share stolen data because the site is easy to use for sharing voluminous text and because their buddies use it as well. Moreover, they know that the data published there will be around for some time for the world to see, since Pastebin doesn't proactively moderate content.
It's interesting to explore the technological, historical and sociological reasons why Pastebin has become a popular repository of stolen data. Perhaps more importantly, we need to understand how companies can identify when their data has been published on a site such as Pastebin. Also, my hope is that such sites will implement some form of proactive monitoring and will deal with suspected data leaks without waiting for a formal takedown notice.
For my follow-up post related to this topic, see Using Pastebin Sites for Pen Testing Reconnaissance.