Friday, September 12, 2008

RAID Part 3, Iteration, Block Sizes, and the First Steps of Recovery

In the previous installments, I talked about the general concept of RAID and how it is not really standardized. We broke RAID 5 down into four main groups: left symmetrical, left asymmetrical, right symmetrical, and right asymmetrical. Of course, if that were all there was to it, it wouldn't be such a hassle to reconstruct failed RAID arrays. To that end, I spoke about parity levels and the fact that the parity level and the number of disks don't necessarily have to be equal. This can complicate matters a bit in the reconstruction process. Of course, that is not the end of the configuration variables that must be discovered to reconstruct an array.

Iteration (Delay)
In the previous article, we examined a number of array configurations for RAID 5. The easiest to understand is probably the left-asymmetrical RAID 5. Again, in the previous article, the example used was a left-asymmetrical RAID 5, parity 5, on 5 disks. In this example, the parity rotated on every stripe; thus its rotation iteration is 1: write one stripe, then rotate.

left asymmetrical RAID 5, Parity 5, on 5 Disks, Iteration 1

However, some manufacturers (cough, HP/Compaq, cough) decided this was too simple for their liking, so they added a new variable to the game: iteration (delay). In this configuration, some number of stripes greater than 1 is written/read before the parity rotation occurs. Thus the array is somewhat akin to an incestuous relationship between RAID 4 and RAID 5, where the parity remains in the same position (note that this is not necessarily the same disk, as discussed in the previous article) for a number of stripes before rotation. When the previously defined number of iterations has occurred, the parity rotates as normal. The number of iterations of a stripe before rotation is up to you to find out (we will talk about techniques in later articles), but 16 is usually a good place to start.

left asymmetrical RAID 5, Parity 5, on 5 Disks, Iteration 2
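
To make the delay concrete, here is a minimal sketch of where the parity block lands for each stripe. The function name parity_disk and its parameters are my own, and it assumes a left-rotating array whose parity level equals the number of disks; a real controller may well number things differently.

```python
def parity_disk(stripe, disks, iteration=1):
    """Return the index of the disk holding parity for a stripe.

    Assumes left rotation, with parity starting on the last disk and the
    parity level equal to the number of disks (assumptions, not a standard).
    """
    # Parity only moves once every `iteration` stripes.
    rotation = stripe // iteration
    return (disks - 1) - (rotation % disks)


if __name__ == "__main__":
    for stripe in range(6):
        print(stripe, parity_disk(stripe, disks=5, iteration=2))
```

With iteration=1 the parity moves on every stripe, which is the plain left-rotating case from the previous article; with iteration=2 it stays put for two stripes before moving.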

Block Size Primer
Up until now, we have ignored block sizes. I am not talking about blocks at the disk level (sectors), but the chunk of data that is written to a disk before moving to the next disk. A "true" RAID 5 should always use some integer multiple of the sector size of its component disks (usually a 512-byte sector), but I have been surprised before, so consider yourself forewarned. If you're into guessing, start at 2^3 (8) sectors, moving up by a factor of 2 each time until you get to 512. If you get there, you probably missed something. Of course, if you're not so much into guessing, know that the block size is usually some power of 2 times the sector size, and read on.
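
If you do go the guessing route, the list of candidates is short. The sketch below simply enumerates them; the function name is hypothetical, and reading the upper bound of 512 as a sector count (rather than bytes) is my assumption.

```python
SECTOR_SIZE = 512  # bytes; the usual sector size, but check your disks


def candidate_block_sizes(sector_size=SECTOR_SIZE, min_sectors=8, max_sectors=512):
    """Return candidate RAID block sizes in bytes, doubling each step."""
    sizes = []
    sectors = min_sectors
    while sectors <= max_sectors:
        sizes.append(sectors * sector_size)
        sectors *= 2
    return sizes


if __name__ == "__main__":
    for size in candidate_block_sizes():
        print(size // SECTOR_SIZE, "sectors =", size, "bytes")
```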

Determining the Parity Level
As I mentioned, though somewhat briefly, in the previous installments of this RAID series, the parity in RAID 5 is a simple XOR of the data blocks within the stripe. The parity is then written to the parity block for the stripe. If one of the disks in the RAID fails, the missing blocks can be recalculated by performing an XOR on the remaining blocks to recover the data. We can use this fact to our advantage in the recovery process. Provided that we are working with a controller failure or another situation in which all of the component disks of the RAID are present, we can easily determine the parity level of the RAID. Since the parity block of any particular stripe is equal to the XOR of the data blocks in the stripe, and any value XORed with itself is zero, the XOR of an entire stripe, including the parity block, will always equal zero. Therefore, if you write a little program that performs XOR on 3, then 4, then 5, etc. blocks at a time, the one that consistently results in zero is your parity level! That's it. Well, not really. There are some other variables we need to talk about for this to work in practice, the first being configuration data at the beginning of the disk that would not play well with such a simple method.
That being said, in the next installment we will talk about areas of the component disks where metadata of the RAID itself may be stored, locating the beginning of the data area, locating other common structures that will help you determine some parameters, and finding the parity blocks programmatically.
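
In the meantime, here is a rough sketch of the "little program" idea. It works on blocks already read into memory, and it assumes you know the disk order and block size and that the data area starts at the first block (no controller metadata in the way). The function names are mine, not from any tool.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def guess_parity_levels(rows, max_level=None):
    """Return every parity level whose groups of blocks all XOR to zero.

    rows is a list of rows; each row holds one block (bytes) per disk,
    in array order. Blocks are walked row by row, `level` at a time.
    """
    flat = [block for row in rows for block in row]
    if max_level is None:
        max_level = len(rows[0])
    matches = []
    for level in range(3, max_level + 1):
        groups = [flat[i:i + level] for i in range(0, len(flat) - level + 1, level)]
        # All-zero groups XOR to zero at every level, so they prove nothing.
        informative = [g for g in groups if any(any(block) for block in g)]
        if informative and all(not any(xor_blocks(g)) for g in informative):
            matches.append(level)
    return matches
```

Feed it as many rows as you can spare; a handful of rows can match by coincidence. If you raise max_level past the number of disks, multiples of the true level will match too, so treat the smallest match as the likely answer.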


Powered by Qumana

Thursday, September 11, 2008

Mindset - Professional Hackers

*** NOTE: URLs cited (but not linked to) in this article may contain malware that could harm your computer or encourage you to give money away for nothing in return. Please be cautious when browsing to any address not explicitly linked to!

I would like to take a few minutes to review an in-depth article by The Register (Anatomy of a Hack). Specifically I want to focus on the areas that were well executed and areas that create vulnerabilities to the hacker's objectives.

As always, we'll be applying the General Theory of Laziness as a filter through which we look at the hacker's actions.

Fake Google Site:
* The use of is immediately suspicious. It would have been far less suspicious to use a generic domain name. Here the objective of the hacker is to appear legitimate. Insofar as this attempt is made, it backfires. Picking this domain name as your "cover" is LAZY. There are many better options out there, e.g., seriously legitimate-sounding domains.

GUI Specifics: XP GUI elements observed from a Windows Vista System
* In Figure 1 of the article we see what appears to be a pop-up window over the browser purporting to be a WARNING!!! from "Quick System Scan Results". On any computer other than Windows XP we would immediately identify this window as part of the hack and not an actual system message. Rendering only one window style is LAZY, but it probably achieved the objective: target a large user base (all Windows XP users).

Exciting Words
* "WARNING!!!", "Spyware.IEMonster.b", "CRITICAL", and "strongly recommended" are all sales tactics. You see the equivalent in real estate that you should have no interest in! Specifically, using a malware name like "Spyware.IEMonster.b" is LAZY: it plays on the fear of IE users with an obviously fake threat name. When is the last time you saw a threat name and it made actual sense to you?

Spelling, Grammar
* Figure 5 and Figure 7 (others?) contain blatant punctuation and/or grammatical errors. While some professional, REAL software does contain such errors, generally you see these errors in deeper parts of the application. The "client's" first experience with software is set by the installer and other components (such as the company web site). Errors such as these at this point should be a big warning. The hackers were LAZY and did not have someone accurately check grammar, spelling and/or punctuation.

* Eventually this article discusses how the end-point of the hack, getting money out of the whole deal, all points in the same direction. One of the hardest problems of hiding, even online, is money. For money to have value it has to end up in someone's or some organization's hands. Eventually the investigator identifies a variety of malware that all pay the same person/organization.

My colleague, Dave Gilbert, warns me that "follow the money" has become a cliché, but I will hazard its use here.

I contend that in a world that is increasingly digitized, we will see the oldest of investigative techniques become more and more important. In this case, following the money trail, the way that these hackers get money out of the "hack," leads us back to the actual attackers.

Any thoughts?

Digital Evidence Formats

Folks, this is a little off our normal topic of "Intrusions and Malware Analysis," but I think it's entirely relevant.

The Plea
Over the last 6 years that I've been involved with computer forensics/computer security, I have regularly reflected on the lack of open standards for handling digital evidence. Can we please, as a community, make a grassroots decision to standardize our imaging on a common format?

The Problem
I, personally, have experienced problems with proprietary image formats that are password protected or have incompatible format versions for the analysis software that I am using. When these problems arise it often costs a ridiculous amount of time and potentially money to "fix" the problem. The problem is proprietary evidence container formats. There, I've said it.

Currently, if I had to choose, I would say that "dd", or raw bitstream, would be the best format to standardize on. Split, complete, whatever. Raw format allows for the most flexibility in performing forensics examinations. Every forensics suite (that I know of) supports the format, there are never issues or concerns regarding format or software version, we're able to split the image to support different file systems if need be... In my opinion, it's the best option available to us.

Now, you may be sitting there thinking, "But raw doesn't support compression" or "But raw doesn't allow me to password protect the evidence." You're right on both counts. I don't think that Raw format is the end-all fix to this problem, but let's start with the raw format and see how we can meet various requirements of evidence containers while staying in the realm of open standards.

If there is one thing that Open Source/Open Standards software has given us it's compression algorithms. One might argue that there are too many compression formats on open platforms and I would agree. We need to identify a compression algorithm that would bring reasonable compression and ease of split file output for our increasingly large image files.

The selected compression standard must also allow for password protection. Some evidence may not be able to be processed with a chain of custody at every point and should be protected from illicit viewing and potential tampering.

Other Details
I would like to think of the file format as more of a wrapper than the actual content itself. In addition to handling some level of compression and protection, I'd like to be able to insert a 'hand receipt' into the package: generate a hashlog of the drive and insert it into the overall package. Think about it more like a portable filesystem than a file itself:

- PackageContents.xml (Contains MD5 of image, information about origin, etc.)
- Hashlog.txt
- ImageFile.raw.dd
This is very similar to how the OpenOffice document format (OpenDocument) works. It makes sense and could make working across company/organization boundaries much more efficient.
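
As a rough illustration only, here is how the hash and a PackageContents.xml entry might be generated. The element and attribute names (Image, MD5, Origin, name) are placeholders I made up for this sketch; nothing here is a proposed schema, and compression and password protection are deliberately left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a (possibly very large) image file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package_contents(image_name, image_md5, origin):
    """Return PackageContents.xml as a string (hypothetical element names)."""
    root = ET.Element("PackageContents")
    image = ET.SubElement(root, "Image", name=image_name)
    ET.SubElement(image, "MD5").text = image_md5
    ET.SubElement(image, "Origin").text = origin
    return ET.tostring(root, encoding="unicode")
```

Usage would be along the lines of build_package_contents("ImageFile.raw.dd", md5_of_file("ImageFile.raw.dd"), "where the drive came from"), with the result written next to Hashlog.txt inside the wrapper.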

Did I miss any major features that you would want to see? Let me know in the comments!

Wednesday, September 10, 2008

The Merry-Go-Round

Thanks guys for inviting me to the party. In return, I will try to share some thoughts until you all get tired of my perspective and style and kick me off this thing. Definitely some interesting topics so far and I plan to study at least one of them well enough to comment intelligently. In the spirit of giving back, I'll go ahead and throw another topic out.

While I find it humorous that I'm doing so, I've spent the better part of the last year trying to help folks new to the Intrusions game get comfortable with cold-box intrusion analysis. I say humorous, because it wasn't all that long ago, in Dave years, that I was scrambling to teach myself how to tackle exams as well as bugging a few good friends and respected colleagues to validate what I thought I knew and straighten me out when I was wrong. When I was learning intrusion analysis, I had the advantage of some pretty extensive criminal examinations and field investigations experience. I've found that understanding what goes into building an investigation with an eye toward prosecution can help in figuring out what a suitable intrusion analysis examination should produce. What I've found most challenging is figuring out ways to relay what I refer to as a "way of thought" to folks who may not have the same perspective I have with regard to an analysis process. This is probably most akin to your Dad or Uncle telling you you're not holding your mouth right when you're baiting a hook, turning a wrench, or tying a knot. So, as I've tried, scratched, thrown away, and forged ideas anew this last year, I've decided that intrusions analysis is most like a merry-go-round.

Great minds and smarter men than me have taught me that the intrusions response process as a whole, of which cold-box analysis is a piece, is a cyclical process. It's almost natural to me that intrusions system analysis is cyclical as well. We basically start with some kind of known, or guess, and go to the unknown. As the unknown becomes a known, it leads to other unknowns, and so on. I have seen and ridden this ride many times. As most of us have, I've chased rabbits down hole after hole, trying to find the nugget (vector, attribution). As I've tried to explain how this can happen, while also working in an environment that leans toward standardization, automation, and desired speed and more speed, I've remembered my short-lived youth on the Jersey Shore where for a time I and my good buds would jump on and off the merry-go-round in a fine arcade establishment. The trick, at least back then, was to time the jumping on and off to coincide with being out of sight of the ride attendant. So, I've concluded that a key aspect of intrusions analysis is knowing where to jump on and when to jump off the beast lest your hard work become irrelevant due to passage of time.

Hacker Mindset - General Theory of Laziness

When approaching computer forensics I try to put myself in the shoes of the subject of the case. In intrusions I try to think like the attacker... This is the inaugural post of a tag series I'll call mindset. During this series I'll try to uncover the thought process behind different types of hackers and show how understanding their individual nuances can increase your proficiency in examining intrusions.

In this post I reveal the base principle I use in examining an intrusion:

General Theory of Laziness

The theory works like this: Given two solutions that meet the attacker's requirements, the attacker will use the solution that is "easiest" to implement.

Therefore, considering the attacker's objectives are met by the solution...
  1. The easiest attack vector will be used
  2. Files of Interest will be in a relatively tight-knit location and time frame
  3. Multiple types of system events will be occurring in a tight-knit time frame, for example: log file entries at the same time as binaries being modified or created on the system
  4. Malicious behavior will be obfuscated no more than required
  5. Malicious binaries will reside in a system PATH location or other easily accessible location
  6. Malicious binaries will not contain convincing extended Copyright, Program, Version or other 'normal' executable information (Right-click -> Properties)
  7. Spell checking will be obviously overlooked
  8. Full error checking is really hard, expect use of Windows System libraries to make life easier

Case Studies:
Let's look at some different examples and identify areas that highlight this theory and places we would expect to see exceptions.
Professional Intellectual Property Thief
- Objectives: Gather one or more particular types of information from a given Target taking care to not identify the "customer" this data will be given to or that the proprietary information is even being taken.
- Prolonged access may be necessary, attacker may make more of a conscious effort to hide the backdoor.
- Secretive ex-filtration of data is critical, malicious communication may be highly obfuscated or even encrypted.

Script Kiddy
- Objectives: Gain community credibility, gain competence in intruding.
- Expect to see the noisiest intrusion ever. Lots of security log errors, possible service crashes etc.

Bot-Net Intruder/Manager
- Objectives: Take control of as many systems as possible.
- Security is not a priority, mass infiltration is. Expect binaries that attempt to self-propagate, call to their controller often and are not hidden or obfuscated well.

Expect to see more on these topics in the future.

Do you agree or disagree? Let me know in the comments!

Wednesday, September 3, 2008

RAID Part 2, RAID 5

I am going to talk a bit today about the various configurations of RAID 5. As I have mentioned before, RAID is not a standard, but a concept of how to use multiple disks as a single device. It was first defined in the 1988 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by Patterson, Gibson, and Katz, from the CS powerhouse of the time…you guessed it, Berkeley. However, the paper described the concept of RAID; it did not define a standard, nor was the concept ever boiled down to one. As a result, RAID 5, or most any other RAID level, is implemented differently by each manufacturer. This makes it difficult indeed to recover an array when things go bad.
The concept of RAID 5 is striping with rotating parity. The idea is that data is written in blocks to an ordered disk set and one of the disks for every stripe contains the parity (simple XOR) of the data in the stripe. There are a bunch of sites out there that explain this, but the basic idea is that the data from each data block in the stripe is XOR’ed with the others, the result of which is written to the parity block. Thus, if a disk fails, the data from the remaining blocks can be XOR’ed to reconstruct the missing disk. The parity level is the number of data blocks plus the parity block, e.g. parity 3 (p3) is two data blocks plus parity. Since the parity block and its member blocks must each be located on a unique disk, the maximum number of data blocks per stripe is the number of disks minus one (n-1). While it is the norm to see the parity level equal the number of disks in a set, it is not a requirement. Therefore, the parity level can equal any number between 3 and the number of disks in the array, which can cause all sorts of headaches for those trying to recover data off a dead array.
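
The XOR trick is easier to believe once you see it run. Below is a small sketch with three made-up data blocks (parity 4); the block contents and the function name are purely illustrative.

```python
def xor_blocks(blocks):
    """XOR equal-sized byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks in one stripe
    parity = xor_blocks(data)           # what goes in the parity block

    # Pretend the disk holding data[1] died: XOR everything that is left.
    recovered = xor_blocks([data[0], data[2], parity])
    print(recovered == data[1])
```

The same function serves both purposes: XOR the data blocks to get the parity, or XOR the survivors (parity included) to get the missing block back.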

RAID 5 can be broken down into four different methods. There are certainly more variations on these methods which I will talk about in another article, but this is the best place to start. The methods are left asymmetrical, left symmetrical, right asymmetrical, and right symmetrical. For the purpose of the explanation, I am going to ignore block sizes and iteration (delayed rotation), which must be known or discovered to recover data.

Left asymmetrical is the method most commonly employed by hardware RAID cards. It writes data to blocks n, n+1, n+2, … n-1, then writes the parity of those data blocks to the last block, so that the resulting sequence is n, n+1, n+2…n-1, p. For the next stripe, parity is rotated left (backwards) one block so that the sequence is n, n+1, n+2…p, n-1.

left-asymmetrical RAID 5, Parity 5, on 5 disks

This is probably the simplest form of RAID 5 to understand; however, from a performance standpoint, the system can read only from Parity-1 blocks before running into the possibility of reading a disk twice. This is where symmetrical RAID 5 comes in. Here the blocks are written as before, n, n+1, n+2...n-1, p; however, instead of the next block being written to the next sequential device, it is written after the parity block (which is below the previous parity block) and the stripe is written in-line from that point on, until it wraps to the parity block. This method allows for all blocks in a stripe to be read.

left-symmetrical RAID 5, parity 5, on 5 disks

There are right-hand versions of the above as well. Right asymmetrical starts with the parity block at the first placement, and rotates right. The first data block in the stripe is always in the first placement (except when the parity block is located there, in which case it moves to the next placement).

right-asymmetrical RAID 5, parity 5, on 5 disks
Right symmetrical, like its left-rotating brother, writes the data blocks following the parity block and wraps around until it reaches the parity block again.

Right symmetrical RAID 5, parity 5, on 5 disks
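
The four layouts can also be generated rather than drawn. This is a minimal sketch that follows the descriptions above, assuming the parity level equals the number of disks and the parity rotates on every stripe; the function name and its parameters are my own.

```python
def raid5_layout(disks, stripes, direction="left", symmetric=False):
    """Return one row per stripe; 'P' marks parity, numbers are data blocks."""
    rows = []
    block = 0
    for stripe in range(stripes):
        if direction == "left":
            parity = (disks - 1) - (stripe % disks)  # starts last, moves left
        else:
            parity = stripe % disks                  # starts first, moves right
        row = [None] * disks
        row[parity] = "P"
        for d in range(disks - 1):
            if symmetric:
                # Data starts right after the parity block and wraps around.
                position = (parity + 1 + d) % disks
            else:
                # Data fills the disks in order, skipping the parity block.
                position = d if d < parity else d + 1
            row[position] = block
            block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for direction in ("left", "right"):
        for symmetric in (False, True):
            kind = "symmetrical" if symmetric else "asymmetrical"
            print(direction, kind)
            for row in raid5_layout(5, 5, direction, symmetric):
                print("  ", " ".join(str(cell).rjust(2) for cell in row))
```

Each printed row is one stripe and each column is one disk, so the output can be compared directly against what you see on the component disks.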

Adding to the confusion
In the above examples, we looked at RAID 5, parity 5, on 5 disks. The parity level could be any integer greater than or equal to 3. To make things more difficult, the number of disks is not limited to the parity level; it can be greater than or equal to the parity level. Let's take a look at a left-asymmetrical RAID 5, parity 4, on 5 disks.

Left asymmetrical RAID 5, parity 4, on 5 disks
Here the data is striped with three data blocks and a parity block, which is parity 4. However, in the second stripe, position 0 falls on the first block of the 5th disk, so the stripe begins there. Notice that if we did not know the configuration of the array, we would end up with 2 parity blocks in some rows.
In the next installment, I will talk about rotation iteration (delay), block sizes, and ways to determine parity levels. Stay tuned.

Monday, September 1, 2008

RAID Tool Update

I have released a FUSE file system that can put together an asymmetrical RAID 5 with any parity level and any iteration level. I don't have any controllers to test the thing on other than the Compaq one it was developed with. Remember this tool is a proof of concept; use it accordingly. I am working on an update to this tool that will use one single FUSE mount command and plugins to add "files" at will. Reassembled RAID images will be the first of these files, and it will allow for compound RAID with only one FUSE instance running.

If you're interested in the tool, check it out at or download at

