EggXpert

The official Newegg tech support community and Newegg tech support forums. Learn about PC building, case mods, computer repairs, and computer troubleshooting. Get help from knowledgable community members about computer hardware and computer software, laptops, notebooks, netbooks, consumer electronics & mp3 players, home networking, lcd TVs, home audio and more.

Classroom 101: Hard Drive Optimization

  •  09-17-2007, 10:45 AM 162906

    Classroom 101: Hard Drive Optimization

    Last Updated: 11/1/2008 : 3:33PM CST

    I've had a lot of people asking about performance optimizing so here is a post that will hopefully answer all your questions!

    This post will cover the following topics:

    • Hard Drives and how they work (Platters, Density, Head)
    • Partitioning
    • RAID0
    • Stripe Depth
    • Independent drives
    • RAID1
    • Fragmentation
    • NCQ/Cache
    • Root's Thoughts


    Recommended reading before reading this post: Classroom 101: RAID Explained

     

     

    To understand how to improve your hard drives' performance, you have to know how they work.

     

    Hard drives and how they work 

    Physically, a drive is made up of one or more platters. These platters are flat discs that have been given a thin magnetic coating.

    http://www.core.org.cn/NR/rdonlyres/8EC59DE0-D5BC-4A99-828E-0E9FAED236B0/0/chp_harddrive_inside.jpg

    The more GB of data you can put on one platter, the 'denser' the disc. This is where you might hear the term Perpendicular Recording. What perpendicular means to you is that you can put more data on one square inch than with standard Longitudinal Recording. Below is an image pulled from wiki that gives a great contrast between the two technologies.

    http://upload.wikimedia.org/wikipedia/commons/a/aa/Perpendicular-eng.png 

     To put it in perspective:

    Current hard disk technology with longitudinal recording has an estimated limit of 100 to 200 gigabit per square inch due to the superparamagnetic effect, though this estimate is constantly changing. Perpendicular recording is predicted to allow information densities of up to around 1 Tbit/sq. inch (1000 Gbit/sq. inch) (source:wiki)

    These platters are what hold your data. A mechanical head, moved by an arm, hovers over the platter and sets each bit's polarity to either 1 or 0 (on or off). Meanwhile, the drive is spinning at 7,200 rotations per minute (or 10,000 or even 15,000 rpm). When the head has to move to another 'track', it can take milliseconds to do so, all depending on where it is and whether the platter has already spun past the target bit. In computer time, milliseconds are a long time. This is where optimization comes into play. Things like perpendicular recording, where the head can travel less to get to that one bit you need to flip (due to higher density), can help.
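    To put rough numbers on those milliseconds, here is a minimal sketch of the rotational part of the wait. It assumes the usual rule of thumb that, on average, the target bit is half a revolution away; the function name is made up for illustration, and head movement (seek time) is not included.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    # Spindle speeds mentioned in the post.
    for rpm in (7200, 10000, 15000):
        print(f"{rpm:>6} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

    Faster spindles shave the wait, but it stays in the millisecond range, which is why the rest of this post is about avoiding head movement in the first place.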

    Further readings:

    http://www.storagereview.com/guide/over_op.html 

    http://en.wikipedia.org/wiki/Hard_drive 

    http://en.wikipedia.org/wiki/Perpendicular_recording 

     

    Partitioning

    Partitioning is the act of logically slicing up your drive by physical locations. Take these pictures:

    Logically you see Tracks and Sectors. Like a race track, 'Tracks' are circuits around the drive. Sectors are the 'quadrants' of the drive--although quadrants isn't the right word (there are more than 4 sectors), but you get the idea. The top image is how it used to be, before Zone Bit Recording (ZBR); the bottom image uses ZBR, and you can see the difference. ZBR comes into play in optimization because it allows the outer rings to hold more sectors than they previously could.

    Generally when you partition, you partition from the outside (Track 0) first. Because of ZBR, the outer rings are faster than the inner rings: they hold more sectors per track, so more data passes under the head on every rotation. The drive also fills from the outside in, so when you first start writing to the disk, it writes from outer ring to inner ring. This is the first tip I give to you: the first partition will technically be the fastest part of the drive. Since it has the most storage per track, there is less likelihood of seeking (a term I'll get into later). So, the first partition is where you want all of your fast stuff. The later partitions are for your slower stuff, like backups and personal data (docs/spreadsheets/etc). Partitioning isn't bad, just annoying at times (resizing partitions can be a hassle, so make sure each one is big enough for your needs but not overly big).
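    A back-of-the-envelope sketch of why the outer rings win under ZBR: the platter spins at one fixed speed, so a track with more sectors hands over more data per revolution. The sector counts below are hypothetical round numbers, not taken from any real drive, and 512-byte sectors are assumed.

```python
SECTOR_BYTES = 512  # assumed sector size


def track_throughput_mb_s(sectors_per_track: int, rpm: float) -> float:
    """Sequential throughput of one track in MB/s (1 MB = 1,000,000 bytes)."""
    revolutions_per_second = rpm / 60.0
    bytes_per_second = sectors_per_track * SECTOR_BYTES * revolutions_per_second
    return bytes_per_second / 1_000_000


if __name__ == "__main__":
    outer = track_throughput_mb_s(1000, 7200)  # hypothetical outer zone
    inner = track_throughput_mb_s(500, 7200)   # hypothetical inner zone
    print(f"outer: {outer:.2f} MB/s, inner: {inner:.2f} MB/s")
```

    Same spindle speed, twice the sectors, twice the sequential throughput. That is the whole argument for putting the busy partition first.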

    And partitioning doesn't just give that mechanical head smaller distances to travel to your data, it can also localize fragmentation (a term I will go into later). When you only have 10 or 20 GB for your files to reside on, they are less likely to get fragmented because there aren't that many places to go *grin*. Faster defragging and easier backups--it can be a win-win situation.

    Recommended readings:

    http://partition.radified.com/partitioning_2.htm

    http://en.wikipedia.org/wiki/Disk_partitioning

    http://storagereview.com/guide/tracksZBR.html
     

     

    RAID0 

    Heh. This is a hot topic that will generate a lot of flame. I always say, in the desktop storage world, that there are two kinds of people: RAID0'ers and Non-RAID0'ers. For the record, I'm against RAID0. Tests have shown that it is not worth investing in. This is because you typically have a bottleneck somewhere other than the drives. Remember: YOU ARE ONLY AS FAST AS YOUR SLOWEST PART. If your CPU is doing most of the work and the drives are waiting on your CPU... Well, let's just say you could have the fastest drives in the world and it wouldn't make a difference.

    RAID0 was really meant for servers, where the IOPS (Input/Output operations Per Second) they are pushing need fast drives because their CPU/Memory/etc is waiting on the drives (remember the milliseconds?). The type of programs they run (like databases) send tons of information to the drives, so fast that the drives can't keep up. In gaming, it's the processors (CPU/GPU) that can't keep up, because they are doing most of the work, not the drives. Games don't write to the disk, nor read from the disk, as often as a database does. Sure, the SATA interface can give you 3 Gb/s, but when you are reading a 3 KB file... you get the idea.

    "So root, when would it benefit me, as a desktop user?"

    This is where I put the 'backpedal clause'. I will say it'll benefit you when you are doing large (sequential) data transfers and A/V editing. Things that are disk intensive, and not CPU/Memory intensive, will definitely benefit from RAID0. You have to ask yourself, what will be working the most in this program? Are calculations of bullet trajectories and the physics behind an exploding cow going to be the most intensive thing? Or is copying and pasting that DVD that you recently *cough* purchased going to be what you do most? Depending on your answer, you may or may not use RAID0.

    Remember, RAID0 means that both drives are being utilized for you simultaneously (theoretically--some controllers do round-robin. Don't get me started on those). If one fails, both fail, due to striping--aka you lose all your data. Some unfortunate people have had one of their drives fail within a month and lost all their data. This is a personal risk. Since the drives are linked, the array dies if either drive dies, so the array is more likely to fail than a single drive. If you know statistics, you should know why. The average life of a hard drive is typically 5 years (why do you think Seagate gives you a 5 year warranty?). Put two of them together and the expected life of the array decreases. In a gaming desktop environment, unless you do regular backups, it's not worth the extra IO you could get. It's all about how you will be using your drives. If it's just large media files that you will write to the drive once and then "view", it won't be worth it. Streaming video, on today's modern hardware, is nothing (no bottlenecks typically). There are no gains.
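    For anyone who skipped statistics, a minimal sketch of the "if you know statistics" bit. It assumes each drive fails independently with the same probability over whatever period you care about; the 5% figure is a made-up example, not a measured failure rate.

```python
def raid0_failure_probability(per_drive_failure: float, drives: int) -> float:
    """Chance that a RAID0 array is lost: it dies if ANY member drive dies.

    Assumes drive failures are independent and equally likely.
    """
    if not 0.0 <= per_drive_failure <= 1.0:
        raise ValueError("per_drive_failure must be between 0 and 1")
    if drives < 1:
        raise ValueError("need at least one drive")
    all_survive = (1.0 - per_drive_failure) ** drives
    return 1.0 - all_survive


if __name__ == "__main__":
    p = 0.05  # hypothetical chance one drive dies during the period
    for n in (1, 2, 3, 4):
        print(f"{n} drive(s): {raid0_failure_probability(p, n):.4f}")
```

    With two drives the risk is nearly double that of one, and it keeps climbing with every drive you add to the stripe.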

    Are you seeing the big picture now? *grin*

    I hope so.

    Recommended readings:

     http://www.storagereview.com/php/cms/cms.php?loc=news_content&id=970&start=26&range=10

     

    Stripe Depth 

    You might hear two terms when dealing with RAID: Stripe Depth and Stripe Width. Whatever anyone tells you, there is a difference, and many people get them confused. Stripe Width is, essentially, the number of drives involved. If it's RAID0 and you have two drives, the stripe width is 2.

    Stripe depth is a whole other story. Stripe depth is how many kilobytes (usually) your stripe is deep. Any type of RAID that uses striping will use this (RAID5, 6, 0, etc). There is no universal sweet spot for stripe depth (and don't listen to anyone that tells you there is) because it all depends on what your needs are. Does this sound familiar? Like anything else in the computer world, it usually comes down to what your needs are. As a rule of thumb, for transactional programs that do a lot of small reads/writes, you want a larger stripe depth, so each small request lands on a single drive and the drives can service different requests at once. Large, data-transfer type of work wants a smaller stripe depth, so each transfer gets spread across all the drives. You have to find the right 'balance' of reading/writing between the two (or more) drives.

    Your range of stripe depth will be 2 KiB to 512 KiB (or higher) in powers of 2 (16, 32, etc). For desktop usage, 64 KiB is a good number to start out with. Depending on what you are doing, you can play with the depth to see what your sweet spot is.
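    To make depth and width concrete, here is a rough sketch of how a striped array could map a byte offset to a drive. This is the textbook round-robin layout, assumed for illustration; real controllers may differ, and the function names are invented.

```python
from typing import Set, Tuple

KIB = 1024


def locate(offset: int, stripe_depth: int, stripe_width: int) -> Tuple[int, int]:
    """Map a logical byte offset to (drive number, byte offset on that drive)."""
    stripe_index = offset // stripe_depth   # which chunk of the array
    drive = stripe_index % stripe_width     # chunks rotate across the drives
    row = stripe_index // stripe_width      # how many chunks deep on that drive
    return drive, row * stripe_depth + offset % stripe_depth


def drives_touched(offset: int, length: int,
                   stripe_depth: int, stripe_width: int) -> Set[int]:
    """Which drives a request of `length` bytes starting at `offset` hits."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // stripe_depth
    last = (offset + length - 1) // stripe_depth
    return {chunk % stripe_width for chunk in range(first, last + 1)}


if __name__ == "__main__":
    # A small 4 KiB read on a 2-drive array, with a deep and a shallow stripe.
    print(drives_touched(0, 4 * KIB, 64 * KIB, 2))  # stays on one drive
    print(drives_touched(0, 8 * KIB, 4 * KIB, 2))   # split across both drives
```

    With a 64 KiB depth the small read stays on one drive, leaving the other free for a different request. With a 4 KiB depth even an 8 KiB transfer is split over both drives, which is what you want for big sequential copies.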

    Recommended readings:

    http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html 

     

    Independent Drives

    Independent drives (you know, what you did before you ever heard of the term RAID?) can be more beneficial than RAID0. 

    *Gasp*

    "No he didn't."

    Yes I did.

    Why, you ask? Think about what RAID0 does: it combines multiple drives into one big drive. Now, if you have a few programs/files/etc that need to be run simultaneously, you will have to read/write to the drives as one big pool (in RAID0). But what if you had a separate drive for each thing? Gosh darnit, you'd have drives dedicated to that application/game, wouldn't you?

    Hear me out.

    Let's say you had your OS/pagefile on one drive, and your games/applications on another. Wouldn't you have a drive dedicated to each function?

    Just something for you guys to chew on (for the next few sections at least) 

     

    RAID1 

    I'd have to say RAID1 is one of the most ignored RAIDs, due to its heavy penalty (an entire drive gets used up: 2 drives--or more--for the storage of one). I only mention it in this post because it has a read performance gain (in theory the same read performance as RAID0, if the controller reads from both drives), you don't lose or gain any performance in your writes (same as JBOD), and it has redundancy. Since the OS runs mostly off of memory after boot up, RAID1 is great for an OS drive: faster bootup times, and RAID protected. I won't go into more detail because most people won't use this.

     

    Fragmentation 

    *shudders* Fragmentation: a storage guy's nightmare. You've heard of it. You might know what it is, or you might not. What is it exactly? Depending on the file system, *cough*FAT32*cough*NTFS*cough*, it could be your worst and most frequent enemy (and for those guys keeping score... Yes. I'm digging into Microsoft here. I am root after all).

    Fragmentation occurs when you have a file, and you change it (grow/shrink). Then you add another file and change that one. And you keep changing these files (and/or adding/removing). Each time you change one, depending on the file system, it can create 'gaps' on the disk and it can fragment your data. Remember, when you originally write data to a disk, the first time, it writes the data sequentially (all right next to each other). Sequential is a good thing. It means that mechanical head, that we so love and hate, doesn't have to travel to get to the next bit of that file (you are relying on the rotation of the disk--which is much faster than the head moving). We love sequential. We can pre-cache sequential (FYI: cache is a LOT faster than disk).

    How do we fix fragmentation? For you Windows guys, defrag is your easiest/cheapest solution. It will restructure the data on your drive so that it becomes more sequential. My recommendation is to defrag once a month. If you delete something huge, I'd defrag then as well. Remember, if you delete something, that creates a gap. So when you add a new file, you could potentially fragment that new file.

    Warning: never defrag a RAID5 array. It can beat the snot out of your drives and it will be in vain (you want your data fragmented across multiple drives). With RAID0 (2-3 drives, no more) or RAID1 you can gain from defragging.
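    Here is a toy model of the "delete creates a gap" problem. It assumes a deliberately naive first-fit allocator that fills the lowest free blocks it can find; real file systems are smarter than this, and every name in it is made up for the example.

```python
from typing import List, Optional

Disk = List[Optional[str]]  # one entry per block: owning file name, or None


def allocate(disk: Disk, name: str, blocks: int) -> None:
    """Naive first-fit: take the lowest free blocks, contiguous or not."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < blocks:
        raise ValueError("not enough free blocks")
    for i in free[:blocks]:
        disk[i] = name


def delete(disk: Disk, name: str) -> None:
    """Free every block owned by `name`, leaving a gap behind."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(disk: Disk, name: str) -> int:
    """Count the separate contiguous runs that make up a file."""
    count = 0
    previous = None
    for owner in disk:
        if owner == name and previous != name:
            count += 1
        previous = owner
    return count


if __name__ == "__main__":
    disk: Disk = [None] * 12
    for name in ("A", "B", "C"):
        allocate(disk, name, 3)   # written sequentially: A A A B B B C C C
    delete(disk, "B")             # leaves a 3-block gap in the middle
    allocate(disk, "D", 5)        # too big for the gap, so it gets split
    print(disk, "-> D is in", fragments(disk, "D"), "pieces")
```

    File D fills the hole left by B and spills over past C, so reading it means at least one extra head movement. Defragging is the act of shuffling blocks until every file is back to one piece.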

    Recommended Reading:

    http://en.wikipedia.org/wiki/Fragmentation_%28computer%29

    http://www.itworld.com/Comp/3380/nls_unixfrag040929/index.html 

     

    NCQ/Cache

    NCQ** is a simple and yet ingenious thing. Remember that rotating platter and head? The head has to travel across the disk, and the disk has to rotate as well. So what if the head had to travel all the way across the disk (because you didn't partition) and then back for the next bit? This is called Seek Time. Seeking is when the head is moving to its next bit. Seeking is where those milliseconds appear. Bad seek. Bad.

    There is a solution to this, and that solution is Native Command Queuing. Essentially what this bad boy algorithm does is optimize the order in which the drive reads/writes data. If it's supposed to go WAY over there to write a bit before coming all the way back to write the next bit, it writes the second bit first, then the first. Here is a picture of it, courtesy of wikipedia:

    http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/NCQ.svg/300px-NCQ.svg.png

    See how it's going out of order? That simple technique cuts down the total read/write time. Beautifully simple, isn't it?

    **NOTE: NCQ'ing is better supported with SATA 3Gb/s than SATA 1.5Gb/s
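    A rough sketch of why going out of order helps. It compares total head travel when requests are served in arrival order against a simple nearest-track-first reordering. That greedy rule is a stand-in chosen for illustration: the actual ordering logic lives in each drive's firmware and also accounts for platter rotation. The track numbers are invented.

```python
from typing import List


def head_travel(start: int, order: List[int]) -> int:
    """Total number of tracks the head crosses serving requests in `order`."""
    position = start
    total = 0
    for track in order:
        total += abs(track - position)
        position = track
    return total


def nearest_first(start: int, requests: List[int]) -> List[int]:
    """Greedy reordering: always serve the closest pending track next."""
    pending = list(requests)
    position = start
    order: List[int] = []
    while pending:
        closest = min(pending, key=lambda track: abs(track - position))
        pending.remove(closest)
        order.append(closest)
        position = closest
    return order


if __name__ == "__main__":
    queue = [900, 10, 880, 20]  # hypothetical track numbers, in arrival order
    print("in order: ", head_travel(0, queue))
    print("reordered:", head_travel(0, nearest_first(0, queue)))
```

    Served as they arrive, the head ping-pongs across the platter; reordered, it sweeps across once. Same four requests, a fraction of the travel.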

    "What's this cache you speak of?"

    Not your memory, mate. Pretty much all hard drives these days have built in cache. Cache can be accessed using multiple threads (as opposed to a single mechanical head that has to search for a particular bit) so it's much faster. Usually, the larger the drive (and so the more data you will be hosting), the bigger the cache.

    So what does the cache do? Well, it allows you to do a number of performance related tasks. One is reading ahead. Say you are listening to a song. When it starts out, you are pulling the file directly off of the disk. But over time, you get to the point where you are asking the drive periodically for a chunk of the song as it streams to your ears. The drive can be smart enough to know that you will eventually be listening to the entire song, so it will go ahead and grab a few seconds ahead of what you are listening to now and put it in the onboard cache. This way the hard drive isn't constantly working for you. And cache can be accessed using multiple threads, as opposed to the single thread (aka mechanical head) that the hard drive has.
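    The song example can be sketched as a toy drive that grabs a few extra blocks every time it has to go to the platters. The class, its tiny one-run cache, and the block counts are all assumptions for illustration; real drive firmware is far more involved.

```python
from typing import Dict, List


class ReadAheadDrive:
    """Toy drive: on a cache miss, fetch the block plus a few blocks after it."""

    def __init__(self, blocks: List[bytes], read_ahead: int = 4) -> None:
        self.blocks = blocks
        self.read_ahead = read_ahead
        self.cache: Dict[int, bytes] = {}
        self.platter_reads = 0  # how many times the head had to do real work

    def read(self, index: int) -> bytes:
        if not 0 <= index < len(self.blocks):
            raise IndexError("block index out of range")
        if index not in self.cache:
            self.platter_reads += 1
            self.cache.clear()  # tiny cache: only holds the latest run
            end = min(index + 1 + self.read_ahead, len(self.blocks))
            for i in range(index, end):
                self.cache[i] = self.blocks[i]
        return self.cache[index]


if __name__ == "__main__":
    song = [bytes([i]) for i in range(10)]  # a 10-block "song"
    for ahead in (0, 4):
        drive = ReadAheadDrive(song, read_ahead=ahead)
        for i in range(len(song)):
            drive.read(i)
        print(f"read_ahead={ahead}: {drive.platter_reads} trips to the platters")
```

    Streaming the same ten blocks costs ten trips to the platters with no read-ahead and two with a read-ahead of four. The win only shows up for sequential access, which is one more reason to keep files unfragmented.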

    Another neato-magneto thing is utilizing the cache for writes. The song example was a READ task, where you are reading off of the hard drive. You can utilize cache for WRITEs as well. This means you can dump a lot of information--more information than the physical drive can handle at once--and continue on without needing to hear that the drive has committed it to disk. Remember, cache has multiple threads or, if you want to imagine it, 'heads' that are working for it. So it's MUCH faster than a hard drive. Over time, either when some threshold on the cache's capacity is hit or after a certain amount of time has passed, it will dump the data to disk as the disk can accept it. As a user, you don't have to wait on the WRITE to complete because you've already put it in the cache. There is a catch though. The 'over time' part of dumping data from cache to disk opens you up to data corruption. This is because cache is volatile, meaning the moment you lose power you lose data. Very much like your RAM. This leads us to Windows and a feature called Write-Behind Caching. When disabled, it basically skips the built in cache on the drive for WRITEs and only uses it for READs. This gives more stability to your filesystem (in the event of a power failure) and gives you more cache for your READs. The trade-off is that WRITEs now have to wait on the disk itself, so don't expect them to get any faster. For more information on the pros/cons and how to turn it on/off on most Windows versions, I'd recommend this site.

     

    This is where hybrid hard drives can come in, but this is beyond the scope of this post.

    For the most part, cache for desktop consumers won't have a noticeable impact with today's drives. This is because desktops don't generate the kind of IO that would require it. I say today's drives because most drives have 16MB or more. Storage Review found a relatively big jump in performance (30%) moving from 2MB to 8MB cache. See article here.

    A little tidbit: in some programs you can actually change a setting so they use more of your memory (not disk cache, but your actual memory). I know in Azureus (a BitTorrent client) you can up the caching. Remember, cache will always be much faster than disk.

    Recommended Reading:

    http://en.wikipedia.org/wiki/NCQ 

     

    Root's Thoughts 

    "What is my best bet root?"

    This is what you've been waiting for, right? How can this post help you? Here are my words of wisdom, young padawan:

    • Utilize partitioning for your OS and games/applications. Thankfully those two things don't grow over time as much as your other data (like videos/music/etc). 20GB for an OS is plenty and half that for your games.
    • At a minimum, I always advise two drives, one being dedicated to your OS and the other to your data/games.
    • Always make your first partition used for your high IOPS (games/applications and OS (pagefile) are considered high in the desktop world), the second (or even third) partitions used for everything else.
    • If you have the money/power/space/cooling for an extra drive, utilize RAID1. You might strictly game, but you can still benefit from RAID1 (OS boot up most notably).
    • Remember what your computer is doing mostly. If it's calculations, that would be the CPU. If it's moving data, then striping could benefit you.
    • You are only as fast as your slowest part. Is your drive whining like a banshee? Try defrag first. Still whining? Might want to try RAID1 or RAID0.
    • Find drives that support NCQ and have large built in caches.
    • BACK UP. If you aren't backing up, shame on you. Optimization is fine and dandy, but if you don't have a backup solution enabled, everything will be in vain if you lose all your data. If you are given the option between RAID and backing up, choose backing up. And remember, if the backup solution isn't automated, it isn't a backup solution.
    • Still curious? Here's a good FAQ if you are interested in hard drives and their performance.


    As usual, I'll never turn down a PM. No question is stupid. And, lastly, I'm learning as well. If you have credible information that contradicts what I say, feel free to show me and I'll keep an open mind. If you would like to discuss this field (or have any questions), feel free to stop in the discussion thread:

    Classroom 101: Hard Drive Optimization - Discussion 

     


    "Oh Gravity, Thou Art A Heartless b***h"

    -Sheldon

    Click to read my stories.

©2009 Newegg, Inc. All rights reserved.