15

I was always wondering how a hard drive finds the first bit of data.

When a hard drive spins up, whatever it reads must be a circular stream of data until the reading head moves to a different position.

But in such a circular stream, how does the drive know where the first bit and the last bit lie, so that it can pass on the data in the right order?

uzumaki
  • BTW (rarely mentioned but salient) There's an **index mark** to indicate the start (and end) of the track. When the disk controller encounters the index a second time, then that indicates that it has read every sector in the track. – sawdust Jun 12 '17 at 04:51

5 Answers

18

Data is not written as an arbitrary stream of ones and zeros. It is written in sectors. Each sector has a header and a payload of user data. The header contains error-correcting codes, a special sync field that marks the start of the sector, and the sector number, so the drive knows when it has found the start of a sector and which sector it is.
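A toy sketch of that idea, under loud assumptions: the track is modelled as a ring of bytes rather than magnetic flux, the two-byte `SYNC` value is borrowed from a commenter's recollection of 0xA5, and CRC-32 stands in for whatever error codes a real drive uses. The names `make_sector` and `find_sector` are made up for illustration and do not describe any real controller.

```python
import zlib

SYNC = b"\xa5\xa5"  # assumed sync pattern, illustrative only


def make_sector(number: int, payload: bytes) -> bytes:
    """Build sync + sector number + payload + CRC-32 of (number + payload)."""
    body = bytes([number]) + payload
    return SYNC + body + zlib.crc32(body).to_bytes(4, "big")


def find_sector(track: bytes, start: int, wanted: int, payload_len: int):
    """Scan the circular track from an arbitrary offset for sector `wanted`."""
    n = len(track)
    record_len = len(SYNC) + 1 + payload_len + 4
    for step in range(n):
        pos = (start + step) % n
        # Read one record's worth of bytes, wrapping around the end of the ring.
        record = bytes(track[(pos + i) % n] for i in range(record_len))
        if not record.startswith(SYNC):
            continue
        body, crc = record[len(SYNC):-4], record[-4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            continue  # looked like a sync field, but the check failed
        if body[0] == wanted:
            return body[1:]  # payload without the sector number
    return None  # went all the way around without finding the sector


# Three sectors separated by short gaps; sector 1 deliberately contains
# bytes that look like the sync field.
payloads = [b"sector-0", b"\xa5\xa5fakehd", b"sector-2"]
track = b"".join(b"\x00" * 4 + make_sector(i, p) for i, p in enumerate(payloads))
```

Calling `find_sector(track, start, 2, 8)` with any `start` walks the ring until it reaches a record whose sync, sector number, and check value all agree. The sync-like bytes inside sector 1's payload get skipped because the check value computed from that position does not match.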

psusi
  • 3
    Some references or links would make this good answer great. :D – cat Jun 11 '17 at 11:36
  • 1
    @cat, I suppose "How computers work" or Peter Norton's "Inside the PC", if either of those are still in print... probably a few new editions since the ones on my bookshelf. – psusi Jun 11 '17 at 15:24
  • This doesn't answer the question. The question is how the drive knows where the header or the sync field starts. – Martin Argerami Jun 12 '17 at 02:08
  • @MartinArgerami -- This does answer the question because it corrects the OP's misconception, and introduces the salient concept of sectors. Drilling down to the level of reading/writing of magnetic domains seems too intense for the OP IMO. If you/he wants more details then see https://superuser.com/questions/427554/is-it-possible-to-detect-the-previous-byte-position-on-a-hard-drive-after-it-has/427611#427611 – sawdust Jun 12 '17 at 03:05
  • @sawdust: I still don't think this answers the question, because it doesn't mention how the drive knows it is at the start of a sector. The answer you quoted explains that hard disks distinguish sectors by means of gaps (which surprises me, because it makes the data storage less efficient), and that would have been an answer to the question here. – Martin Argerami Jun 12 '17 at 04:08
  • @MartinArgerami -- I'm not up-to-date on current HDDs (e.g. perpendicular recording), but with MFM recording the **address mark** consists of a unique magnetic pattern, and the start of a record is **sync** consisting of two bytes of IIRC 0xA5. *"it makes the data storage less efficient"* -- Are you too young to remember consumer discontent over the advertised unformatted capacity of ST-506 HDDs versus the usable, formatted capacity (which predates the decimal versus binary MB/GB numbering complaints)? The difference between those two capacities is for those gaps and ID records. – sawdust Jun 12 '17 at 04:46
  • How can the hard drive tell if it encountered the sync field or just arbitrary user data that *looks* like the sync field (same bit string)? – xjcl Jun 12 '17 at 09:27
  • @user134593: That's an excellent question. Perhaps you could ask a new top-level question asking about how computers distinguish between hard drive sync fields (and Ethernet frame headers, and IP packet headers, etc.) and arbitrary user data in the payload that *looks* like it? – David Cary Jun 12 '17 at 13:06
  • @user134593: The "transparency tradeoffs" section of ["Forming Data Packets"](https://en.wikibooks.org/wiki/Serial_Programming/Forming_Data_Packets) mentions several (incompatible) ways of distinguishing the sync field of the header from arbitrary user data. – David Cary Jun 12 '17 at 13:17
  • @DavidCary My guess would be they just use an escape byte (or however long the sync field is) similar to an escaping backslash in strings. – xjcl Jun 12 '17 at 19:12
  • 1
    @user134593 -- The sync bytes are always after a gap, and are the first bytes of the record. They merely indicate the start of the record. Hence the disk controller never has a conflict or confusion with payload or user data. You seem to think it's a bit stream (just like the OP), but it's magnetic media, which means you cannot just read or write at any point you want. – sawdust Jun 12 '17 at 19:35
  • @MartinArgerami, no.. the question was how does the drive find the right data when it's just a big ring. The answer is thanks to the sync field and sector ID in the header. – psusi Jun 13 '17 at 00:02
  • @user134593, it can tell because it is nearly impossible for user data to match the sync field, have a sector ID that makes sense for the track the head is on, and the error detecting codes pass as well. An escape can not be used because that inserts additional data, for which there is no room. – psusi Jun 13 '17 at 00:06
  • @psusi: as sawdust mentions, it looks like the gap is used to detect the start of a sector. Otherwise, you seem to be claiming that on every read the controller loads the whole track in memory, and starts searching for the pattern? And what happens if I replicate that header in user data? The drive fails? – Martin Argerami Jun 13 '17 at 00:48
  • @MartinArgerami, a gap cannot be distinguished from any other set of ones and zeros to the head. The controller does not have to preload the entire track; it just has to start reading somewhere and keep going until it finds a valid sector header, and if it is the sector it cares about, it starts buffering that sector in memory. You can't replicate an entire valid sector including the header and error correcting codes and have it all fit in the payload area of a sector. – psusi Jun 13 '17 at 01:04
  • @MartinArgerami *"it looks like the gap is used to detect the start of a sector."* -- Wrong, it's usually an address mark (which is a special magnetic pattern). See the possible duplicate answer above. – sawdust Jun 13 '17 at 05:46
  • @psusi *"it just has to start reading somewhere..."* -- Wrong. Reading the magnetic media has to begin in a gap, where there are no flux reversals. FWIW I have first-hand experience writing disk controller firmware. So I'm not guessing as to how this stuff actually works like you other folks. – sawdust Jun 13 '17 at 05:47
  • @sawdust: I didn't even know there was a gap, until you wrote, 10 hours ago: *"The sync bytes are always after a gap, and are the first bytes of the record. They merely indicate the start of the record."* Whatever. I came to this question to learn "How does a hard drive know where the data starts?" One day, several answers, and many comments later, I still don't know the answer. – Martin Argerami Jun 13 '17 at 06:18
  • @sawdust, if there are no reversals, then the head just reads it as a series of ones or zeroes... it doesn't have any way of knowing that is a "blank" any different from any other sequence of ones or zeros. The head starts reading wherever it happens to be when moved over the track. It is up to the controller to look for the sync pattern and recognize the sector header. FWIW, I wrote a floppy disk driver years ago for ReactOS and played around with different ways to low level format the disk. – psusi Jun 15 '17 at 00:55
  • @psusi *"if there are no reversals, then the head just reads it as a series of ones or zeroes"* -- Then obviously you don't understand how magnetic media works, in spite of your floppy experience. The digital ones and zeros are generated by flux reversals. IOW No flux reversals means constant value! Your high-level experience with a floppy controller apparently allowed you to ignore a lot of low-level HW details. I wrote the actual (disk) controller firmware, which is a layer or two closer to the HW than what you ever did. – sawdust Jun 16 '17 at 19:35
  • @sawdust, I suppose they probably use more complex encoding techniques these days.. I know SATA uses an encoding that has values that are special control values and some that are not valid at all by using 40 actual bits to encode each 32 bits of data or a control word, but at least floppies and old school hard disks just used one pole for a one and the other for a zero, and used high frequency AC to "erase" the medium, returning it to a nearly neutral value that could be read as either a very weak 0 or 1 randomly. They wouldn't have used a sync pattern if they could tell when the "gap" ended. – psusi Jun 25 '17 at 00:34
7

Psusi is correct (the data on the disk is structured, and different parts of the computer use different parts of that structure) but doesn't really get at your question.

The drive doesn't really "know" anything. It has low-level electronics that can read markers on the disk (generally written at the factory, or by the drive head itself), read data blocks from the disk, write data blocks to the disk, tell whether a particular spot on the disk is bad or damaged, and move the head to a particular location on the disk. That's about all it "knows". The reading head doesn't decide to move someplace else by itself; something higher up in the machine tells it to...

quadruplebucky
  • 2
    But the something higher up is still part of the hard drive. (Which is commanded by something outside the hard drive, which is commanded by something outside the computer, which is commanded by something metaphysical, but we aren't talking about any of those things) – user253751 Jun 11 '17 at 12:01
3

It reads it from the disk.

Data on the disk is not only structured (as @psusi says), but also encoded. The encoding ensures that the recorded data cannot be confused with the position markers in the sector headers, so the circular stream can be read until the target position marker is found.

As I understand it, modern hard drives don't quite do that; they read the entire circle into a buffer, keeping track of where each sector is, and use the buffers to send back requested data.

UPDATE:

The magnetic medium is a material whose magnetic field has two key properties: 1) it never changes on its own, and 2) the recording device can change the orientation of the field at any point on the surface. When reading the medium, the sensor detects where the field is oriented toward the sensor and where it is oriented away from the sensor. As the sensor moves across the surface it detects the timings of these polarity transitions; the first layer of decoding is translating these timings into bit values. Because those timings can only be measured with limited precision, the encoding must not require long stretches of the same polarity; that is, it must be a run-length limited (RLL) code.

The particulars of hard drive designs are generally trade secrets, but there are essentially two ways to ensure that sector markers never appear in sector content:

  1. Design an RLL that allows special values which will never result from encoding content data. These special values could be used not only for marking sector boundaries but also for error correction or any other secondary purpose.

  2. Use a second layer of encoding that ensures the marker values only appear at the markers. This is a bit like URL encoding to allow special characters to be "hidden" in URLs, but with an additional constraint equivalent to limiting how many characters can be added, so it ends up more like base64 encoding.

So, the read head moves across the surface detecting magnetic polarity changes, the timings of those changes are used to determine the corresponding sequence of bit values (possibly including some exceptional values that don't represent stored data), and that sequence is used to determine which sectors are being read and the content of those sectors. As the content of sectors is determined, the data may be stored in a solid-state buffer and/or stored in a RAM buffer and/or sent back to fulfill a request.
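As an illustration of the first approach, here is a minimal sketch using MFM, an old code from floppies and early hard disks, not what a modern drive uses. MFM puts a clock bit before every data bit, and the clock is 1 only between two consecutive 0 data bits. The MFM address mark is the encoding of the byte 0xA1 with one clock bit deliberately left out, which gives 0x4489 instead of the normal 0x44A9. The function names are made up for this example.

```python
def mfm_encode_bits(data_bits, prev=0):
    """MFM: emit a clock bit before each data bit; the clock is 1 only
    between two consecutive 0 data bits."""
    out = []
    for d in data_bits:
        out.append(1 if (prev == 0 and d == 0) else 0)  # clock bit
        out.append(d)                                    # data bit
        prev = d
    return out


def byte_bits(value):
    """Bits of one byte, most significant first."""
    return [(value >> i) & 1 for i in range(7, -1, -1)]


def to_int(bits):
    """Pack a list of bits (most significant first) into an integer."""
    return int("".join(map(str, bits)), 2)


normal_a1 = to_int(mfm_encode_bits(byte_bits(0xA1)))  # 0x44A9
ADDRESS_MARK = 0x4489  # 0x44A9 with one clock bit deliberately dropped


def contains_mark(encoded_bits):
    """True if the 16-bit mark appears at any bit offset in the stream."""
    mark = byte_bits(ADDRESS_MARK >> 8) + byte_bits(ADDRESS_MARK & 0xFF)
    return any(encoded_bits[i:i + 16] == mark
               for i in range(len(encoded_bits) - 15))
```

Because the encoder always emits a clock bit between two 0 data bits, and the mark has a 0 where that clock bit would be, `contains_mark(mfm_encode_bits(...))` is false for any ordinary data at either bit alignment. The controller can therefore treat the mark as an unambiguous "a record starts here" signal.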

ShadSterling
  • Is this encoding like the Huffman coding? If someone could make a clear example of how this works on a hardware level like so: "The head reads a circular data stream like 010111010010111010... where every 111 marks the beginning of a sector, and then..." I could then accept the answer. – uzumaki Jun 14 '17 at 22:19
  • Hard drives have to encode abstract data as physical properties of the magnetic media, and the result has to be easily addressable; Huffman coding maps one stream of abstract data to a (usually) shorter stream of abstract data (breaking addressability). They're mostly unrelated. – ShadSterling Jun 15 '17 at 14:19
  • My update didn't add any examples, but if you follow the RLL link it has some. – ShadSterling Jun 15 '17 at 14:21
0

In addition to the other answers, hard disks certainly used to (and may still do) have one platter ("head" in cylinder/head/sector terms) which is reserved for calibration/positioning data, not used at all for user data storage.

Mark K Cowan
  • 3
    This isn't something I've heard of, do you have a reference for it? – ShadSterling Jun 11 '17 at 20:31
  • Yea, no.... that's not a thing. – psusi Jun 12 '17 at 00:15
  • You seem to be referring to the servo surface/platter. But that is obsolete technology that I haven't seen since 14" disk packs, which predate the ST-506 HDDs of the original IBM PC-XT. Winchester and modern disk drives use an embedded servo. – sawdust Jun 12 '17 at 02:56
  • Ah OK, I haven't heard of this for quite a long time also, although I put it down to modern disks being black boxes. – Mark K Cowan Jun 12 '17 at 08:58
0

The answer you are looking for has two parts:

1) A hardware controller

2) A file system

Like you said, in an HDD (as opposed to other technologies like SSDs) the actual data is written to round platters as concentric rings (tracks) holding a patterned magnetic field. Above the platters that hold this data is the read/write head, which moves across them to read and write data, a lot like the arm of a vinyl record player. The platters it moves over are attached to an electric motor which controls their rotation.

A hardware controller acts as an interface between the operating system and the hard drive. The controller can read the position of the read/write head as well as the rotation of the platters and uses this information to decide how to position the head and platters for reading and writing. It translates read and write requests from the operating system into control signals that move the read/write head and rotate the platters, and converts the parallel data coming in from the operating system into a single serial data line. It also splits up this serial line and decides what physical location, or sector, to put each piece in and records this information in a way specified by the file system.

The file system is a specification of how and where to store data. The computer's operating system knows how to interpret this file system and uses this knowledge to adequately communicate with the hardware controller, in this case breaking down the circular rings of data into usable segments called sectors and telling the file system where these sectors are physically located. The file system gives each sector an address, which is just a unique number, and this address gets translated by the hardware controller into a specific platter rotation and read head position to begin reading or writing.
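The "address is just a number" step can be sketched with the classic LBA-to-CHS arithmetic. This is a simplification with an assumed geometry (16 heads, 63 sectors per track); modern drives use zoned recording and an internal mapping that this formula does not capture. The function names are illustrative.

```python
def lba_to_chs(lba, heads_per_cylinder=16, sectors_per_track=63):
    """Map a linear sector number to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1  # sectors are numbered from 1


def chs_to_lba(c, h, s, heads_per_cylinder=16, sectors_per_track=63):
    """Inverse mapping: (cylinder, head, sector) back to a linear number."""
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)
```

The cylinder selects how far the arm moves, the head selects which platter surface is read, and the sector number is what the controller waits for as the track rotates under the head.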

For more information, the following sections in these Wikipedia articles are quite helpful:

See Intro and section 3.1 "Space management" here: https://en.wikipedia.org/wiki/File_system

See section 2.1 "Magnetic Recording" here: https://en.wikipedia.org/wiki/Hard_disk_drive#Magnetic_recording

Salvatore