While developing the ZTB station, we needed a way to clone devices with high throughput. The most straightforward approach would be to copy the entire contents of the device with standard Linux tools. Unfortunately, moving all the data between devices through user space was too costly. Cloning slowed down significantly, and the copy_from_user()/copy_to_user() functions were to blame. We therefore decided to clone devices without leaving kernel space.
In user space, block devices are accessed through special files in the /dev directory. If you have the required permissions, you can operate on a device through the standard interface: open() it, then read() from it or write() to it. To perform similar operations in kernel space, it helps to look at how Linux manages block devices. The current block device support was introduced during the Linux 2.5 development series. It defines a block layer responsible for handling I/O requests, together with mechanisms for managing the request queue, so that performance can be improved by, for example, merging requests for adjacent disk areas. More details can be found in the kernel documentation. If you prefer a more traditional form, do not forget Linux Device Drivers by Corbet, Rubini and Kroah-Hartman; though a little dated, it may be a better read for grasping the idea.
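For comparison, the user-space route mentioned above (open() followed by a read) can be sketched in C as follows. This is a minimal illustration, not part of the ZTB code: the helper name read_sector() and the fixed 512-byte sector size are our own assumptions, and pread() is used so the offset can be given directly. With this approach the kernel copies every byte into the user-space buffer, which is exactly the overhead that motivated this post.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pread() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* assumption: 512-byte sectors */

/* Hypothetical helper: read one sector at the given sector index from a
 * device node under /dev (or any file). Returns 0 on success, -1 on an
 * error or a short read. */
static int read_sector(const char *path, unsigned long long sector,
                       unsigned char buf[SECTOR_SIZE])
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Byte offset of the requested sector. */
    off_t offset = (off_t)(sector * SECTOR_SIZE);
    ssize_t n = pread(fd, buf, SECTOR_SIZE, offset);
    close(fd);

    return n == SECTOR_SIZE ? 0 : -1;
}
```

Because pread() works on regular files too, the helper can be tried on an ordinary file before pointing it at a real device.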
Our task in this post is to read a single portion of data from a block device. To do that, we need to access the block device, allocate a buffer to store the data read from the device, tell the device which part of it should be read, start the operation, and wait for the result. Several kernel mechanisms are involved in this task. Manual pages for most of the operations we use can be found in the kernel filesystem documentation.
Let’s take a look at the steps needed for the operation. First, we want to access the device, so we need a struct block_device filled in with the information for the device we would like to use. This can be done with lookup_bdev(), which takes the device pathname as input:
struct block_device *lookup_bdev(const char *pathname);
The device descriptor can also be obtained from a device number (dev_t) with bdget():
struct block_device *bdget(dev_t dev);
Then we open the device with blkdev_get():
int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder);
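bdget() identifies a block device by its device number (dev_t). As a rough user-space illustration of how a pathname maps to such a number (similar in spirit to what lookup_bdev() does with the inode of the special file), stat() reports the device a block special file refers to in st_rdev. The helper name path_to_devnum() is hypothetical, and <sys/sysmacros.h> (for major()/minor()/makedev()) is Linux/glibc-specific.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() on glibc */

/* Hypothetical helper: resolve a pathname to the device number of the
 * block device it names. Returns 0 on success, -1 if the path cannot be
 * stat'ed or is not a block special file. */
static int path_to_devnum(const char *path, dev_t *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode))
        return -1;          /* character devices, regular files, ... */

    *out = st.st_rdev;      /* device the special file refers to */
    return 0;
}
```

On a system with a disk, the major and minor numbers of the returned dev_t match those shown by ls -l for the /dev entry.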
Next, we prepare a buffer for reading by allocating one page:
struct page *pg; ... pg = alloc_page(GFP_KERNEL);
Now we are ready to tell the device what we want, and the bio structure serves this purpose. A bio describes an I/O operation that is submitted to the block layer; several bios may be part of a single I/O request. Here we work with a single bio, so let’s allocate one, with room for one page segment:
struct bio *bio = bio_alloc(GFP_NOIO, 1);
Then we attach the previously allocated page to the operation represented by the bio:
bio_add_page(bio, pg, PAGE_SIZE, 0);
Next, we describe the operation on our device: which device it targets and the sector at which it starts. The block layer always counts sectors in 512-byte units, regardless of the device’s hardware sector size:
bio->bi_bdev = device; bio->bi_sector = sector;
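Since bi_sector holds a sector number in 512-byte units, a byte offset on the device has to be converted before it is stored there. The arithmetic is sketched below in plain user-space C; the helper names are hypothetical, and the 4 KiB page used in the example is an assumption (it is the common page size on x86).

```c
#include <stdint.h>

#define SECTOR_SHIFT 9                    /* 512-byte sectors */
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/* Byte offset on the device -> sector number, as stored in bi_sector. */
static uint64_t offset_to_sector(uint64_t byte_offset)
{
    return byte_offset >> SECTOR_SHIFT;
}

/* Number of 512-byte sectors covered by a buffer of len bytes. */
static uint32_t sectors_in(uint32_t len)
{
    return len >> SECTOR_SHIFT;
}

/* bi_sector can only express offsets that are multiples of 512. */
static int sector_aligned(uint64_t byte_offset)
{
    return (byte_offset & (SECTOR_SIZE - 1)) == 0;
}
```

For example, reading one 4 KiB page starting 1 MiB into the device means setting the start sector to 2048, and the page covers 8 sectors.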
We are almost ready, but it would be good to know when our operation completes. The bio structure has a field for this: bi_end_io points to a completion function that is called once the operation described by the bio has finished. In that function we can use the kernel completion interface to signal that the operation has ended. One more bio field comes in handy here: bi_private can point to our completion variable, so when the operation finishes, the completion function can recover the completion variable from the bio and signal it. Finally, we can submit our bio:
submit_bio(READ | REQ_SYNC, bio);
Here READ | REQ_SYNC requests a read that is flagged as synchronous. We then call wait_for_completion() to block until the operation has finished. A write operation would follow an analogous flow.
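Running the real code requires a kernel build, so the bi_end_io/bi_private/completion pattern described above is modelled below in single-threaded user-space C. Every *_model name is a hypothetical stand-in for the kernel’s struct bio, struct completion, complete(), wait_for_completion() and submit_bio(). In the kernel, the end_io callback runs asynchronously once the device finishes (the exact callback signature has changed between kernel versions), the completion is set up with init_completion(), and wait_for_completion() sleeps. Here the fake device finishes immediately, so the wait only consumes the event.

```c
/* Stand-in for struct completion: counts "done" events. */
struct completion_model {
    unsigned int done;
};

/* Stand-in for the two bio fields discussed in the text. */
struct bio_model {
    void (*bi_end_io)(struct bio_model *bio);  /* called when I/O ends */
    void *bi_private;                          /* caller's context */
    int status;                                /* 0 = success */
};

static void complete_model(struct completion_model *c)
{
    c->done++;
}

/* Consume one completion event. Returns -1 if none is pending, where
 * the kernel version would sleep until complete() is called. */
static int wait_for_completion_model(struct completion_model *c)
{
    if (c->done == 0)
        return -1;
    c->done--;
    return 0;
}

/* Our end_io callback: recover the completion from bi_private. */
static void read_end_io(struct bio_model *bio)
{
    struct completion_model *c = bio->bi_private;
    complete_model(c);
}

/* Fake device: finishes at once and invokes the callback, as the
 * block layer would do from its completion path. */
static void submit_bio_model(struct bio_model *bio)
{
    bio->status = 0;
    bio->bi_end_io(bio);
}

/* Caller side: set up the bio, submit it, then wait for the signal. */
static int read_and_wait(void)
{
    struct completion_model done = { 0 };
    struct bio_model bio = { .bi_end_io = read_end_io, .bi_private = &done };

    submit_bio_model(&bio);
    if (wait_for_completion_model(&done) != 0)
        return -1;
    return bio.status;
}
```

The point of bi_private is visible in read_end_io(): the callback receives only the bio, yet it still finds the caller’s completion variable.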
Using the bio structure on its own, you can easily access any location on your block device, but expect a performance penalty: we read a single page of data and wait for that read to finish before doing anything else, which is inevitably slow. For more robust access, take a look at the md driver on GitHub.