*/
/*
- * When using compression, the client attempts to send chunk
- * aligned reads, but sometimes it can't, and the client will
- * send a read to the server which is not chunk aligned.
+ * Whenever possible, the client handles compression and decompression, but
+ * there are two cases where the server must assist.
*
- * In this case, the server must read the full chunk,
- * decompress it, and provide the requested data to the client.
+ * 1. Unaligned reads. The client tries to send only chunk-aligned reads,
+ * but sometimes it must send an unaligned read to the server.
+ * 2. Writes into existing compressed chunks. If a write does not cover an
+ * entire compressed chunk, we must do a chunk-level read-modify-write to
+ * complete the write. We do this on the server.
*
- * The server receives a set of remote niobufs describing IO
- * from the client. Each remote niobuf (rnb) describes a range
- * of data the client wants to do IO to.
+ * Both of these are forms of unaligned IO: the IO does not line up exactly
+ * with compression chunk boundaries.
*
- * These are translated to a set of local niobufs on the
- * server, which we then use to do the read. For compression,
- * the server has to read complete chunks on unalinged reads.
+ * In both cases, the server must read the necessary chunks from disk,
+ * decompress them, then do the transfer (either to the client for reads or
+ * from the client on writes). In the case of writes, the server must then
+ * write the complete chunk to disk. (Eventually the server will recompress
+ * the data, but that is not implemented yet.)
*
- * So we walk these remote niobufs and identify unaligned read
- * requests (in ofd_preprw_read), then round them to chunk
- * size. The server then reads the chunk rounded read request
- * from storage.
+ * The server receives a set of remote niobufs describing the IO from the
+ * client. Each remote niobuf (rnb) describes a range of data the client
+ * wants to do IO to.
*
- * The local niobufs now contain a set of complete compressed
- * chunks, ie, the raw data from disk. We need to decompress
- * the chunks where the client is doing an unaligned read, but
- * leave the other chunks compressed (because the client will
- * uncompress them).
+ * These are translated to a set of local niobufs on the server, which are used
+ * to do the server side IO. With compression, we must always read or write
+ * complete chunks.
*
- * So, in obd_decompress_read, we use the remote niobuf to
- * identify unaligned reads from the client. We then walk the
- * local niobufs, identify the chunks which match the unaligned
- * reads from the client, and decompress them 'in place'.
- * The decompression uses temporary buffers, but the
- * decompressed data is placed back in the local niobuf.
- * (If the data is uncompressed on disk, we of course do not
- * decompress it. This happens for incompressible data.)
+ * So we walk these remote niobufs and identify unaligned IO requests (in
+ * ofd_preprw_read/write), then round them out to chunk boundaries. The server
+ * then reads the necessary data from storage: for reads, this is the entire
+ * range; for writes, it is only the chunks with unaligned IO.
*
- * Now the local niobuf contains some raw chunks and some
- * chunks which have been decompressed. This is *more* data
- * than the client asked for. Normally, the server local
- * niobuf contains exactly what the client asked for, so the
- * server checksums and sends the entire local niobuf. But
- * because we read complete chunks, the local niobuf contains
- * more data than the client requested.
+ * The local niobufs now contain a set of complete chunks with the raw data
+ * from disk. We need to decompress the chunks for unaligned IO, but leave the
+ * other chunks unmodified. (For writes, those chunks were not read from disk;
+ * for reads, the client will decompress them.)
*
- * This means we need to identify the subset of the local
- * niobuf which the client actually wants to read and present
- * that to the client.
+ * So, in obd_compression, we use the remote niobufs to identify unaligned
+ * accesses from the client. We then walk the local niobufs, identify the
+ * chunks which match the unaligned IO from the client, and decompress them
+ * 'in place'.
*
- * In order to do that, we walk the local niobuf and use the
- * remote niobufs (the description of the pages the client
- * needs) and create a special tx niobuf which points to only
- * the pages the client wants (io_lnb_to_tx_lnb). Then we use
- * this tx niobuf for checksum and transfer to the client.
+ * The decompression uses temporary buffers, but the decompressed data is
+ * placed back in the local niobuf. (If the data is uncompressed on disk, we
+ * of course do not decompress it. This happens for incompressible data.)
+ *
+ * Now the local niobuf is ready for transfer: either to be sent to the client
+ * or to be updated by data from the client. For reads, the aligned portion of
+ * the IO contains raw data from disk for the client to decompress. For
+ * writes, the aligned portion is empty (the client will place data there).
+ * For both reads and writes, the portions of the niobuf which correspond to
+ * unaligned IO contain decompressed data.
+ *
+ * However, because of chunk rounding, the local niobuf is larger than the
+ * range the client requested. Normally the local niobuf contains exactly
+ * what was asked for, so we checksum and transfer the whole thing. In this
+ * case, we can't.
+ *
+ * This means we need to identify the subset of the local niobuf where the
+ * client transfer (read from or write to) will occur and present that to the
+ * client.
+ *
+ * In order to do that, we walk the local niobuf and use the remote niobufs
+ * (the description of the pages the client needs) and create a special tx
+ * niobuf which points to only the pages the client wants (io_lnb_to_tx_lnb).
+ * Then we use this tx niobuf for checksum and transfer to/from the client.
+ *
+ * For reads, we're done. For writes, we then write all of the data out to
+ * disk, including complete chunks for the unaligned areas.
+ *
+ * In the initial version, we write this data to disk uncompressed. This is
+ * sufficient for correctness, but not ideal, since it leaves those areas of
+ * the file stored uncompressed. Recompression does not fully work yet; this
+ * will be updated once that code is in and working.
*/
#define DEBUG_SUBSYSTEM S_SEC
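The chunk rounding, unaligned-IO detection, and tx-subset selection described in the comment above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the Lustre implementation: `struct io_range`, `chunk_round()`, `io_is_unaligned()`, `tx_page_subset()` and `SKETCH_PAGE_SIZE` are hypothetical names; byte ranges are half-open; the chunk size is assumed to be a power of two; and the real niobuf structures, EOF handling, and decompression itself are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the page size; the real code uses the
 * kernel's page size and struct niobuf_remote / struct niobuf_local. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical byte range, half-open: [start, end). */
struct io_range {
	uint64_t start;	/* first byte (inclusive) */
	uint64_t end;	/* last byte + 1 (exclusive) */
};

/* Round a client range outward to chunk boundaries, as the server must
 * read or write complete chunks. chunk_size must be a power of two. */
static struct io_range chunk_round(struct io_range r, uint64_t chunk_size)
{
	struct io_range out;

	assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
	out.start = r.start & ~(chunk_size - 1);
	out.end = (r.end + chunk_size - 1) & ~(chunk_size - 1);
	return out;
}

/* A range is unaligned if either edge falls inside a chunk; only such
 * chunks need server-side decompression. */
static bool io_is_unaligned(struct io_range r, uint64_t chunk_size)
{
	return (r.start & (chunk_size - 1)) != 0 ||
	       (r.end & (chunk_size - 1)) != 0;
}

/* Given the chunk-rounded range held in the local buffer (assumed to
 * start on a page boundary and to contain req), compute the index of the
 * first page the client transfers and how many pages it spans. This is
 * the idea behind a tx niobuf that points at only a subset of pages. */
static void tx_page_subset(struct io_range rounded, struct io_range req,
			   uint64_t *first, uint64_t *count)
{
	uint64_t first_pg = (req.start - rounded.start) / SKETCH_PAGE_SIZE;
	uint64_t end_pg = (req.end - rounded.start + SKETCH_PAGE_SIZE - 1) /
			  SKETCH_PAGE_SIZE;

	*first = first_pg;
	*count = end_pg - first_pg;
}
```

For example, with a 64 KiB chunk, a client request for bytes [10000, 70000) is unaligned and rounds out to [0, 131072). With 4 KiB pages, the client's data occupies 16 pages starting at page index 2 of the rounded buffer, so only those pages would be checksummed and transferred.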