-
There are various `todo' and `fixme' comments in the code. This is an
obvious place to start for improving kernel code quality.
-
Allow two de/compression work areas: one for compression, the other for
decompression.
-
Once the above works, make the compression work area
allocated/deallocated according to demand. One way of doing this is
just to deallocate the work area as soon as we're finished (unless there
are other processes lining up to use it) and reallocate when we next
need it (which could be almost immediately, unfortunately, e.g. a
process creating a large file). There are other solutions....
Perhaps the best (of the relatively easy) is as follows: have one work
area that is always present, just as at present (0.4.1). If we get a
request for a read while another process is using the main area for
writing (i.e. compressing), then allocate a new work area. In order to
avoid allocating/deallocating in quick succession, the area isn't
deallocated until the write is finished. (Alternatively, the new area
could become the primary one. This would be more efficient in
some ways (e.g. our contents cache is more useful), but we'd have to use
the one allocation method for both areas rather than vmalloc() for the
stable one and kmalloc() for the transient one. I don't know what the
advantages of vmalloc() are over kmalloc(), so I don't know how costly
this is.) Note that this `deallocate when
compression finishes' scheme provides some help in the `many
decompressions in succession' case, but not the `many compressions in
succession' case (e.g. a process creating a large file). So maybe this idea
isn't such a great improvement on the one suggested in the previous
paragraph after all. Its only advantage is that less memory is spent when the
system is compressing but not decompressing.
Of course, compression in user space would provide an excellent
solution.
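The `one permanent area plus one transient area' scheme described above can be sketched in user-space C as follows. This is only an illustration, not e2compr code: the names (struct work_areas, wa_begin_read() and so on) and the buffer size are hypothetical, malloc() stands in for vmalloc()/kmalloc(), and locking is omitted, as is the case of a reader still holding the transient area when the write finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define WORK_AREA_SIZE 4096     /* hypothetical size, for illustration only */

struct work_areas {
    void *primary;              /* always present */
    void *transient;            /* exists only while a read overlaps a write */
    bool  writer_active;        /* primary is busy compressing */
};

/* Allocate the permanent area. Returns 0 on success, -1 on failure. */
static int wa_init(struct work_areas *wa)
{
    wa->primary = malloc(WORK_AREA_SIZE);
    wa->transient = NULL;
    wa->writer_active = false;
    return wa->primary ? 0 : -1;
}

/* A writer (compressor) always takes the primary area. */
static void *wa_begin_write(struct work_areas *wa)
{
    assert(!wa->writer_active); /* one writer at a time in this sketch */
    wa->writer_active = true;
    return wa->primary;
}

/* A reader uses the primary area if it is idle; otherwise it gets the
 * transient area, which is allocated on first use and then reused.
 * Returns NULL if the transient allocation fails. */
static void *wa_begin_read(struct work_areas *wa)
{
    if (!wa->writer_active)
        return wa->primary;
    if (!wa->transient)
        wa->transient = malloc(WORK_AREA_SIZE);
    return wa->transient;
}

/* The transient area is released only when the write finishes, which
 * avoids allocating and deallocating in quick succession. */
static void wa_end_write(struct work_areas *wa)
{
    wa->writer_active = false;
    free(wa->transient);
    wa->transient = NULL;
}
```

With this arrangement several reads that arrive during one long write share a single transient allocation, which is the `many decompressions in succession' case mentioned above.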
-
Free preallocated blocks when we fail to decompress a cluster.
-
Support the SYNC flag. (SYNC is only partially supported in
standard 2.0 kernels, so this isn't a high immediate priority; on
the other hand, it is better-, or maybe even fully-, supported on 2.2.)
-
Support compress-on-idle. Have the kernel maintain a queue of
inodes to compress. When a file's data is accessed (read), move
it to the end of the queue if it is in the queue. When we raise
EXT2_DIRTY_FL (in ext2_write_file(), ext2_truncate(),
ext2_ioctl()), insert the inode into the queue if it is not there already.
Have ext2_cleanup_compressed_inode() remove the inode from the
queue (when appropriate).
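The queue operations listed above might look roughly like this. It is a user-space sketch with hypothetical names (struct cq_inode, cq_mark_dirty(), cq_touch(), etc.); a real implementation would hang the links off the in-core inode and would need locking, neither of which is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-inode queue linkage. */
struct cq_inode {
    unsigned long ino;
    bool queued;
    struct cq_inode *prev, *next;
};

struct compr_queue {
    struct cq_inode *head;      /* next candidate for compression */
    struct cq_inode *tail;      /* most recently dirtied or read */
};

/* Remove an inode from the queue; does nothing if it is not queued.
 * This is what the cleanup path would call. */
static void cq_remove(struct compr_queue *q, struct cq_inode *i)
{
    if (!i->queued)
        return;
    if (i->prev) i->prev->next = i->next; else q->head = i->next;
    if (i->next) i->next->prev = i->prev; else q->tail = i->prev;
    i->prev = i->next = NULL;
    i->queued = false;
}

static void cq_append(struct compr_queue *q, struct cq_inode *i)
{
    assert(!i->queued);
    i->prev = q->tail;
    i->next = NULL;
    if (q->tail) q->tail->next = i; else q->head = i;
    q->tail = i;
    i->queued = true;
}

/* Dirty flag raised: insert the inode if it is not there already. */
static void cq_mark_dirty(struct compr_queue *q, struct cq_inode *i)
{
    if (!i->queued)
        cq_append(q, i);
}

/* Data read: move the inode to the end, but only if it is queued. */
static void cq_touch(struct compr_queue *q, struct cq_inode *i)
{
    if (i->queued) {
        cq_remove(q, i);
        cq_append(q, i);
    }
}

/* Idle time: take the inode that has gone longest without access. */
static struct cq_inode *cq_pop(struct compr_queue *q)
{
    struct cq_inode *i = q->head;
    if (i)
        cq_remove(q, i);
    return i;
}
```

Moving an inode to the tail on every read means the head of the queue is always the file that has been left alone longest, so it is the safest one to compress when the system is idle.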
-
When I (Antoine) was thinking about the ideal compressed file
system, I imagined we could wait a little more before we really
compress files that have been accessed. Since access to compressed
clusters is slower, we could uncompress them and mark the file dirty,
but instead of compressing it again when the inode is put, just link
it into a special directory that would hold all dirty files. Files
in this directory could be compressed again after a certain amount
of time, or when we start to lack free blocks. This is a feature I
liked in tcx. This is no more than a cache where the uncompressed
blocks would be stored on the disk, and that would persist even after
the machine has been stopped.
-
Get rid of EXT2_NOCOMPR_FL (i.e. `chattr +X', the
attribute flag that provides access to raw compressed data and
prevents more than one process from having the file open) and replace
it with GNU's O_NOTRANS fcntl flag. (The advantages
of O_NOTRANS are that (i) it's more standard (but only on
HURD, not Linux) and (ii) other processes can still have the file
open for normal access (though I think that this should be
implemented as a per-file-descriptor flag rather than a
per-file-opening flag).)
Linux already has mandatory file locking support, so we can use
that instead where we need it.
-
Better provision for logfiles, where we'd like to compress all but
the last (incomplete) cluster. (If the last cluster is compressed
then we have to uncompress and recompress on every write -- and
remember that logfiles are usually sync'ed after every line.)
This has now been partially provided for with the `none'
algorithm: set the algorithm to `none', then every once in a while
change it to `gzip' and then immediately back to `none'. Changing
the algorithm from `none' to any other algorithm is a special
case: the kernel looks at all clusters that aren't already
compressed (which, for log files, should mean only the last
cluster or two) and tries to compress them with the new algorithm.
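The special case for switching away from `none' can be modelled as below. This is a toy model with hypothetical names (struct logfile, set_algorithm()); it just flips a flag where the kernel would actually try to compress the cluster, and it assumes that every such attempt succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum alg { ALG_NONE, ALG_GZIP };        /* illustrative subset only */

struct cluster {
    bool compressed;
};

struct logfile {
    enum alg alg;                       /* algorithm for newly written data */
    struct cluster *clusters;
    size_t nclusters;
};

/* Change the algorithm. Only a change from `none' to something else
 * sweeps the file for clusters that are not yet compressed.
 * Returns the number of clusters compressed by this call. */
static size_t set_algorithm(struct logfile *f, enum alg new_alg)
{
    size_t done = 0;

    if (f->alg == ALG_NONE && new_alg != ALG_NONE) {
        for (size_t i = 0; i < f->nclusters; i++) {
            if (!f->clusters[i].compressed) {
                f->clusters[i].compressed = true;   /* stands in for real work */
                done++;
            }
        }
    }
    f->alg = new_alg;
    return done;
}
```

For a log file, calling set_algorithm(f, ALG_GZIP) followed at once by set_algorithm(f, ALG_NONE) compresses the finished clusters and leaves later appends uncompressed again.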
-
Support the EXT2_SECRM_FL (i.e. secure deletion) flag. (This
isn't supported even in the standard 2.2 kernel, so this isn't a high
priority.)
-
Add some mount options? Anything useful come to mind?
-
Make an e2compr kernel module? The aim would be that people can
insmod e2compr into a kernel even if that kernel already has the
ext2 fs (without e2compr) built in. Useful if you don't have a
choice of the base kernel, as may be the case when upgrading some
Linux distributions. However, I'm not sure how to implement this.
The e2compr patch must modify some core ext2 routines, which, although
I (pjm) don't know much about modules, I think is impossible if the
ext2 filesystem is compiled into the kernel rather than built as a module.
(Does anyone know?)
-
Support bmap. bmap returns the block number on the filesystem where a
given <inode, block> pair is stored, information which is presumably
going to be used to access the raw device at that point rather than
go through the filesystem (and e2compr).
I don't think we can implement bmap directly: it would require
decompressing all clusters that were requested through bmap calls,
which is undesirable at best, particularly on a read-only
filesystem.
More sensible might be to look at all the callers of bmap and see
if they can be coerced to go through e2compr for files with
EXT2_COMPRBLK_FL set.
One might alternatively think about having a `virtual device' for
e2compr clusters. A virtual device would also help with caching
uncompressed clusters, btw. Antoine's objection to creating a
virtual device was that there's no trivial mapping between <inode
*, blockno> pairs and a 32-bit block number on the virtual device.
We could
probably grab the allocation code from any of the filesystems,
though I suggest we could benefit by optimising it for sparse
occupation of the 32-bit block address space. I've done some
preliminary work for this, but we won't see any real work done on
it until the current incarnation of e2compr is working reasonably
well.
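To make the mapping problem concrete, here is one naive way of assigning 32-bit virtual block numbers to <inode, blockno> pairs on first use. It is a sketch only: the names (struct vmap, vmap_get()) and the table size are hypothetical, and a small fixed hash table stands in for the sparse allocation scheme suggested above, which it does not attempt to implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMAP_SLOTS 1024u        /* hypothetical size; must be a power of two */

struct vmap_entry {
    bool     used;
    uint64_t ino;               /* stands in for the inode pointer */
    uint32_t blockno;           /* block number within the file */
    uint32_t vblock;            /* block number on the virtual device */
};

struct vmap {
    struct vmap_entry slot[VMAP_SLOTS];
    uint32_t next_vblock;       /* next unassigned virtual block number */
};

static uint32_t vmap_hash(uint64_t ino, uint32_t blockno)
{
    uint64_t h = (ino * 0x9E3779B97F4A7C15ULL)
               ^ ((uint64_t)blockno * 0xC2B2AE3D27D4EB4FULL);
    return (uint32_t)(h >> 32) & (VMAP_SLOTS - 1);
}

/* Look up the virtual block for <ino, blockno>, assigning a fresh one
 * on first use. Returns false only if the table is full. */
static bool vmap_get(struct vmap *m, uint64_t ino, uint32_t blockno,
                     uint32_t *vblock)
{
    uint32_t i = vmap_hash(ino, blockno);

    for (uint32_t probes = 0; probes < VMAP_SLOTS; probes++) {
        struct vmap_entry *e = &m->slot[i];

        if (!e->used) {                 /* first reference: assign a block */
            e->used = true;
            e->ino = ino;
            e->blockno = blockno;
            e->vblock = m->next_vblock++;
            *vblock = e->vblock;
            return true;
        }
        if (e->ino == ino && e->blockno == blockno) {
            *vblock = e->vblock;        /* already mapped */
            return true;
        }
        i = (i + 1) & (VMAP_SLOTS - 1); /* linear probing */
    }
    return false;
}
```

A zero-initialised struct vmap is an empty table. Entries are never released in this sketch, so it says nothing about reclaiming virtual blocks when a cluster is dropped from the cache.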
-
Until the bmap problem is addressed, try to make it impossible to
compress a file that's being used as a swapfile.
-
Allow modification of cluster size even for already compressed
files. (Note that this would involve decompressing then
recompressing the whole file.)
-
Recompress the whole file when the algorithm is changed? We
certainly wouldn't want this to happen if we change the algorithm to
`none'. This functionality is being added to e2compress, but I
don't think that the usual kernel behaviour will change.