It is essentially a Beeb/Elk CSW block decoder, capable of listing the blocks contained within a CSW and, if requested, extracting them into a directory. (You may then manually concatenate these blocks into complete CFS files, if such activities bring you pleasure.)
It has a trick up its sleeve -- every byte decoded from the CSW is stored along with the sample number in the source audio at which the byte was found (look for the numbers in square brackets). Consequently it is very useful for accurately pinpointing the locations of interesting features in the source audio file. Most relevantly for my purposes, it allows me quickly to find the precise piece of audio that has led to a particular error in the CSW.
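To turn one of those bracketed sample numbers into a position you can seek to in an audio editor, divide it by the sample rate. Here is a minimal Python sketch. It assumes a 44.1 kHz source, but the real rate should be taken from the CSW header or the original audio file. `sample_to_timestamp` is a hypothetical helper, not part of cswblks.php:

```python
def sample_to_timestamp(sample: int, sample_rate: int = 44100) -> str:
    """Convert a sample number into an m:ss.mmm offset into the audio file.

    Assumes the CSW and the source audio share the same sample rate
    (44.1 kHz by default; pass the real rate if it differs).
    """
    total_ms = sample * 1000 // sample_rate  # truncate to whole milliseconds
    minutes, ms = divmod(total_ms, 60_000)
    seconds, ms = divmod(ms, 1000)
    return f"{minutes}:{seconds:02d}.{ms:03d}"
```

For example, `sample_to_timestamp(2712150)` gives `"1:01.500"`, i.e. 61.5 seconds into a 44.1 kHz recording.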
It currently only correctly handles bog-standard 8N1 Beeb-style CFS blocks.
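For anyone unfamiliar with the framing, 8N1 means one start bit (0), eight data bits sent least significant bit first, no parity, and one stop bit (1). The Python sketch below decodes such frames while carrying each bit's sample number along, which is the same idea as the bracketed sample numbers above. It is not the PHP code. It assumes the audio has already been turned into a list of `(bit, sample_number)` pairs, and `decode_8n1` and `DecodedByte` are hypothetical names:

```python
from typing import List, NamedTuple, Tuple


class DecodedByte(NamedTuple):
    value: int   # the decoded byte, 0-255
    sample: int  # sample number at which its start bit began


def decode_8n1(bits: List[Tuple[int, int]]) -> Tuple[List[DecodedByte], List[int]]:
    """Decode (bit, sample_number) pairs as 8N1 frames.

    Returns the decoded bytes and the sample numbers of any stop-bit errors.
    """
    decoded: List[DecodedByte] = []
    stop_errors: List[int] = []
    i = 0
    while i + 10 <= len(bits):  # a full frame needs 10 bits
        bit, sample = bits[i]
        if bit != 0:
            # Idle 1-bits (e.g. leader tone) between frames: keep hunting for a start bit.
            i += 1
            continue
        value = 0
        for n in range(8):
            # Data bits arrive least significant bit first.
            value |= bits[i + 1 + n][0] << n
        stop_bit, stop_sample = bits[i + 9]
        if stop_bit != 1:
            # Record where the bad stop bit sits, so it can be found in the audio.
            stop_errors.append(stop_sample)
        decoded.append(DecodedByte(value, sample))
        i += 10
    return decoded, stop_errors
```

Feeding it a few leader 1-bits followed by a frame for &2A (the sync byte that precedes a CFS block header) yields one `DecodedByte` whose `sample` is the start bit's position. A real decoder would also need a policy for resynchronising after a framing error; this sketch simply carries on from the next bit.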
I've attached a ZIP of the PHP, but here is the "readme" (from the comments at the top of the source code):
WHAT?
-----
This software will load and inspect a CSW file, and attempt to decode its blocks.
Currently only MOS-standard 8N1 BBC Micro/Electron-style blocks are supported.
HOW?
----
By default, the software will list the 8N1 blocks in a CSW file:
$ php -f cswblks.php <CSW file>
If there is a particular block that is of interest (usually because it
contains errors), you can request verbose details of that particular block
based on its block ID (which will have been displayed on the prior run):
$ php -f cswblks.php +b <block ID> +v +e <CSW file>
(+v provides a verbose block listing; +e provides a verbose error listing.)
One other useful trick that this tool can perform is to extract all of the
8N1 blocks from a CSW into a directory, as individual files:
$ php -f cswblks.php +x <output dir> <CSW file>
At this time, this software will automatically create the directory if it
does not exist, and it has no qualms about overwriting existing files, so be
careful. This behaviour can be altered easily (change "TRUE, TRUE" to
"FALSE, FALSE" in the call to save_blocks), but these choices are not exposed
via the command line right now.
Currently, this software provides no way to extract whole CFS files rather than
individual blocks, but the blocks it does write can be manually concatenated into
complete files fairly easily. On Unixalikes a shell command such as the following
will probably do the job, once this software has extracted the blocks to,
say, "/tmp/src", assuming you want the files in "/tmp/dst":
SRC="/tmp/src" ; DST="/tmp/dst"
mkdir -p "${DST}"
# Note: ">>" appends, so empty ${DST} before running this a second time.
for N in $(ls "${SRC}" | cut -d _ -f 3-4 | sort -u) ; do
  cat "${SRC}/"*"${N}"* >> "${DST}/$(echo "${N}" | cut -d _ -f 1)"
done
WHY?
----
This software was written for testing CSWs produced by Quadbike. I was previously
using beebjit in an ad-hoc way for this purpose, but decided I needed something
better.
It is expected that this software will be useful for anyone else who
is unfortunate enough to be developing software to convert an audio
file into CSW data.
From this perspective, it offers two key innovations:
i) Every byte decoded from the CSW is stored along with the sample number
within the CSW where that byte originated. As such, it is very useful
for determining whereabouts in the original audio file certain features
lie. Header field locations, data locations, CRC locations and any error
locations are all displayed to the user, in terms of the number of samples
into the source audio file that was used to generate the CSW.
ii) An error count and a block count are provided at the end of the block decoding
process. In my adventures with Quadbike, I have found that making a change to
an audio-to-CSW algorithm often improves transcription of some tapes, but
only at the expense of other ones. By having a large corpus of test data,
and by summing errors over this entire corpus, it should be possible to
measure scientifically the overall efficacy of any change to a CSW encoding
algorithm (hopefully including my own).
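A corpus-wide comparison of two algorithm variants can be sketched in Python as below. It assumes you have already collected each tape's end-of-run error and block counts into a dictionary. How those totals are scraped from the tool's output is left open, and `corpus_totals` and `per_tape_error_changes` are hypothetical names:

```python
from typing import Dict, Tuple

# Hypothetical input: tape name -> (error count, block count), as reported at
# the end of a cswblks.php run over that tape's CSW.
Results = Dict[str, Tuple[int, int]]


def corpus_totals(results: Results) -> Tuple[int, int]:
    """Sum errors and blocks over every tape in the corpus."""
    errors = sum(err for err, _ in results.values())
    blocks = sum(blk for _, blk in results.values())
    return errors, blocks


def per_tape_error_changes(before: Results, after: Results) -> Dict[str, int]:
    """Error-count change for each tape whose count differs between two runs."""
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        delta = after[name][0] - before[name][0]
        if delta != 0:
            changes[name] = delta
    return changes
```

Tracking the block total alongside the error total matters: a change that stops a damaged block from being recognised at all also hides that block's errors, so a falling error count is only good news if the block count holds steady.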
Unlike beebjit's CSW error reporting, this software only counts errors
that occur within blocks -- so errors caused by transients on the tape
during silent sections or leader sections will not contribute to the
error count. It also reports the sample number of stop bit errors, which
beebjit does not at this time.
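One way to express "only errors within blocks" is to post-filter error positions against the sample spans of the decoded blocks. The Python sketch below does that. It is a swapped-in illustration, not necessarily how cswblks.php arrives at its count, and `errors_within_blocks` and its inputs are hypothetical:

```python
from bisect import bisect_right
from typing import List, Tuple


def errors_within_blocks(blocks: List[Tuple[int, int]], errors: List[int]) -> List[int]:
    """Keep only error sample numbers that fall inside some block span.

    blocks: (first_sample, last_sample) pairs, sorted and non-overlapping.
    errors: sample numbers where a decoding error was detected.
    """
    starts = [start for start, _ in blocks]
    kept = []
    for sample in errors:
        # Find the last block starting at or before this sample.
        idx = bisect_right(starts, sample) - 1
        if idx >= 0 and sample <= blocks[idx][1]:
            kept.append(sample)
    return kept
```

With blocks spanning samples 1000 to 5000 and 9000 to 12000, a transient at sample 6000 (between blocks) is discarded, while one at 4999 is kept.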
Additionally, beebjit's CSW-loading heuristic doesn't play nicely with Quadbike;
it was intended for dealing with output from CSW.exe, which measures pulse
lengths from a tape. Quadbike, however, artificially synthesises pairs of
pulses based on frequency data, so using a hard threshold between 1-pulses
and 0-pulses, as e.g. b-em currently does, works much better for Quadbike's
CSW output.
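A hard threshold of this kind can be sketched in Python. The sketch makes several assumptions:

- The encoding is 1200-baud Beeb CFS, where a 0 bit is one cycle of 1200 Hz (two long pulses) and a 1 bit is two cycles of 2400 Hz (four short pulses).
- Each CSW pulse is one half-cycle.
- The threshold sits halfway between the nominal pulse lengths.

It does not reproduce b-em's actual threshold or code, and the function names are hypothetical:

```python
from typing import List, Tuple


def classify_pulses(pulses: List[int], sample_rate: int = 44100) -> List[int]:
    """Label each pulse 0 (long, 1200 Hz half-cycle) or 1 (short, 2400 Hz half-cycle)."""
    # Hard threshold halfway between the nominal half-cycle lengths:
    # 1200 Hz -> rate/2400 samples, 2400 Hz -> rate/4800 samples.
    threshold = (sample_rate / 2400 + sample_rate / 4800) / 2
    return [1 if length < threshold else 0 for length in pulses]


def pulses_to_bits(pulses: List[int], sample_rate: int = 44100) -> List[Tuple[int, int]]:
    """Group classified pulses into (bit, sample_number) pairs.

    A 0 bit is two long pulses; a 1 bit is four short pulses. Pulses that do
    not form a complete group are dropped one at a time (crude resync).
    """
    kinds = classify_pulses(pulses, sample_rate)
    starts = []  # sample number at which each pulse begins
    position = 0
    for length in pulses:
        starts.append(position)
        position += length
    bits: List[Tuple[int, int]] = []
    i = 0
    while i < len(kinds):
        need = 2 if kinds[i] == 0 else 4
        group = kinds[i:i + need]
        if len(group) == need and all(k == kinds[i] for k in group):
            bits.append((kinds[i], starts[i]))
            i += need
        else:
            i += 1  # malformed run: drop one pulse and try again
    return bits
```

At 44.1 kHz the threshold works out at about 13.8 samples, between the nominal 18.4-sample (1200 Hz) and 9.2-sample (2400 Hz) half-cycles. Synthesised pulses that sit close to those nominal lengths are therefore classified unambiguously.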