This page describes the algorithms used by my RSCW program. Furthermore, some advantages and disadvantages of these algorithms are mentioned.
The goal of this page is not just to inform the curious, but also to provide a starting point for discussions about better algorithms, improvement to the present algorithms, extension of the algorithms to non-machine-sent code, etc. Comments are most welcome!
RSCW (at the moment) only aims to decode morse code that has perfect timing. Hand-sent morse may not have a very rigid timing, and still be easily decoded by the human ear and brain, but such code may be a lot harder to decode automatically. So practically, RSCW is mainly meant for machine-generated code, like the telemetry transmitted by satellites. (Actually, it is a trade-off: by making the algorithms expect perfect timing, we can dig weaker signals out of the noise than with an algorithm that also accepts sloppily sent code.)
Just to be sure, this is the timing specification for morse code:
So the duration of a dot is the obvious unit of time.
From now on, I'll use the word bit for whatever is sent during one unit of time; a 1 indicates that the transmitter is on, a 0 indicates off. So the character 'R' (dot dash dot) would be sent as seven bits, namely 1011101 .
However, for the purpose of decoding morse by computer, this is not a very practical representation. If we receive stream of bits like 1011101 we cannot yet decide that this is actually an 'R': it could be that the next two bits would be 01, making the character an 'L'. This is solved easily by considering the inter-character space as part of the character itself. So we say that 'R' is 1011101000, while 'L' is 101110101000. Consistently, we can express the space between words as the bits 0000. (Technically, the code is prefix-free.)
Now take a look at how this code for an individual character is composed. For every dot, we have the sequence 10; for every dash, 1110; and at the end, we attach 00 (which, together with the 0 at the end of the last dot or dash, completes the group of three 0s that is the inter-character spacing). Consequently, every character is represented by an even number of bits. Furthermore, if we consider the bits in groups of two, called a dibit, we see that only the sequences 10, 11, and 00 occur; the sequence 01 never occurs! So if we number the bits starting from one, the odd-numbered bits are more likely to be 1 than the even-numbered bits. This phenomenon can be exploited to regenerate a clock signal, as we shall see below.
The below picture is a block diagram of what happens inside RSCW:
In black, the path of the signal itself is shown: on the left, it enters
as an audio signal (sampled with 8 bits per sample, at 8000 samples per second),
and on the right the algorithm outputs the decoded message.
The blue, red and green parts deal with other issues, namely finding the
carrier frequency, finding the bit clock and choosing a threshold.
This consists of the following steps:
Remarks:
One possible algorithm for this would work as follows. (Note: this description (and the implementation in RSCW) assumes we already know which received bits are odd-numbered, so we can do the cross-correlation on dibits instead of individual bits.)
However, it is not necessary to keep all messages until the end.
To see this, consider the following example.
Suppose after receiving 8 bits we have the following three messages stored
(and others which are not relevant for this example):
10101000 (this is dot-dot-dot, i.e., an 'S') with cross-correlation 3.4
11101000 (this is dash-dot, i.e., an 'N') with cross-correlation 4.1
10111000 (this is dot-dash, i.e., an 'A') with cross-correlation 2.3
The above three messages each represent a morse character complete with its
inter-character space.
Therefore, each of these can be extended with the same dibits.
So if we after receiving the rest of the message find an extension for
the first of these messages which results in a high cross-correlation,
than that same extension also fits after the second and the third message.
And since the extension contributes equally to the cross-correlation of
whichever message it is attached to, we can see that adding this particular
extension will always give the highes total cross-correlation to the second
of these three, since that one now already has the highest cross-correlation.
In other words: the first and the third of these messages will never have
the highest cross-correlation, so we can just forget them.
(Readers who are familiar with the Viterbi decoder for convolutional codes
will recognise this phenomenon.)
The essential point here is that messages which end in an inter-character
space are equal in the sense that they can all accept the same extensions,
and therefore we only need to keep the best of them.
The above observation helps tremendously to reduce the number of messages considered, making the algorithm actually feasible. However, it has another nice consequence: in practice, it turns out that after a while all remaining messages have the same initial set of dibits. At that point, we can be sure those initial dibits will never change anymore: we have decoded them as the most likely bits, and can output them (or rather the character they represent). So we no longer need to wait until the end of the transmission before we can output at least part of the received message. (Note: theoretically, it is possible that this does not happen, i.e., that several messages with different initial dibits remain "alive" forever. The software needs to handle this case correctly.)
In fact, the idea of removing messages because we can be sure they won't win anymore, can be taken a step further: rather than only looking at messages that have just ended in an inter-character space, we can look at all messages, and remove them if for every possible extension of that message, another message is still there which can also accept that extension and already now has a higher cross-correlation. This again reduces the number of messages to be kept for consideration (thus reducing the computational effort), and also makes the algorithm decide (i.e., leave only messages with the same initial dibits) earlier.
A final note: if one knows even more about the message to be received (e.g., that it consists of sets of 3 letters followed by 2 digits) one could in principle use that knowledge to make the cross-correlator also reject messages that don't fit this pattern. In principle, this improves the decoder's ability to dig such signals out of the noise; however, it has not (yet) been implemented in RSCW.
RSCW uses a rather simple algorithm for this: it performs an FFT (Fast Fourier Transform) on 1-second segments of the incoming signal, and chooses the frequency at which the power is largest (within a range of frequencies specified by the user). It does this twice a second. Linear interpolation is used to estimate the frequency between two successive FFT results.
This algorithm works well as long as there is only one significant narrow-band signal in the frequency range; otherwise, it may easily jump between the several peaks in the spectrum, rendering the decoder useless.
For the satellite signals, a "tracking" mode has been implemented: this tries to track a slowly drifting frequency (due to Doppler shift) by adapting the acceptable range of frequencies. This works fine once it has found the signal, but initially acquiring the signal is a problem unless the signal is already there when the program starts.
Clearly, this is an area where improvements can still be made.
First, a bit of theory.
Consider a stream of 0s and 1s representing several characters of morse code, as discussed
earlier on this page.
Now start at an odd-numbered bit, and perform the following calculation:
take this bit's value, subtract the values of the bit just before and the bit just after this bit,
add the values of the bits two places before and two places after this one, and so on.
In other words: add the values of bits that are an even number of positions away,
and subtract those of the bits that are an odd number of positions away.
Since the original bit was odd-numbered, all the added bits are also odd-numbered,
while the subtracted bits are even-numbered.
As noted before, odd-numbered bits are more likely to be 1 than the even-numbered bits;
as a consequence, the above calculation will give a positive outcome.
Conversely, if we had started the procedure with an even numbered bit, the outcome
would have been negative.
Thus, we can use this calculation to synchronize a stream of bits, in the sense of
deciding whether a certain bit is odd or even.
Unfortunately, the above is just theory which we can't apply yet, since we don't
have the bits yet.
However, we can apply the same calculation to the stream of signal magnitudes
provided by step 4: start with the magnitude at some point in time,
add the magnitudes at points in time that are an even number of bit times earlier or later,
and subtract the magnitudes at times that are an odd number of bit times earlier or later.
If our initial time was right in the middle of an odd-numbered bit, the outcome would be positive,
while if the initial time was right in the middle of an even-numbered bit, the outcome would be negative.
And, as it turns out, if the initial time is somewhere in between, the outcome is also in between.
Thus, doing this calculation at every point in time yields a signal that has a peak in the
middle of the odd-numbered bits, and a minimum in the middle of the even-numbered bits.
This signal (based on a calculation about 10 bit times into past and future)
is plotted in yellow by RSCW.
In practice, the signal derived as above is somewhat noisy. In order to estimate the time of the maxima and minima more accurately and reliably, RSCW fits a sine to the signal, and then uses the maxima and minima of that sine to choose the sampling instants for step 5.
This algorithm obviously fails if the difference between odd and even bits is not honored. Unfortunately, at least one amateur radio station transmitting machine-generated CW (the beacon DK0WCY on 10.144 MHz) seems to use an inter-character spacing of 4 instead of 3 bits, and can therefore not be decoded reliably with RSCW. Obviously, finding an algorithm without this limitation would be a useful improvement.
In RSCW, the following algorithm is used. Given a set of samples (for 5 bits before and after the bit for which we're trying to set the threshold), choose the threshold such that the average distance between the threshold and samples above it, is equal to the average distance between the threshold and samples below it. If there are equally many samples above and below, then is just equivalent to simply taking the mean of the samples. But if the samples are unequally distributed (e.g., during reception of a character that contains a lot of dashes), this algorithm prevents a bias due to the unequal distribution. Actually, the threshold as defined above is not unique: there are typically several values which satisfy the criterion; in that case, the average of the highest and the lowest possible such value is taken. Furthermore, the 7 samples closest in time to the bit of interest are entered twice into the calculation, to make the resulting threshold behave more smoothly in the presence of large signal strength variation.
The resulting threshold is shown as a red line in the graphs.