Logo

Django and Big Data Part 1 — Primary Keys

Only one blog post a year, geez… let’s try to change that…

Assumptions

You have a Django site and your data is growing, or you want to add data. At some point, the defaults are not going to be enough, but you’re not big enough to hire a DBA, and NoSQL isn’t right for you. What do you do? This series will explore some things you can do as your data grows to improve performance, scalability, and keep things running. Assumptions: You are using Django and the Django ORM, you are using PostgreSQL, and your data is growing to a point you are concerned about performance, disk space, or both.

Django Default Primary Keys

When you create a Django model, by default it adds a primary key. This primary key is a column of class AutoField, and it has the flag “primary_key”. Were you to define it explicitly in code, it would look like:

id = models.AutoField(primary_key=True)

A primary key by definition must be unique. This uniqueness is implemented via an index in PostgreSQL. If there wasn’t an index to enforce uniqueness, the database would have to scan every record to ensure that the value you were inserting was unique. Obviously, that’s not optimal. This implementation takes disk space, and even with the sequence generator, updating the index takes CPU and disk time. When you are looking for ways to reduce disk space, CPU, and disk time, this may be a penalty you don’t have to take.

Natural Keys

You might not need the primary key that Django creates. Your data might have a natural key. A natural key is a piece of data that is already unique that you can use as the primary key instead of an arbitrary auto-generated numeric sequence. If your data has a natural key, it probably doesn’t need the primary key that Django generates, and you can save the disk space and the CPU and disk time you would lose by having it.

Here are some examples of possible natural keys that in certain use cases can act as a unique primary key. However, in all the examples, there are cases you can think of where the example doesn’t make sense as a unique key, so you’ll need to look at your own data and determine what columns make sense for you. There are many more than the following:

  • Social Security Number
  • Timestamp
  • ISBN Number
  • ICAO Airport Code
  • Vehicle VIN Number
  • Another Table’s Primary Key (One-to-one table relationship)
  • A combination of data such as first name and last name

In order to stop using the the default primary key in Django and specify your own, all you have to do is add “primary_key=True” to the field you want to use. This field can be of any type, even a ForeignKey . However, if you add it to a ForeignKey, it often makes sense to change the definition from ForeignKey to OneToOneField). For example:

user = models.OneToOneField(User, primary_key=True)

or:

check_number = models.IntegerField(primary_key=True)

Potential Problems with Natural Keys

There are some database designers that insist that primary keys always be surrogate keys, that is, a column that isn’t directly associated with the data itself, such as an auto-generated sequence like the Django default primary key. The reason they say this because “all data can change, even if you don’t think it will, and it will change in a way that breaks your assumption of uniqueness”. For example, in theory SSNs are unique to an individual, and they have never been recycled. However, depending on your data source, you may have individuals using fake or misappropriated SSNs, the most famous one being 078-05-1120. How do you handle that in your model? Do you reject the duplicate, or is the duplicate the correct one? Only you can answer the question of if a field is appropriate as a natural key.

Another problem is using non-integer primary keys in Django. If you select a char field as primary key, Django will add both a standard index and an index with an operator type of varchar_pattern_ops, which is used for “like” and regular expression queries. You might want this capability, or you might not. If you don’t, and the field wasn’t previously indexed, you lose all the benefits of using that field as the primary key instead of a sequence. And unfortunately, there is no way to turn that off within Django.

Finally, Django doesn’t currently support multi-column (composite) primary keys, even though PostgreSQL does. Django developers understand that there is a need and have been actively looking into it. If your natural key is a composite of columns, it’ll be tough to make that work in Django. You can have unique constraints defined in Django, and you must have a primary key column defined. But you can’t make the unique constraint the primary key yet.

A Practical Example

I’ve been going through the ForecastWatch database looking for areas of performance improvement and disk savings. Over the past two years, the number of forecasts collected and analyzed each day has more than doubled, and we want to grow it even more. But the only way it can grow significantly from it’s currently nearly 1TB size and add a slew of new features cost-effectively is if we ensure it’s a lean, mean, fighting machine.

The ForecastWatch database has been around in its current form since 2005 and I started using Django pre-magic-removal. There are definitely things I would do differently today, and by going through the entire system I found a number of places where auto-generated primary keys were just taking up space.

The best example of this are the calculated forecast error tables. Each forecast in the forecasts table is compared to observations and error values are calculated and stored in “score” tables. There are five score tables, one each for high temperature, low temperature, precipitation, wind, and opacity or sky condition. These are broad tables, with some storing 30 or so calculated error metrics. These metrics are sometimes costly to compute so by storing them in a table, we are able to efficiently calculate aggregated error statistics and display individual forecast error metrics online quickly. For each forecast, there is one, and only one row in each of the score tables.

With forecasts from more than a dozen providers, from most countries in the world, for almost ten years, we now have hundreds of millions of forecasts in the database. Each of the score tables data size is about 20GB. On top of that, there were three indexes on each of the tables, each 7GB on disk:

  • The auto-generated id
  • The forecast id associated with the score
  • The observation id associated with the score

And the Django code looked something like this:

class HighScore(models.Model):
    # Data
    # ... fields for each of the calculated error metrics ...
 
    # Keys
    forecast = models.ForeignKey(Forecast)
    actual = models.ForeignKey(Actual)

Because a primary key isn’t explicitly specified, Django creates one for you. That’s the auto-generated id that’s the first bullet point above. And it’s completely unnecessary because each row has a one-to-one relationship with a forecast, so we can just use the forecast id as the primary key.

We can do that simply by replacing adding “primary_key=True” to the parameters of the forecast ForeignKey, and we can optionally change the field class of the forecast column from ForeignKey to OneToOneField. Using the OneToOneField makes it clearer the nature of the table relationship, and provides some niceties, such as only returning a single object if you query the reverse relationship (getting a score from a forecast). So the new code looks like:

class HighScore(models.Model):
    # Data
    # ... fields for each of the calculated error metrics ...
 
    # Keys
    forecast = models.OneToOneField(Forecast, primary_key=True)
    actual = models.ForeignKey(Actual)

This simple change netted a small but notable insert performance improvement, no decrease in select or update performance, and 35GB of disk space saved. If you are going to have big tables in your application, see if you can improve performance and decrease your disk requirements by looking for alternatives to an auto-generated id field, but don’t force your design into a bad architecture simply to avoid one, only change if it makes sense.


Of Window Managers and Desktop Environments

Re-evaluating My Desktop

I’ve been using Linux as a desktop for the past three or four years. For much of that time I used Ubuntu and GNOME 2. This past fall, a number of things aligned that caused me to spend a lot of time rethinking my desktop.

The first was Ubuntu’s decision to move to Unity. That opened up a window for me, as I wasn’t entirely pleased with my ability to navigate Unity, and I really wasn’t happy with its initial lack of multi-monitor support. The second was that for the first time in my 20+ year programming career, I started having some symptoms of carpal tunnel. The third was that after three years of Linux in a familiar-to-Windows-refugees desktop (GNOME 2) I was ready to take the next step and look for more productive desktops. Because of all this, I decided to step back and learn more about all the options available to me, from distros to desktops.

The first thing that I discovered was that there is a big distinction between Window Managers and Desktop Environments. For someone coming from Microsoft Windows, GNOME, or Unity, this may be an unfamiliar distinction. A Window Manager is the part of the desktop that is concerned with the presentation, placement, management, and manipulation of windows. That’s it. Until I started researching new Linux desktops, I never knew that GNOME 2′s window manager was Metacity, GNOME 3 uses a window manager called Mutter, and Unity uses Compiz for it’s window manager.

It’s More Than Just Windows

But a Linux desktop (or any desktop) is more than just windows. There are lots of parts that combine to make a desktop experience convenient, productive, and enjoyable. Stuff you generally don’t think about until it’s not there. Things like: a login screen, a graphical file manager, a status bar or dock, having a USB thumbdrive auto-mount on insertion, having a program ask for root password or sudo if it needs escalated privileges, printer management, program launchers (so you don’t have to always launch from a terminal), wireless network management, notifications, keyrings (so you don’t constantly have to enter root or your password every time escalated privileges are required), an ability for unprivileged processes to talk to privileged processes so you don’t even have to sudo or run as root (i.e. PolicyKit), screenshot capturing, etc. There are a lot of niceties that improve interaction and productivity or are just conveniences on the desktop that have nothing to do with window management. These are things you rarely think about, until they aren’t there.

On Linux, there are five major “all-in-one” desktops available on the Linux deskop. They are GNOME, KDE, Unity (which uses a lot of GNOME parts for the Desktop Environment), LXDE, and XFCE. All five provide a complete desktop environment and window management ala Microsoft Windows. Each chooses their own solution for many of the desktop “bits”. For example, for graphical file management (and for some, USB automount management), GNOME and Unity use Nautilus, KDE uses Dolphin, XFCE uses Thunar, and LXDE uses PCManFM. When you install a Linux distro with one of these “all-in-ones” you end up with a complete, working desktop that you don’t have to think too much about. Most major distros have package groups or spins that allow you to run any of them in a nice, well packaged, well configured setup. GNOME, KDE, and Unity are fully-featured and fully loaded, while LXDE and XFCE are a little less so (by design).

I evaluated all five of the “all-in-ones”. These are my opinions, and quick thoughts based on several weeks of using each one as my primary desktop. For me, it seems GNOME 3 and Unity are trying to move to a “post PC” desktop, where touch interaction is the primary interface to the computer. If I were playing casual games or surfing the web all day clicking on reddit links, it would be a great interface. With GNOME 3, it was a hassle even to get a weather icon in the status bar, like I could effortlessly in GNOME 2. More importantly, for me, who has a lot of open windows and spends a lot of time typing text, it’s not the direction I am heading. My laptop and desktop and my profession is definitely not “post PC” nor will it be for the foreseeable future. I did really like GNOME 3′s Window-key search, and Unity seemed to have a number of useful keyboard shortcuts out of the box.

LXDE and XFCE were fast and responsive, but are focused on providing a usable environment on even the most constrained devices (like netbooks). Coming from GNOME 2, they were very familiar, and aren’t trying to break new interface ground. KDE was the most intriguing and the KDE team seems able to provide a usable interface for netbooks and tablets but still innovate on the desktop. No other all-in-one is working as much on trying to improve desktop productivity, and KDE’s semantic desktop and workspace concepts are areas to watch.

Keyboard Control

After trying out the “big 5″ I still wasn’t satisfied. All the all-in-ones could be configured to make me reasonably productive. But throughout this process I became more aware of how I use the desktop. Generally, I have a lot of workspaces, either with some number of applications on them tiled nicely, or I have one big window. For example, one of my workspaces is just a web browser. Another workspace is three terminal windows… one the command line, one an editor session, and one at a database prompt. I found it was easier to group related windows that I tended to work on together, well, together. I also wanted to create keyboard shortcuts for all the window management I wanted to do, as I believed some of my wrist pain was caused by moving from the keyboard to the mouse frequently, as I focused different windows or moved them as needed. After some investigation I discovered tiling window managers. Most of them gave me the ability to completely control my windows, window layout, workspace views, and focus with the keyboard. Tiling window managers also made easy what I always did with windows: either tiled or full-screen. I never had partially overlapping screens, except for the occasional floater (like calculator).

Terrified of the Blank Screen

I tried three tiling window managers: awesome, xmonad, and qtile. After trying them all, I saw the potential, but they all initially scared me away. Qtile isn’t a very popular window manager, but it’s how I learned about tiling window managers in the first place. One Qtile user/developer presented an awesome lightning talk on it. Watch it here on YouTube. Even if you have no interest in Qtile or tiling window managers, it’s a great, funny 5 minute talk.

I wanted to love Qtile, as it is written in Python, but I had trouble getting it running. None of the major distros had pre-built packages for it. Awesome and xmonad were available as pre-built packages on all the major distros (save awesome on Fedora for some odd reason, but it is available in a third-party repo). Awesome was very easy to get running, and came pre-configured with a task bar. When xmonad started I was terrified I had done something wrong… all there was was a blank screen (that’s normal).

After the initial shock, I started using each one for a time. Due simply to the available documentation and user base, I gravitated to awesome and xmonad. This is when I discovered the difference between a window manager and a desktop environment. I started to try to do “desktopy” things: how do I configure wireless on my laptop? How do I get a status bar in xmonad? How to I launch programs outside of a terminal window? How do I set up my printers? All the things that had come “standard” with my GNOME desktop now needed to be researched, installed, and configured. I learned so much about how a modern Linux desktop is put together, and that was invaluable: PolicyKit, DBUS, session management… all fascinating. I was able to get a productive desktop in both awesome and xmonad. I leaned heavily on the great documentation that users of ArchLinux have built. If you ever want to learn how a Linux system is put together out of its parts, they do a wonderful job. Their wiki is comprehensive, well laid out, and informative even if you don’t choose Arch as your distro of choice (I didn’t).

But at the end of the day, even after all that, I wasn’t happy. I loved the keyboard controllable interface and tiling. But there were still little cracks here and there… functionality that I had gotten used to in GNOME that I couldn’t get configured exactly right. And while I enjoyed the learning experience, for me it wasn’t about the ultimate configurability so much as having a keyboard-controllable tiling window manager with a full-featured desktop environment.

Best of Both Worlds

Then I discovered it was possible to get the productivity and keyboard control of a tiling window manager, with the desktop glue of a desktop environment. What I ended up doing was installing the “xmonad-gnome” package in Fedora. Just “yum install xmonad-gnome”. It uses the “fallback” interface within GNOME 3 (essentially a GNOME 2 status bar) which means I can even get my weather status icon! I am sure other distros have similar packages that make getting up and running fairly painless. If not, both xmonad and awesome have easy-to-follow documentation on getting a tiling window manager working seemlessly in GNOME. Xmonad’s documentation is here, while ArchLinux provides great awesome window manager documentation here.

I’ve found great productivity benefits using a tiling window manager, and having the GNOME desktop environment in the background makes my Linux desktop easy to use. I’ve spent some time configuring Xmonad to my liking (a topic for another post), but I don’t have to worry about manually setting up the entire desktop environment. Overall, I use the mouse less, and my productivity has increased. If you are interested in trying a tiling window manager, but were afraid of the lack of a built-in desktop environment, try xmonad-gnome in Fedora, or similar.


Wind Observation Calculations in FORTRAN and Python

The National Climatic Data Center (NCDC) is the government organization in charge of preserving and archiving weather observations in the United States. One of the most popular datasets they provide is the Local Climatic Data (LCD) product. It contains daily summaries and raw hourly reports from about 1,600 stations. The daily summary includes high temperature, low temperature, amount of precipitation, atmospheric pressure, and wind summaries. An example can be found here.

The daily summary has several wind observations. There are maximum five-second and two-minute speeds, and the average wind direction during those gusts. And then there are three aggregate observations: “Resultant wind speed”, “Resultant wind direction”, and “Average wind speed”. What do these observations mean, and what is the difference between “resultant” and “average”?

The average wind speed is a simple average of wind velocity, regardless of direction, over the day. If the wind blew from the north at 10 knots for half the day, and from the south at 20 knots for the other half, the average wind speed that day would be 15 knots.

Resultant wind speed and resultant wind direction are the vector average (the resultant vector) wind speed and direction. Imagine the wind observation as a vector starting from the origin (the observing station) pointing in the direction the wind is originating from and with a length equal to the wind speed. If you take all those vectors and add them up, the resulting direction of the vector would be the resultant wind direction, and the magnitude of the resulting vector divided by the number of vectors is the resultant wind speed.

Adding all of the hourly wind vectors and averaging them results in the values expressed in the daily resultant wind speed and resultant wind direction observations. That is how the NCDC calculates those values, and they do it in FORTRAN. I am going to walk through the specific code that generates those daily averages from hourly data. I’ll also show the same code in Python. Finally, using the complex math library in the Python standard library, I’ll show how the vector average can be calculated with far less code. The full FORTRAN example can be downloaded here, while the Python source can be found here.

The following FORTRAN code was supplied by the NCDC, and is the actual code they use to compute daily resultant wind direction and speed. The Python code was created by me to exactly match the FORTRAN code, to compare and contrast the differences between FORTRAN and Python.

1
2
3
4
5
6
7
8
9
10
11
12
13
      SUBROUTINE R_WIND (N_RWD, RRDD, RWND, RWDIR, RWSPD)
 
      IMPLICIT NONE
 
      REAL*8 RRDD(31), RWND(31), R_DIR, R_SPD, TOTRX, TOTRY
 
      INTEGER X, WI1, WI2, N_RWD
 
      CHARACTER*2 RWDIR
      CHARACTER*5 RWSPD, WC1
 
      TOTRX = 0.0
      TOTRY = 0.0

The subroutine R_WIND takes three parameters, and returns two. N_RWD is the number of hourly wind observations. RRDD is the array of wind directions, while RWND is the array of wind speeds. The wind direction angles passed in are degrees plus 180 converted into radians.

The subroutine returns two results, RWDIR, which is the resultant wind direction, and RWSPD, which is the resultant wind speed. These values are returned as character strings.

The Python equivalent is much shorter in code, because in Python you don’t need to explicitly declare variable types. You do have to import the math library, since the sine and cosine functions that will be used are not implicitly available, as with FORTRAN. There are also no explicitly declared output variables. Because of this dynamic nature, documentation is even more important. The text between triple-quotes immediately following the function definition is called a “docstring.” A docstring is used in Python to document the code, and is used by documentation generators like Sphinx to build web-based documentation for your code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import math
 
def r_wind(n_rwd, rrdd, rwnd):
    """
    Takes an array of houry wind directions and speeds and returns the
    resultant average wind speed and direction, or None for speed and direction
    if there are fewer than 16 hourly observations.
 
    n_rwd -- number of hourly wind observations
    rrdd  -- array of wind directions plus 180 degrees, in radians
    rwnd  -- array of wind speeds
 
    Returns a tuple of resultant wind direction and wind speed.  Both are
    returned as strings.  Wind direction is in tens of degrees.
    """"
    rwdir = rwspd = None
    totrx = 0.0
    totry = 0.0

Once the variables have been declared, the subroutine then checks to see if there are at least 16 hourly observations. If there are not at least 16 hourly observations out of 24 (or two-thirds), it won’t try to calculate a daily resultant wind vector. Then, it loops through the direction and speed arrays, converting into Cartesian coordinates. Converting polar (angle and magnitude, in this case direction and speed) coordinates into Cartesian coordinates to sum them up, then average, is the normal way to add vectors.

15
16
17
18
19
20
21
      IF (N_RWD .GT. 15) THEN
         X = 1
         DO WHILE (X .LE. N_RWD)
            TOTRX = (RWND(X)*(DSIN(RRDD(X)))) + TOTRX  ! TOT X & Y COOR.
            TOTRY = (RWND(X)*(DCOS(RRDD(X)))) + TOTRY
            X = X + 1
         END DO

There are a couple of unique things happening in this code. First, adding 180 degrees to the wind directions prior to passing in the array. Second, using sine for the X coordinate conversion and cosine for the Y coordinate conversion is the opposite of the general way to convert from polar to Cartesian coordinates (see here). The reason this is done, even though it isn’t required in this subroutine, is to create a Cartesian plane that matches a map. When wind is measured at 0 degrees, it means it is coming out of the north. The visualized vector would be an arrow pointing down (the negative y-axis in a standard Cartesian plane). Additionally, the angle would be measured clockwise (a 90 degree wind would be coming from the east, and thus a vector pointing left along the negative x-axis). Using the standard Cartesian coordinate conversion, the zero angle vector would point right along the positive x-axis, with angles measured counter-clockwise.

20
21
22
23
24
25
    if n_rwd > 15:
        x = 0
        while x < n_rwd:
            totrx = (rwnd[x]*(math.sin(rrdd[x]))) + totrx # tot x & y coor.
            totry = (rwnd[x]*(math.cos(rrdd[x]))) + totry
            x = x + 1

The Python version is quite similar. The biggest difference is that Python arrays are zero indexed, while FORTRAN has arrays indexed starting at one. Another difference is that Python arrays use brackets to indicate the index, with parentheses reserved for function calls. Also, no “end while” statement is required, because in Python, indentation has syntactical meaning. Specficially, anything indented past the while statement are considered part of the while block. Once code returns to the same indentation as the while, it is not longer part of the while block. Finally, other differences are that in Python, parentheses are not required around conditional expressions, and comments in Python are indicated with a hash character rather than a bang.

The next section of the FORTRAN code finds the angle of the resultant x,y coordinates and averages the resultant length (which is speed).

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
         R_DIR = DATAN(TOTRX/TOTRY) ! AVG R_WIND DIR
         R_DIR = R_DIR / (3.14156/180)   ! convert radians back to degrees
 
         IF ((TOTRX*TOTRY) .LT. 0) THEN
            IF (TOTRX .LT. 0) THEN
               R_DIR = 180 + R_DIR
            ELSE
               R_DIR = 360 + R_DIR
            END IF
         ELSE
            IF (TOTRX .GT. 0) R_DIR = 180 + R_DIR
         END IF
 
         R_SPD = (((TOTRX**2) + (TOTRY**2)) / N_RWD**2)**.5
 
         IF (R_DIR .EQ. 0 .AND. R_SPD .NE. 0) R_DIR = 360
         IF (R_SPD .EQ. 0) R_DIR = 0

Lines 23 and 24 take the arctangent of the x,y coordinate to compute the angle to convert back into polar coordinates. Line 27-35 convert the angle to the original solution space, based on the quadrant the resultant x,y coordinate resides in. Line 37 is simply the distance between the origin and the averaged x,y coordinate, which is the vector length of the averaged resultant vector, which is resultant wind speed. This is calculated by realizing that x and y are just legs of a right triangle, with hypotenuse being the distance between the origin and the point. Line 39 and 40 are just clean up. The subroutine will return an angle of 360 for a north wind, and zero only if there is resultant wind. Line 40 is required because the DATAN function will return NaN (Not A Number) if x and y both sum to zero, which is the only time the speed would be zero.

The Python version is quite similar. The only major difference is that there needs to be a check that totry isn’t zero, as that would result in a ZeroDivisionError Python exception. When a division by zero is encountered in Python, it will interrupt code flow and throw an exception. FORTRAN on the other hand returns the discrete value NaN which is then passed to the DATAN function, which knows how to handle NaN. There actually is a NaN in Python, but Python chooses to throw an exception rather than return NaN on a division by zero.

35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
        if totry == 0.0:
            r_dir = 0
        else:
            r_dir = math.atan(totrx/totry) # avg r_wind dir
        r_dir = r_dir / (3.14156/180) # convert radians back to degrees
 
        if (totrx*totry) < 0:
            if totrx < 0:
                r_dir = 180 + r_dir
            else:
                r_dir = 360 + r_dir
        else:
            if totrx > 0:
                r_dir = 180 + r_dir
 
        r_spd = ((totrx**2 + totry**2) / n_rwd**2)**.5
 
        if r_dir == 0 and r_spd != 0:
            r_dir = 360
 
        if r_spd == 0:
            r_dir = 0

The final section of the subroutine just rounds and formats. It rounds the degrees to tens of degrees, and returns in tens of degrees. Thus, 260 degrees would be returned as 26. Speed is returned as tenths of a knot. The values are returned as strings, as they will be ending up in a CSV file, not used in further processing.

41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
         WRITE (WC1, FMT='(F4.0)') R_DIR
 
         READ (WC1, 10) WI1, WI2
 10      FORMAT (I2, I1)
 
         IF (WI2 .GT. 4) THEN
            WI1 = WI1 + 1
         END IF
 
         WRITE (RWDIR, FMT='(I2)') WI1
         WRITE (RWSPD, FMT='(F5.1)') R_SPD
      END IF
 
      RETURN
      END

The Python version is noticeably different as formatting and substring extraction is quite different between FORTRAN and Python. Python uses C-style format strings. The FORTRAN code is also using CHARACTER arrays rather than STRING arrays (perhaps when this was written the FORTRAN dialect used did not support STRING), which doesn’t support substring notation. Python strings are character arrays and provide all the same functionality as other arrays, including sub-array indexing, which is used here. Finally, in Python don’t declare the return variables in the function declaration, you return them explicitly, and anonymously, which gives the programmer a lot of power, but also adds to the responsibility of the programmer to document what is returned in docstrings.

58
59
60
61
62
63
64
65
66
67
68
        wc1 = '%#4.0f' % r_dir
        wi1 = int(wc1[0:2])
        wi2 = int(wc1[2:3])
 
        if wi2 > 4:
            wi1 = wi1 + 1
 
        rwdir = '%2i' % wi1
        rwspd = '%#5.1f' % r_spd
 
    return rwdir, rwspd

That’s it. Of course, if we were to rewrite this function today in Python, we wouldn’t need to do all of this calculation in our code, but we can instead rely on higher level functions in the standard library to help us out. This is probably also true of FORTRAN today, but I’ve not actively written FORTRAN in many years, so don’t know.

Python has a complex math library, called cmath that we can import in order to give us the functionality we need. We can convert polar coordinates into a Cartesian complex number, add up all the complex numbers and average them, then convert the complex number back to polar coordinates. This will accomplish the task of the original function, which is to return the resultant wind vector angle and speed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def r_wind_new(n_rwd, rrdd, rwnd):
    wind_vector_sum = None
 
    if n_rwd > 15:
        for i in range(0, n_rwd):
            wind_polar = cmath.rect(rwnd[i], rrdd[i] - math.pi)
            if wind_vector_sum is None:
                wind_vector_sum = wind_polar
            else:
                wind_vector_sum += wind_polar
 
        r, phi = cmath.polar(wind_vector_sum / n_rwd)
 
        rwdir = int(round(int(round(math.degrees(phi) % 360)) / 10.0))
        rwspd = int(round(r*10))/10.0
    else:
        return None, None
 
    return '%2i' % rwdir, '%#5.1f' % rwspd

This new Python function, r_wind_new is a drop-in replacement for the original r_wind function. It uses cmath.rect() to convert a vector angle and length to a Cartesian complex number. These complex numbers are added together and then averaged. Then, cmath.polar() is used to convert the complex number into an angle and length, which is the resultant wind vector.


Hello world!

I am improving the Intellovations website so that I can better communicate with my customers and interested folks.