Modern Python Cookbook

Building complex strings with f-strings

Creating complex strings is, in many ways, the polar opposite of parsing a complex string. We generally find that we use a template with substitution rules to put data into a more complex format.

Getting ready

Let's say we have pieces of data that we need to turn into a nicely formatted message. We might have data that includes the following:

>>> id = "IAD"
>>> location = "Dulles Intl Airport"
>>> max_temp = 32
>>> min_temp = 13
>>> precipitation = 0.4

And we'd like a line that looks like this:

IAD : Dulles Intl Airport : 32 / 13 / 0.40

How to do it...

  1. Create an f-string from the desired result, replacing each data item with a {} placeholder. Inside each placeholder, put a variable name (or an expression). Note that the string uses the f' prefix. The f prefix makes this a formatted string literal: the values are interpolated into the template when the literal is evaluated, and the result is an ordinary string:
    f'{id} : {location} : {max_temp} / {min_temp} / {precipitation}'
    
  2. For each name or expression, an optional :data type can be appended to the names in the template string. The basic data type codes are:
    • s for string
    • d for decimal integer
    • f for floating-point number

      It would look like this:

      f'{id:s}  : {location:s} : {max_temp:d} / {min_temp:d} / {precipitation:f}'
      
  3. Add length information where required. Length is not always required, and in some cases, it's not even desirable. In this example, though, the length information ensures that each message has a consistent format. For strings and decimal numbers, prefix the format with the length like this: 19s or 3d. For floating-point numbers, use a two-part prefix like 5.2f to specify the total length of five characters, with two to the right of the decimal point. Here's the whole format:
    >>> f'{id:3s}  : {location:19s} :  {max_temp:3d} / {min_temp:3d} / {precipitation:5.2f}'
    'IAD  : Dulles Intl Airport :   32 /  13 /  0.40'
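The value of the width information shows up when the same template is applied to more than one row. The following is a minimal sketch under stated assumptions: the helper name format_row, the parameter name station_id, and the second station are made up for illustration and are not part of the original example.

```python
def format_row(station_id, location, max_temp, min_temp, precipitation):
    """Format one observation with the fixed-width template from step 3."""
    return (
        f'{station_id:3s}  : {location:19s} :  {max_temp:3d} / '
        f'{min_temp:3d} / {precipitation:5.2f}'
    )


rows = [
    ("IAD", "Dulles Intl Airport", 32, 13, 0.4),
    ("XYZ", "Example Field", 7, -2, 12.5),  # hypothetical sample data
]

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Because every field has an explicit width, both lines come out the same length and the separators line up in columns.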
    

How it works...

f-strings can do a lot of relatively sophisticated string assembly by interpolating data into a template. There are a number of conversions available.

We've seen three of the formatting conversions (s, d, and f), but there are many others. Details can be found in the Formatted string literals section of the Python Language Reference: https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals.

Here are some of the format conversions we might use:

  • b is for binary, base 2.
  • c is for Unicode character. The value must be a number, which is converted into a character. Often, we use hexadecimal numbers for these characters, so you might want to try values such as 0x2661 through 0x2666 to see interesting Unicode glyphs.
  • d is for decimal integers.
  • E and e are for scientific notation. A value is shown as 6.626E-34 or 6.626e-34, depending on which E or e character is used.
  • F and f are for floating-point. For not a number, the f format shows lowercase nan; the F format shows uppercase NAN.
  • G and g are for general use. This switches automatically between E and F (or e and f) based on the size of the value and the precision, not the field width. For a format of 20.5G, values with a decimal exponent from -4 up to, but not including, 5 will be displayed using F formatting in a 20-character field. Larger or smaller values will use E formatting.
  • n is for locale-specific decimal numbers. This will insert , or . characters, depending on the current locale settings. The default locale may not have thousands separators defined. For more information, see the locale module.
  • o is for octal, base 8.
  • s is for string.
  • X and x are for hexadecimal, base 16. The digits include uppercase A-F and lowercase a-f, depending on which X or x format character is used.
  • % is for percentage. The number is multiplied by 100 and displayed in f format, followed by %.
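The conversions in this list can be tried side by side. This is only an illustrative sketch: the variable names and sample values are arbitrary choices, apart from 6.626e-34 and 0x2661, which come from the bullets above.

```python
count = 42
planck = 6.626e-34  # the value used in the E and e bullet
not_a_number = float("nan")

samples = {
    "b": f'{count:b}',             # binary
    "o": f'{count:o}',             # octal
    "x": f'{count:x}',             # lowercase hexadecimal
    "X": f'{count:X}',             # uppercase hexadecimal
    "c": f'{0x2661:c}',            # integer code point to character
    "e": f'{planck:.3e}',          # scientific notation, lowercase e
    "E": f'{planck:.3E}',          # scientific notation, uppercase E
    "f": f'{not_a_number:f}',      # lowercase nan
    "F": f'{not_a_number:F}',      # uppercase NAN
    "g small": f'{12345.678:.5g}',  # exponent 4 is below precision 5: fixed
    "g large": f'{123456.78:.5g}',  # exponent 5 reaches precision 5: scientific
    "%": f'{0.25:.1%}',            # multiplied by 100, with a trailing %
}

for code, text in samples.items():
    print(f'{code:8s} -> {text}')
```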

We have a number of prefixes we can use for these different types. The most common one is the length. We might use {name:5d} to format an integer in a field that is at least five characters wide. There are several prefixes for the preceding types:

  • Fill and alignment: We can specify a specific filler character (space is the default) and an alignment. Numbers are generally aligned to the right and strings to the left. We can change that using <, >, or ^. This forces left alignment, right alignment, or centering, respectively. There's a peculiar = alignment that's used to put padding after a leading sign.
  • Sign: The default rule is a leading negative sign where needed. We can use + to put a sign on all numbers, - to put a sign only on negative numbers, and a space to use a space instead of a plus for positive numbers. In scientific output, we often use {value: 5.3f}. The space makes sure that room is left for the sign, ensuring that all the decimal points line up nicely.
  • Alternate form: We can use the # to get an alternate form. We might have something like {value:#x}, {value:#o}, or {value:#b} to get a prefix on hexadecimal, octal, or binary values. With a prefix, the numbers will look like 0xnnn, 0onnn, or 0bnnn. The default is to omit the two-character prefix.
  • Leading zero: We can include 0 to get leading zeros to fill in the front of a number. Something like {code:08x} will produce a hexadecimal value with leading zeros to pad it out to eight characters.
  • Width and precision: For integer values and strings, we only provide the width. For floating-point values, we often provide width.precision.
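Each of these prefixes can be seen in a short experiment. This is a sketch with arbitrary sample values; the names amount, code, and ratio are illustrative only.

```python
amount = 42
code = 255
ratio = 3.14159

prefixes = {
    "right": f'{amount:>8d}',       # right alignment (the default for numbers)
    "left": f'{amount:<8d}',        # forced left alignment
    "center": f'{amount:^8d}',      # centering
    "fill": f'{amount:*^8d}',       # centering with * as the fill character
    "sign pad": f'{-amount:=8d}',   # padding placed after the leading sign
    "plus": f'{amount:+d}',         # sign on all numbers
    "space": f'{amount: d}',        # space instead of a plus sign
    "hex": f'{code:#x}',            # alternate form: 0x prefix
    "octal": f'{code:#o}',          # alternate form: 0o prefix
    "binary": f'{code:#b}',         # alternate form: 0b prefix
    "zeros": f'{code:08x}',         # leading zeros, eight characters wide
    "float": f'{ratio:8.3f}',       # width 8, precision 3
}

for name, text in prefixes.items():
    print(f'{name:9s} [{text}]')
```

The square brackets in the printed output make the padding visible.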

There are some times when we won't use a {name:format} specification. Sometimes, we'll need to use a {name!conversion} specification. There are only three conversions available:

  • {name!r} shows the representation that would be produced by repr(name).
  • {name!s} shows the string value that would be produced by str(name); this is the default behavior if you don't specify any conversion. Using !s explicitly lets you add string-type format specifiers.
  • {name!a} shows the ASCII-only representation that would be produced by ascii(name), with any non-ASCII characters escaped.

Additionally, there's a handy debugging specifier available in Python 3.8 and later. We can include a trailing equals sign, =, after a variable or an expression to get a dump of both its text and its value. The following example uses both forms:
    >>> value = 2**12-1
    >>> f'{value=} {2**7+1=}'
    'value=4095 2**7+1=129'
    

The f-string showed the value of the variable named value and the result of an expression, 2**7+1.
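The !r, !s, and !a conversions described earlier can be compared in the same way. This is a small sketch; the name and the date are arbitrary values chosen so that the three conversions give visibly different results.

```python
import datetime

name = "Zoë"                        # contains a non-ASCII character
when = datetime.date(2024, 1, 31)   # arbitrary sample date

conversions = {
    "name default": f'{name}',      # same as str(name)
    "name !r": f'{name!r}',         # repr(name): quotes are included
    "name !a": f'{name!a}',         # ascii(name): non-ASCII is escaped
    "date default": f'{when}',      # same as str(when)
    "date !r": f'{when!r}',         # repr(when)
    "date !s": f'{when!s:>12}',     # str(when), then a string format spec
}

for label, text in conversions.items():
    print(f'{label:13s} {text}')
```

In the last entry, !s converts the date to a string first, so the >12 alignment is applied as a string format rather than being handed to the date object.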

In Chapter 7, Basics of Classes and Objects, we'll leverage the idea of the {name!r} format specification to simplify displaying information about related objects.

There's more...

f-strings use the same format specifications as the string format() method. We can leverage this method and the related format_map() method for cases where we have more complex data structures.

Looking forward to Chapter 4, Built-In Data Structures Part 1: Lists and Sets, we might have a dictionary where the keys are simple strings that fit with the format_map() rules:

>>> data = dict(
... id=id, location=location, max_temp=max_temp,
... min_temp=min_temp, precipitation=precipitation
... )
>>> '{id:3s}  : {location:19s} :  {max_temp:3d} / {min_temp:3d} / {precipitation:5.2f}'.format_map(data)
'IAD  : Dulles Intl Airport :   32 /  13 /  0.40'

We've created a dictionary object, data, that contains a number of values with keys that are valid Python identifiers: id, location, max_temp, min_temp, and precipitation. We can then use this dictionary with format_map() to extract values from the dictionary using the keys.

Note that the formatting template here is not an f-string. It doesn't have the f prefix. Instead of using the automatic interpolation of an f-string, we've done the interpolation "the hard way" using the format_map() method.
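One reason to prefer an ordinary template string is that it can be defined once and reused with many dictionaries. The following is a minimal sketch of that idea: the names TEMPLATE, observations, and report are illustrative, and the second dictionary is made-up sample data.

```python
TEMPLATE = (
    '{id:3s}  : {location:19s} :  {max_temp:3d} / '
    '{min_temp:3d} / {precipitation:5.2f}'
)

observations = [
    {"id": "IAD", "location": "Dulles Intl Airport",
     "max_temp": 32, "min_temp": 13, "precipitation": 0.4},
    {"id": "XYZ", "location": "Example Field",  # hypothetical sample data
     "max_temp": 7, "min_temp": -2, "precipitation": 12.5},
]

# format_map() looks up each placeholder name as a key in the mapping.
report = [TEMPLATE.format_map(obs) for obs in observations]

# format() with keyword unpacking produces the same text.
same = [TEMPLATE.format(**obs) for obs in observations]

for line in report:
    print(line)
```

Unlike an f-string, the template is not tied to the variables in scope where it is written, so it can be kept in a constant or loaded from configuration.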

See also