spyder-ide:master
← JulienPalard:mdk-typo
opened 09:08PM - 05 Oct 22 UTC
This is literally the smallest PR I've ever done.
It removes a zero width no-break space.
But this character was breaking the inline literal next to it: on [this page](https://docs.spyder-ide.org/current/videos/working-with-spyder.html#dropdown-1), the ```` ``is_dark_font_color`` ```` should have been interpreted by Sphinx and rendered in red:
![Capture d’écran du 2022-10-05 22-27-06](https://user-images.githubusercontent.com/239510/194156834-06a80fee-0ec8-4c84-9963-158500d29a58.png)
The removed character is obviously **not** rendered in GitHub's "files changed" interface. Nor in `git diff`, nor in `git show --color-words`. Not in your editor, and not in your terminal... The character is a space. And a space with no width!!!
If you really want to see it, `git show | cat -A` can be helpful; you'll see something like:
```diff
-in the ``mainwindow.py`` file we import the M-oM-;M-?``is_dark_font_color``
+in the ``mainwindow.py`` file we import the ``is_dark_font_color``
```
But the paragraph is **way** longer than that so it's a bit hard to spot.
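If squinting at `cat -A` output is not your thing, a few lines of Python can point at the exact spot. This is only a sketch: `find_stray_boms` is a made-up helper name, and the sample string is the line from the diff above with the offending character put back in.

```python
def find_stray_boms(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for every U+FEFF found in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\ufeff":  # ZERO WIDTH NO-BREAK SPACE, a.k.a. BOM
                hits.append((line_no, col))
    return hits


# Hypothetical sample: the line from the diff, with the invisible character restored.
sample = "in the ``mainwindow.py`` file we import the \ufeff``is_dark_font_color``\n"
print(find_stray_boms(sample))
```

On a real file you would read it with `open(path, encoding="utf-8")` rather than `utf-8-sig`, since the latter silently strips a BOM at the very start of the file.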
For the curious, the `M-...` notation denotes bytes in the range `[128; 255]`. The first 32 of this range are treated as if they were in the range `[0; 31]` and displayed using the `^` notation, so `\x80` is `M-^@`; the others just have 128 subtracted, so `\xa0` is `M- ` (yes, a space).
So `M-o` is `\x6f + 128` (`\x6f` is the value for `o` in the ASCII table) = `\xef`. Likewise `M-;` is `\xbb` and `M-?` is `\xbf`, which gives us the sequence `\xef\xbb\xbf`.
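The notation described above can be sketched in Python, going from bytes to what `cat -A` shows. It is a simplification under one stated assumption: every byte is rendered on its own, so the special handling of newlines (shown as `$` at end of line) is ignored. The function name `cat_v` is mine, not something from coreutils.

```python
def cat_v(byte: int) -> str:
    """Render a single byte roughly the way `cat -A` displays it."""
    if byte >= 128:
        # High bit set: emit "M-" and render the low 7 bits.
        return "M-" + cat_v(byte - 128)
    if byte < 32:
        # Control characters use caret notation: 0 -> ^@, 1 -> ^A, ...
        return "^" + chr(byte + 64)
    if byte == 127:
        return "^?"
    return chr(byte)  # Plain printable ASCII, shown as-is.


bom_bytes = "\ufeff".encode("utf-8")  # b"\xef\xbb\xbf"
print("".join(cat_v(b) for b in bom_bytes))
```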
Still curious? The file is encoded using UTF-8, so to decode this UTF-8 sequence we need to extract relevant bits from it. In binary it looks like:
11101111 10111011 10111111
The leading `1110` means "there are 3 bytes for this char" (count the ones: three ones → three bytes; the zero is just a delimiter). The two trailing bytes each start with `10`, meaning "we're trailing bytes".
If we drop those markers (`1110` and `10` in front of the bytes) and keep the remaining bits, we're left with `1111111011111111`, which evaluates to 65279, or `0xfeff` in hexadecimal. Yes, you recognize it, it's a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark). Because yes, a BOM is just a `ZERO WIDTH NO-BREAK SPACE`. Isn't it beautiful?
Do we really have to do the bit manipulation to discover what this character was? Obviously not, just use Emacs' `M-x describe-char` on it:
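For those who would rather let a computer do the bit twiddling, here is a minimal sketch of the same steps. It only handles the three-byte case discussed here and skips the extra checks a real decoder needs (overlong forms, surrogates); `decode_3byte_utf8` is a name made up for illustration.

```python
def decode_3byte_utf8(data: bytes) -> int:
    """Decode one three-byte UTF-8 sequence into its code point."""
    if len(data) != 3:
        raise ValueError("expected exactly three bytes")
    lead, cont1, cont2 = data
    # The lead byte must start with 1110, the trailing bytes with 10.
    if lead >> 4 != 0b1110 or cont1 >> 6 != 0b10 or cont2 >> 6 != 0b10:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Drop the markers and glue the remaining 4 + 6 + 6 bits together.
    return ((lead & 0x0F) << 12) | ((cont1 & 0x3F) << 6) | (cont2 & 0x3F)


code_point = decode_3byte_utf8(b"\xef\xbb\xbf")
print(f"{code_point:016b}", code_point, hex(code_point))
```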
```text
position: 4646 of 14699 (32%), column: 380
character: (displayed as ) (codepoint 65279, #o177377, #xfeff)
charset: unicode (Unicode (ISO10646))
code point in charset: 0xFEFF
script: arabic
syntax: w which means: word
to input: type "C-x 8 RET feff" or "C-x 8 RET ZERO WIDTH NO-BREAK SPACE"
buffer code: #xEF #xBB #xBF
file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)
display: by this font (glyph code):
ftcrhb:-GOOG-Noto Naskh Arabic UI-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x5D5)
Character code properties: customize what to show
name: ZERO WIDTH NO-BREAK SPACE
old-name: BYTE ORDER MARK
general-category: Cf (Other, Format)
decomposition: (65279) ('')
```
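No Emacs at hand? Python's standard `unicodedata` module gives the same name and general category, as a quick cross-check rather than a replacement for the output above:

```python
import unicodedata

ch = b"\xef\xbb\xbf".decode("utf-8")  # The three bytes found in the file.

name = unicodedata.name(ch)          # Official Unicode character name.
category = unicodedata.category(ch)  # General category, e.g. "Cf" for format characters.
print(hex(ord(ch)), name, category)
```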
And this is literally the longest PR description I've written.