Hey Nancy!
Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ
These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у
These 4 all contain a trailing 0x88 byte:
È ψ Ј ш
These 4 all contain a trailing 0x8d byte:
These 4 all contain a trailing 0x8f byte:
Ï ď Џ я
These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ
These 4 all contain a trailing 0x98 byte:
Ø Ř И ј
Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ
I believe they cover all the holes in CP1250, CP1251 and CP1252. If my hunch
was correct then the characters with trailing 0x81, 0x8d, 0x8f, 0x90 and 0x9d should end up with their trailing bytes stripped and therefore no longer be valid utf8 characters. If only the ones with 0x8d get their trailing bytes stripped then mark's hunch about the end of line is most likely correct. If none of them get stripped of their trailing bytes then ?!?!?!?!?!
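
The list of holes can be double-checked with a minimal Python sketch. It assumes Python's built-in cp1250, cp1251 and cp1252 codecs mirror the Windows tables; the helper name `holes` is made up for illustration.

```python
def holes(codepage):
    """Return the set of high bytes (0x80-0xff) the codepage leaves undefined."""
    undefined = set()
    for value in range(0x80, 0x100):
        try:
            bytes([value]).decode(codepage)
        except UnicodeDecodeError:
            # Python raises here when the byte maps to <undefined>.
            undefined.add(value)
    return undefined


all_holes = set()
for codepage in ("cp1250", "cp1251", "cp1252"):
    found = holes(codepage)
    all_holes |= found
    print(codepage, " ".join(f"0x{b:02x}" for b in sorted(found)))

print("union ", " ".join(f"0x{b:02x}" for b in sorted(all_holes)))
```

If those codec tables are accurate, the union should be the same eight trailing bytes used in the test lines above.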
Life is good,
Maurice
... Cybertoasts of note:
2020-01-01 is 248 days from now and falls on a Wednesday.
2024-11-05 is 2018 days from now and falls on a Tuesday.
These 4 all contain a trailing 0x8d byte:
Quoting Maurice Kinal to Nancy Backus on 28-Apr-2019 06:25 <=-
Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ
These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у
These 4 all contain a trailing 0x88 byte:
È ψ Ј ш
These 4 all contain a trailing 0x8d byte:
These 4 all contain a trailing 0x8f byte:
Ï ď Џ я
These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ
These 4 all contain a trailing 0x98 byte:
Ø Ř И ј
Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ
I believe they cover all the holes in CP1250, CP1251 and CP1252. If
my hunch was correct then the characters with trailing 0x81, 0x8d,
0x8f, 0x90 and 0x9d should end up with their trailing bytes stripped
and therefore no longer be valid utf8 characters. If only the ones
with 0x8d get their trailing bytes stripped then it is likely mark's
hunch about the end of line is most likely correct. If none of them
get stripped of their trailing bytes then ?!?!?!?!?!
[the above set came on four separate lines for at least one bbs]
That particular bbs would be substituting 0x0d for 0x8d while the rest are stripping the 0x8d.
FWIW1: there are two places where 0x8d may be acted on...
1. the tosser may strip by ignoring completely and skipping
2. the BBS may strip or convert to 0x0d while displaying the message or when packaging it for offline mail
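
The two behaviours can be sketched in Python as follows. This is only an illustration, assuming the software treats every 0x8d as a FidoNet soft carriage return without looking at the byte before it. The function names and the four sample characters (all ending in 0x8d in utf8) are my own choices, not taken from any tosser or BBS source.

```python
SOFT_CR = b"\x8d"  # the byte being acted on
HARD_CR = b"\r"    # 0x0d


def strip_soft_cr(raw):
    """Behaviour 1: drop every 0x8d and write nothing in its place."""
    return raw.replace(SOFT_CR, b"")


def convert_soft_cr(raw):
    """Behaviour 2: turn every 0x8d into 0x0d."""
    return raw.replace(SOFT_CR, HARD_CR)


def hexdump(raw):
    return " ".join(f"{b:02x}" for b in raw)


# Hypothetical test line: U+00CD, U+010D, U+040D, U+044D separated by spaces.
sample = "\u00cd \u010d \u040d \u044d".encode("utf-8")
stripped = strip_soft_cr(sample)
converted = convert_soft_cr(sample)

print("sent     ", hexdump(sample))
print("stripped ", hexdump(stripped))
print("converted", hexdump(converted))
# With 0x0d substituted, a viewer breaks the one line into several.
print(converted.decode("utf-8", errors="replace").splitlines())
```

Under these assumptions the stripped version keeps only the four leading bytes, while the converted version splits the single line into four.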
FWIW1: there are two places where 0x8d may be acted on...
1. the tosser may strip by ignoring completely and skipping
I am not sure what you mean by that
but if the end result is what Nancy shows in her quotes of the
trailing 0x8d in the 4 utf8 characters then that would be the "may
strip" result which leaves only the leading byte ... or what I prefer
it be called the masterbyte. :::evil grin:::
2. the BBS may strip or convert to 0x0d while displaying the
message or when packaging it for offline mail
Which is what happened on at least one BBS according to Nancy, which matches with what Ozz's quote of the same message showed. Either way that does not bode well since 0x8d is a well used trailing byte in utf8.
Also worth mentioning is that two of the supported codepages, IBM848
and IBM866, use 0x8d as the exact same character - "CYRILLIC CAPITAL LETTER EN" - while IBM850 uses 0x8d as the same character as IBM437 - "LATIN SMALL LETTER I WITH GRAVE".
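
A quick way to check those names is a short Python sketch like the one below. It assumes Python's cp866, cp437 and cp850 codecs match the IBM tables. IBM848 is left out because, as far as I know, the Python standard library ships no codec for it. The helper name `name_of_0x8d` is hypothetical.

```python
import unicodedata


def name_of_0x8d(codepage):
    """Decode the single byte 0x8d in the given codepage and return its Unicode name."""
    return unicodedata.name(b"\x8d".decode(codepage))


for codepage in ("cp866", "cp437", "cp850"):
    print(codepage, name_of_0x8d(codepage))
```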
An interesting aside: U+040D, known as "CYRILLIC CAPITAL LETTER I WITH GRAVE", will also work in the phrase, "It is all fun and games until someone
loses an .", since the trailing byte happens to be 0x8d. Although in this case the masterbyte :::snicker::: 0xd0 will survive, it is no longer a valid utf8 character without its needed trailing byte.
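
A minimal sketch of that point in Python: it encodes U+040D, drops the 0x8d, and checks what is left. The helper name `is_valid_utf8` is made up for this example.

```python
def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as utf8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = "\u040d".encode("utf-8")       # leading byte 0xd0, trailing byte 0x8d
orphan = encoded.replace(b"\x8d", b"")   # what survives once 0x8d is stripped

print(encoded.hex(), is_valid_utf8(encoded))
print(orphan.hex(), is_valid_utf8(orphan))
```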
1c. ignore it and do not write it to the output (aka skip)
a lot of code does #1 when it should do #2...
that in nancy's case, she sees the characters after
keep on with the poking... it may result in some real good for
the network one day
1c. ignore it and do not write it to the output (aka skip)
That sounds like what I called stripping in the previous post. Basically it has the same effect if it isn't in the output.
a lot of code does #1 when it should do #2...
I am not convinced it "should" do either but perhaps in certain cases there
could be characters that might do "harm" such as some ansi bbses do to users' terminals. I used to have to run reset after logging out of bbses to get things back to normal after telnetting to them.
This is what led me to where I am today as far as offline messaging is[...]
Bottom line is there is way too much that goes awry when dealing with differing codepages, and stripping out codes will definitely cause harm
to messages.
that in nancy's case, she sees the characters after
Yes but Nancy's editor is perfect for testing since we both know for a fact it does no harm, even to utf8 characters which she cannot
'properly' render but she can see the 8 bit hex codes as they map out
to IBM437. If anything is amiss it is obvious when she quotes
whatever is of concern back as is in the case of the "loses an i" bug.
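
To illustrate what an IBM437 viewer would show, here is a small Python sketch. It assumes the editor simply displays each utf8 byte as its CP437 glyph; the function name and the choice of U+00CD as the sample are mine.

```python
def as_seen_in_cp437(raw):
    """Show raw message bytes the way a CP437-only display would."""
    return raw.decode("cp437")


utf8_bytes = "\u00cd".encode("utf-8")            # bytes c3 8d
intact = as_seen_in_cp437(utf8_bytes)
lossy = as_seen_in_cp437(utf8_bytes.replace(b"\x8d", b""))

print(repr(intact))  # two glyphs, the second one is the CP437 0x8d character
print(repr(lossy))   # only the glyph for the leading byte is left
```

In CP437 the 0x8d glyph is the small i with grave, so when it goes missing from a quote the loss is visible at a glance.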
keep on with the poking... it may result in some real good for the
network one day
You too. Your call on this particular issue was bang on. I now bow
to the master.
that may be dependent on which BBS she is using for her replies
at that time
that may be dependent on which BBS she is using for her replies at
that time
Yes but she and I both determined ages ago that anything funky in our messages was due to the BBS and not her editor.
Her editor (uEmacs) is perfect so we both know for a fact that if
there is something oddball happening with messaging that it is likely
the bbs at fault or something in between. She knows what I am talking about ... even when she doesn't. :-)
Anyhow she helped confirm the "loses an i" bug, both the lossy and the switcheroo versions.
i know you know these things... it is my own infliction making
me cover the details...
like back in the w95 and KA9Q days ;)
Quoting mark lewis to Maurice Kinal on 01-May-2019 18:21 <=-
Yes but Nancy's editor is perfect for testing since we both know for a
fact it does no harm, even to utf8 characters which she cannot
'properly' render but she can see the 8 bit hex codes as they map out
to IBM437. If anything is amiss it is obvious when she quotes
whatever is of concern back as is in the case of the "loses an i" bug.
true but that may be dependent on which BBS she is using for her
replies at that time... i don't know if she is using my offline capabilities here or not...
i do have another user that cycles uploading messages between three
or four BBSes...
plus there's whatever path the messages may take from where they
upload their messages... this path used to be reliable and easily
used to track down problematic systems but with "fidoweb" and going "against the grain" of fidonet and not sending duplicates, there
are numerous paths that may be taken...
Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ
These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у
These 4 all contain a trailing 0x88 byte:
È ψ Ј ш
These 4 all contain a trailing 0x8d byte:
Ï ď Џ я
These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ
These 4 all contain a trailing 0x98 byte:
Ø Ř И ј
Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ
Quoting Maurice Kinal to Nancy Backus on 28-Apr-2019 06:25 <=-
Here you go.... :)
Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ
These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у
These 4 all contain a trailing 0x88 byte:
È ψ Ј ш
These 4 all contain a trailing 0x8d byte:
These 4 all contain a trailing 0x8f byte:
Ï ď Џ я
These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ
These 4 all contain a trailing 0x98 byte:
Ø Ř И ј
Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ
Quoting Ozz Nixon to Nancy Backus on 24-May-2019 19:37 <=-
Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ
These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у
These 4 all contain a trailing 0x88 byte:
È ψ Ј ш
These 4 all contain a trailing 0x8d byte:
Ï ď Џ я
These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ
These 4 all contain a trailing 0x98 byte:
Ø Ř И ј
Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ
Sorry for the delay on this thread - still in the process of moving to Florida. Anyway, what I see in Unison NNTP Client are the UTF8 A I D N
or A I D O characters as it should have been. However, in PCBoard 16,
I see the CP437 8bit character plus the character you're trailing each
line with. My PCBoard terminal (fTelnet) would render the UTF8,
however, the header did not contain ^aCHRS: UTF8 so it assumes it should stay
in the current state (CP437 as detected during the ANSI detection routine).
So, some environments render the UTF8 w/o the required CHRS signature.
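
The behaviour Ozz describes, falling back to CP437 when no CHRS kludge is present, could look roughly like the Python sketch below. This is not fTelnet or PCBoard code. The function names, the identifier table and the sample kludge lines are all assumptions made for illustration.

```python
KLUDGE_MARK = "\x01"  # the ^a that starts a kludge line

# Hypothetical mapping from CHRS identifiers to Python codec names.
CHARSETS = {
    "UTF-8": "utf-8",
    "UTF8": "utf-8",
    "CP437": "cp437",
    "CP866": "cp866",
}


def pick_codec(kludge_lines, default="cp437"):
    """Return the codec named by a CHRS kludge, else stay with the default."""
    for line in kludge_lines:
        if line.startswith(KLUDGE_MARK + "CHRS:"):
            fields = line.split(":", 1)[1].split()
            if fields:
                return CHARSETS.get(fields[0].upper(), default)
    return default


def render(body, kludge_lines):
    """Decode the message body with whatever codec the kludges select."""
    return body.decode(pick_codec(kludge_lines), errors="replace")


body = "\u00cd".encode("utf-8")
print(repr(render(body, ["\x01CHRS: UTF-8 4"])))  # decoded as utf8
print(repr(render(body, [])))                      # no CHRS, shown as CP437
```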
Ozz
-!- ExchangeBBS FTN Tosser/JAM v1.19.04 (Beta 4.09)
! Origin: (1:1/123)
It was Maurice that was playing with this