Bump charset-normalizer from 2.1.1 to 3.0.1
Bumps charset-normalizer from 2.1.1 to 3.0.1.
Release notes
Sourced from charset-normalizer's releases.
Version 3.0.1
3.0.1 (2022-11-18)
Fixed
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
Changed
- Speedup provided using mypy/c 0.990 on Python >= 3.7
Version 3.0.0
3.0.0 (2022-10-20)
Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
- normalizer --version now specify if the current version provides extra speedup (meaning mypyc compilation whl)
Changed
- Build with static metadata (not pyproject.toml yet)
- Make language detection stricter
- Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
Fixed
- CLI with opt --normalize fail when using full path for files
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha characters have been fed to it
- Sphinx warnings when generating the documentation
Removed
- Coherence detector no longer returns 'Simple English' instead returns 'English'
- Coherence detector no longer returns 'Classical Chinese' instead returns 'Chinese'
- Breaking: Method first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function normalize
- Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- Support for the backport unicodedata2
This is the last version (3.0.x) to support Python 3.6. We plan to drop it for 3.1.x.
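The UTF-7 entry in the Removed list can be illustrated with the standard library alone. The sketch below does not use charset-normalizer; it only shows why UTF-7 without a signature is hard to tell apart from ASCII. The sample strings are arbitrary assumptions chosen for illustration.

```python
# Plain ASCII bytes are also valid UTF-7, so both codecs decode them identically.
ascii_bytes = b"Hello, world"
as_ascii = ascii_bytes.decode("ascii")
as_utf7 = ascii_bytes.decode("utf-7")
same_result = as_ascii == as_utf7

# Conversely, UTF-7 output consists only of ASCII bytes, even for non-ASCII text.
encoded = "é".encode("utf-7")
looks_like_ascii = encoded.isascii()
round_trip = encoded.decode("utf-7")

print(same_result, looks_like_ascii, round_trip)
```

Because the byte ranges overlap completely, a detector has no byte-level evidence for UTF-7 unless a signature is present, which is consistent with the behaviour change described above.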
Version 3.0.0rc1
This is the last pre-release. If everything goes well, I will publish the stable tag.
3.0.0rc1 (2022-10-18)
Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
... (truncated)
Changelog
Sourced from charset-normalizer's changelog.
3.0.1 (2022-11-18)
Fixed
- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7
3.0.0 (2022-10-20)
Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
- normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl)
Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
Fixed
- CLI with opt --normalize fail when using full path for files
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
- Sphinx warnings when generating the documentation
Removed
- Coherence detector no longer return 'Simple English' instead return 'English'
- Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
- Breaking: Method first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function normalize
- Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- Support for the backport unicodedata2
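Since this bump crosses a major version, it may help to check the codebase for names listed as removed above. The following is a minimal, standard-library sketch and not part of charset-normalizer or Dependabot; the helper name find_removed_names and the sample source are hypothetical. The top-level function normalize is deliberately left out of the pattern because the word is too generic to match reliably.

```python
import re

# Names listed as removed in the 3.0.0 notes above ("normalize" omitted on purpose).
REMOVED_NAMES = (
    "CharsetDetector",
    "CharsetDoctor",
    "CharsetNormalizerMatch",
    "CharsetNormalizerMatches",
    "chaos_secondary_pass",
    "coherence_non_latin",
    "w_counter",
)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REMOVED_NAMES)) + r")\b")


def find_removed_names(source: str):
    """Return (line number, name) pairs for each removed name found in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical pre-3.0 calling code used only to exercise the helper.
sample = (
    "from charset_normalizer import CharsetNormalizerMatches\n"
    "results = CharsetNormalizerMatches.from_bytes(data)\n"
    "print(results.best().w_counter)\n"
)
hits = find_removed_names(sample)
print(hits)
```

A textual scan like this can produce false positives (for example, names inside comments), so any hit is only a pointer for manual review.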
3.0.0rc1 (2022-10-18)
Added
- Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
Fixed
- CLI with opt --normalize fail when using full path for files
- TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
... (truncated)
Upgrade guide
Sourced from charset-normalizer's upgrade guide.
Guide to upgrade your code from v1 to v2
- If you are using the legacy detect function, that is it. You have nothing to do.
Detection
Before
from charset_normalizer import CharsetNormalizerMatches

results = CharsetNormalizerMatches.from_bytes(
    '我没有埋怨,磋砣的只是一些时间。'.encode('utf_32')
)
After
from charset_normalizer import from_bytes

results = from_bytes(
    '我没有埋怨,磋砣的只是一些时间。'.encode('utf_32')
)
Methods that once were staticmethods of the class CharsetNormalizerMatches are now basic functions. from_fp, from_bytes, from_fp and `` are concerned.
Staticmethods scheduled to be removed in version 3.0
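The move from staticmethods to module-level functions described above could look roughly like this in calling code. This is a sketch under assumptions: it presumes charset-normalizer is installed (the import is guarded so the snippet still runs without it), and the wrapper name detect_text is hypothetical. It relies on from_bytes returning a result object whose best() method yields the most probable match or None.

```python
from typing import Optional

try:
    from charset_normalizer import from_bytes
except ImportError:  # package not installed; the sketch then returns None
    from_bytes = None


def detect_text(payload: bytes) -> Optional[str]:
    """Return the decoded text of the most probable match, or None."""
    if from_bytes is None:
        return None
    best = from_bytes(payload).best()  # best() is called on the result set
    return None if best is None else str(best)


text = "我没有埋怨,磋砣的只是一些时间。"
payload = text.encode("utf_32")
decoded = detect_text(payload)
print(decoded)
```

Wrapping the call in one helper keeps the library-specific names in a single place, which tends to make future major-version bumps easier to review.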
Commits
- 5dd7aa0 Release 3.0.1 (#238)
- b68f8d8 ⬆️ Bump wheel from 0.37.1 to 0.38.4 (#234)
- 27605b8 Update run-tests.yml (#236)
- fecbd67 ⬆ Bump mypy from 0.982 to 0.990 (#235)
- acb658a Update README.md
- 2d26aeb Improve multi-byte cutter/chunk (#233)
- 5ec4a27 Create codeql.yml (#230)
- 7c64266 Update CONTRIBUTING.md
- 1f30cb8 ⬆ Bump build from 0.8.0 to 0.9.0 (#229)
- 59f4c0e Create FUNDING.yml
- Additional commits viewable in compare view
Dependabot commands
You can trigger Dependabot actions by commenting on this MR:
- $dependabot rebase will rebase this MR
- $dependabot recreate will recreate this MR, rewriting all the manual changes and resolving conflicts