File: //opt/alt/python311/lib/python3.11/site-packages/chardet/__pycache__/charsetprober.cpython-311.pyc
�

��;f���>�ddlZddlZddlmZGd�de��ZdS)�N�)�ProbingStatec��eZdZdZdd�Zd�Zed���Zd�Zed���Z	d�Z
ed	���Zed
���Z
ed���ZdS)
�
CharSetProbergffffff�?Nc�^�d|_||_tjt��|_dS�N)�_state�lang_filter�logging�	getLogger�__name__�logger)�selfr
s  �H/opt/alt/python311/lib/python3.11/site-packages/chardet/charsetprober.py�__init__zCharSetProber.__init__'s'�����&����'��1�1�����c�(�tj|_dSr)r�	DETECTINGr	�rs r�resetzCharSetProber.reset,s��"�,����rc��dSr�rs r�charset_namezCharSetProber.charset_name/s���trc��dSrr)r�bufs  r�feedzCharSetProber.feed3s���rc��|jSr)r	rs r�statezCharSetProber.state6s
���{�rc��dS)Ngrrs r�get_confidencezCharSetProber.get_confidence:s���src�2�tjdd|��}|S)Ns([-])+� )�re�sub)rs r�filter_high_byte_onlyz#CharSetProber.filter_high_byte_only=s���f�&��c�2�2���
rc��t��}tjd|��}|D]Z}|�|dd���|dd�}|���s|dkrd}|�|���[|S)u9
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.

        This filter applies to all scripts which do not use English characters.
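
The docstring above can be illustrated with a short standalone sketch. It is a reconstruction from the docstring and the names visible in the dump (`filtered`, `words`, `last_char`), not a verified copy of the chardet source. The module-level name `INTERNATIONAL_WORDS` and the free-function form are assumptions made for illustration.

```python
import re

# Assumed pattern, reconstructed from the docstring: optional ASCII letters,
# one or more high bytes (0x80-0xFF), optional ASCII letters, then at most
# one trailing marker byte.
INTERNATIONAL_WORDS = re.compile(
    rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def filter_international_words(buf: bytes) -> bytearray:
    """Keep only the words that contain at least one high byte."""
    filtered = bytearray()
    for word in INTERNATIONAL_WORDS.findall(buf):
        # Everything except the final byte is copied unchanged.
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # A trailing marker byte is normalized to a single ASCII space;
        # a trailing letter or high byte is kept as it is.
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered
```

With this sketch, `filter_international_words(b"abc d\xe9f, ghi \xfcber!")` returns `bytearray(b"d\xe9f \xfcber ")`: the pure-ASCII words `abc` and `ghi` are dropped, and the markers `,` and `!` each become a space.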
        s%[a-zA-Z]*[�-�]+[a-zA-Z]*[^a-zA-Z�-�]?N�����r")�	bytearrayr#�findall�extend�isalpha)r�filtered�words�word�	last_chars     r�filter_international_wordsz(CharSetProber.filter_international_wordsBs����;�;��
�
�O�� � ���
	'�
	'�D��O�O�D��"��I�&�&�&��R�S�S�	�I��$�$�&�&�
!�9�w�+>�+>� �	��O�O�I�&�&�&�&��rc��t��}d}d}tt|����D]y}|||dz�}|dkrd}n|dkrd}|dkrS|���s?||kr4|s2|�|||���|�d��|dz}�z|s|�||d	���|S)
a�
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.

        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
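
A minimal sketch of the behavior this docstring describes is given here. It is reconstructed from the docstring and from the local names visible in the dump (`in_tag`, `prev`, `curr`, `buf_char`), so treat it as an approximation and not as the library's exact code. Writing it as a free function is an assumption made for illustration.

```python
def filter_with_english_letters(buf: bytes) -> bytearray:
    """Keep runs of ASCII letters and high bytes that lie outside <...>."""
    filtered = bytearray()
    in_tag = False
    prev = 0  # start index of the current run of letters and high bytes
    for curr in range(len(buf)):
        buf_char = buf[curr:curr + 1]  # one-byte slice, still a bytes object
        if buf_char == b">":
            in_tag = False
        elif buf_char == b"<":
            in_tag = True
        # Any ASCII byte that is not a letter ends the current run.
        if buf_char < b"\x80" and not buf_char.isalpha():
            if curr > prev and not in_tag:
                # Emit the run, followed by a single space as delimiter.
                filtered.extend(buf[prev:curr])
                filtered.extend(b" ")
            prev = curr + 1
    # Flush the final run unless the buffer ends inside a tag.
    if not in_tag:
        filtered.extend(buf[prev:])
    return filtered
```

For example, `filter_with_english_letters(b"na\xefve 42 text")` returns `bytearray(b"na\xefve text")`: the digits and the spaces around them are markers, so the two surviving runs end up separated by one space.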
        Frr�>�<Tr(r"N)r)�range�lenr,r+)rr-�in_tag�prev�curr�buf_chars      r�filter_with_english_lettersz)CharSetProber.filter_with_english_lettersgs����;�;�������#�c�(�(�O�O�	 �	 �D��4��q��=�)�H��4�������T�!�!����'�!�!�(�*:�*:�*<�*<�!��$�;�;�v�;��O�O�C��T�	�N�3�3�3��O�O�D�)�)�)��a�x����	(�
�O�O�C����J�'�'�'��rr)r
�
__module__�__qualname__�SHORTCUT_THRESHOLDrr�propertyrrrr �staticmethodr%r1r;rrrrr#s���������2�2�2�2�
-�-�-�����X��
�
�
�����X���������\���"�"��\�"�H�)�)��\�)�)�)rr)rr#�enumsr�objectrrrr�<module>rCsi��:����	�	�	�	�������n�n�n�n�n�F�n�n�n�n�nr