Diskuse JPW: PHP Simple HTML DOM Parser encoding

	Author	Message
	Malex	#1 · Posted: 27. 9. 2014, 21:23:42
	Hello, I am trying to mirror a league table from one website onto another using PHP Simple HTML DOM Parser. This is my code:

	include('simple_html_dom.php');
	$url = file_get_html('http://www.sipky.org/cs/uso/s/liga/?action=home&league_id=93448');
	$rozpis = array();
	foreach($url->find('div.table') as $cast){ $rozpis[]= $cast;}
	print($rozpis[0]);
	$url->clear();
	unset($url);

	The input is a table, see picture (page encoding windows-1250). The output is the same table (page encoding utf-8), except that the encoding of the links is garbled. If I use `$cast = iconv("utf-8","windows-1250", $cast);` the links (team names) come out fine, but the encoding of everything else breaks. Thanks for any advice.
	juriad	#2 · Posted: 28. 9. 2014, 07:43:44
	Simple HTML DOM Parser is probably performing some encoding conversion by mistake. It most likely assumes that everything is UTF-8 in order to work correctly. You will probably have no choice but to download the page first, convert it with iconv, and only then pass it to str_get_html.

	$str = file_get_contents('...'); # or fetch it some other way
	$str = iconv("windows-1250", "utf-8", $str); # windows-1250 -> utf-8
	$url = str_get_html($str);

	Note that what "works" for you as a "fix" is the opposite direction, converting from utf-8 to windows-1250.
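	The download, convert, then parse order suggested above could be wrapped in a small helper. This is only a sketch under assumptions: `to_utf8` is a hypothetical name (not part of Simple HTML DOM), the source page is assumed to be windows-1250, and the sample bytes are illustrative. The helper skips the conversion when the input already validates as UTF-8, so that text is not encoded twice.

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM.
// Returns $html as UTF-8, assuming it is either already UTF-8
// or encoded in $sourceCharset (windows-1250 by default).
function to_utf8(string $html, string $sourceCharset = 'WINDOWS-1250'): string
{
    // The /u modifier makes PCRE reject invalid UTF-8, so an empty
    // pattern doubles as a validity check. Valid input is left untouched.
    if (preg_match('//u', $html) === 1) {
        return $html;
    }

    $converted = iconv($sourceCharset, 'UTF-8', $html);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// "Žlutý" as windows-1250 bytes (illustrative sample, not taken from the page).
$cp1250 = "\x8Elut\xFD";
$utf8   = to_utf8($cp1250);

echo bin2hex($utf8), "\n";

// With the library loaded, the converted string would then go to the parser:
// $dom = str_get_html(to_utf8(file_get_contents('...')));
```

	The validity check is a heuristic: a windows-1250 string can in rare cases also be valid UTF-8, so when the source encoding is known for certain, calling iconv unconditionally may be the safer choice.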
	Malex	#3 · Posted: 28. 9. 2014, 09:02:18
	Thanks, your advice helped. P.S. When I applied that "fix" in the windows-1250 -> utf-8 direction, everything came out garbled. So the place to look is inside the DOM parser:

	// PaperG - Function to convert the text from one character set to another if the two sets are not the same.
	function convert_text($text)
	{
	    global $debug_object;
	    if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}

	    $converted_text = $text;

	    $sourceCharset = "";
	    $targetCharset = "";

	    if ($this->dom)
	    {
	        $sourceCharset = strtoupper($this->dom->_charset);
	        $targetCharset = strtoupper($this->dom->_target_charset);
	    }
	    if (is_object($debug_object)) {$debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charset: " . $targetCharset);}

	    if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
	    {
	        // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
	        if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
	        {
	            $converted_text = $text;
	        }
	        else
	        {
	            $converted_text = iconv($sourceCharset, $targetCharset, $text);
	        }
	    }

	    // Lets make sure that we don't have that silly BOM issue with any of the utf-8 text we output.
	    if ($targetCharset == 'UTF-8')
	    {
	        if (substr($converted_text, 0, 3) == "\xef\xbb\xbf")
	        {
	            $converted_text = substr($converted_text, 3);
	        }
	        if (substr($converted_text, -3) == "\xef\xbb\xbf")
	        {
	            $converted_text = substr($converted_text, 0, -3);
	        }
	    }

	    return $converted_text;
	}

	/**
	* Returns true if $string is valid UTF-8 and false otherwise.
	*
	* @param mixed $str String to be tested
	* @return boolean
	*/
	static function is_utf8($str)
	{
	    $c=0; $b=0;
	    $bits=0;
	    $len=strlen($str);
	    for($i=0; $i<$len; $i++)
	    {
	        $c=ord($str[$i]);
	        if($c > 128)
	        {
	            if(($c >= 254)) return false;
	            elseif($c >= 252) $bits=6;
	            elseif($c >= 248) $bits=5;
	            elseif($c >= 240) $bits=4;
	            elseif($c >= 224) $bits=3;
	            elseif($c >= 192) $bits=2;
	            else return false;
	            if(($i+$bits) > $len) return false;
	            while($bits > 1)
	            {
	                $i++;
	                $b=ord($str[$i]);
	                if($b < 128 || $b > 191) return false;
	                $bits--;
	            }
	        }
	    }
	    return true;
	}
	/*
	function is_utf8($string)
	{
	    //this is buggy
	    return (utf8_encode(utf8_decode($string)) == $string);
	}
	*/
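	The "already UTF-8" guard in convert_text matters because converting UTF-8 text a second time produces exactly the kind of garbling described in this thread. The sketch below shows that effect in isolation. It is only one plausible mechanism, not a confirmed diagnosis of the page in question, and the sample string is illustrative. It also shows why the reverse conversion (utf-8 to windows-1250) can look like a fix for text that was converted twice.

```php
<?php
// "Žlutý" as UTF-8 bytes (illustrative sample).
$original = "\xC5\xBDlut\xC3\xBD";

// Mistake: treat the UTF-8 bytes as windows-1250 and convert them again.
// Each byte of a multi-byte character becomes its own character.
$double = iconv('WINDOWS-1250', 'UTF-8', $original);

// The reverse conversion maps those characters back to the original bytes,
// which is why it appears to "repair" double-encoded text.
$restored = iconv('UTF-8', 'WINDOWS-1250', $double);

echo bin2hex($original), "\n";
echo bin2hex($double), "\n";
echo bin2hex($restored), "\n";
```

	Applying the same reverse conversion to text that was encoded only once breaks it instead, which would match the observation that the links were repaired while the rest of the table became garbled.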
		Time gap: 1 year
	jasom	#4 · Posted: 23. 11. 2015, 18:05:41
	juriad: "Simple HTML DOM Parser is probably performing some encoding conversion by mistake. It most likely assumes that everything is UTF-8 in order to work correctly. You will probably have no choice but to download the page first, convert it with iconv, and only then pass it to str_get_html."
	Wonderful, juriad, this is exactly what I needed!
		Time gap: 10 years
