HTML Encoding With PHP
HTML is the markup language of the web, and when it comes down to it, HTML is what makes the web tick. All of these fancy programming languages are fun to use for building dynamic pages, but without HTML we wouldn’t have much, now would we? One of the most common things we do with PHP, or any other web programming language for that matter, is generate HTML on the fly using logic and conditions. As we know, HTML uses angle brackets and a few other characters to tell the browser how to render a web page. These reserved characters have a specific meaning in HTML, so you need to watch out for them.
< and >
Here they are, the < and > characters. These two characters surround the HTML tag names in the page. They tell the browser that, hey, something interesting is happening here, and the content between these tags needs to be interpreted as markup. The angle brackets also mark the tag itself as something that is not output to the page in human readable form. Though the web browser sees something like <b>this bold text</b>, the user should see this bold text. Two different encoding styles with two different meanings.
HTML Reserved Characters
There are four main characters that are reserved in HTML which we need to pay close attention to. Here is a table that outlines them all.
Character | Encoding |
< | &lt; |
> | &gt; |
& | &amp; |
" | &quot; |
htmlspecialchars($string)
First up we’ll examine the use of htmlspecialchars. It might make sense to observe a broken scenario, and then we’ll look at the solution using htmlspecialchars. Suppose we want to include a link with specific anchor text. The anchor text we’d like to display is <Click Here> & Prosper! So you figure, ok easy enough, we can just place this text we like in between anchor tags and create our link. Let’s try it.
<html>
<head>
<meta charset="utf-8">
<title>HTML Encode Like a PRO</title>
<link href="css/bootstrap.min.css" rel="stylesheet">
<script src="js/respond.js"></script>
<script src="http://code.jquery.com/jquery-latest.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</head>
<body>
<a href="http://localhost/bootstrapsandbox/encode.php">
<Click Here> & Prosper!
</a>
</body>
</html>
Here is what the browser renders for the link text:
& Prosper!
Do you see what that is right there? That right there is a big fat fail. The page didn’t display the text we wanted at all. The reason is that the browser comes along, sees those angle brackets, and thinks it is dealing with an HTML tag. In this case, however, it is not HTML at all. It is the actual angle brackets that we want the user to see in the text of the link. It is times like this that htmlspecialchars comes to the rescue! Observe!
<html>
<head>
<meta charset="utf-8">
<title>HTML Encode Like a PRO</title>
<link href="css/bootstrap.min.css" rel="stylesheet">
<script src="js/respond.js"></script>
<script src="http://code.jquery.com/jquery-latest.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</head>
<body>
<a href="http://localhost/bootstrapsandbox/encode.php">
<?php echo htmlspecialchars('<Click Here> & Prosper!'); ?>
</a>
</body>
</html>
Here is what the browser renders for the link text:
<Click Here> & Prosper!
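To see why this works, it can help to look at the string PHP actually writes into the page. The following is a minimal sketch and not part of the original page: the $encoded variable and the explicit ENT_QUOTES and 'UTF-8' arguments are assumptions added for illustration.

```php
<?php
// Anchor text from the example above.
$anchortext = '<Click Here> & Prosper!';

// Flags and charset are passed explicitly here (an assumption for this sketch)
// so the result does not depend on the defaults of the PHP version in use.
$encoded = htmlspecialchars($anchortext, ENT_QUOTES, 'UTF-8');

// This is what ends up in the page source:
echo $encoded, PHP_EOL; // &lt;Click Here&gt; &amp; Prosper!

// Decoding reverses the step, which is roughly what the browser does for display:
echo htmlspecialchars_decode($encoded, ENT_QUOTES), PHP_EOL; // <Click Here> & Prosper!
```

The browser never sees a raw < or > in the anchor text, so it has nothing to mistake for a tag.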
Nice! Now that link text is working as designed. Think of htmlspecialchars as a way to disable HTML, so to speak. It disables the HTML for the browser and allows the user to see what the browser normally sees. A key point to note is that htmlspecialchars only handles the four reserved characters listed in the table above (plus the single quote when the ENT_QUOTES flag is set, which is the default as of PHP 8.1). Now you may know that there is a rather large collection of other symbols we might want to display in the text of our HTML, and the browser may not render them correctly if the page's character encoding is wrong. Things like trademark symbols, copyright symbols, currency signs, and many more. In this case, you need to bust out the big dog, the htmlentities function.
htmlentities($string)
The htmlentities function covers all characters that have an equivalent named HTML entity. Therefore, htmlentities is much more powerful. To illustrate this, we’ll try to enter some of the more common special characters that you might want to have in your webpage. Let’s try it out.
<?php
$text = '© ® ™ £ € ¥';
echo $text;
?>
Now the same text, this time run through htmlentities:
<?php
$text = '© ® ™ £ € ¥';
echo htmlentities($text);
?>
Here is what the browser renders:
© ® ™ £ € ¥
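The difference between the two functions is easier to see in the raw strings they return than in the rendered page. Here is a small sketch under a few assumptions: the sample text, the $special and $entities names, and the explicit flags are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Sample text (an assumption for this sketch); the file must be saved as UTF-8.
$text  = '© ® ™ & <b>';
$flags = ENT_QUOTES | ENT_HTML401;

// htmlspecialchars() only touches the reserved characters.
$special = htmlspecialchars($text, $flags, 'UTF-8');

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($text, $flags, 'UTF-8');

echo $special, PHP_EOL;  // © ® ™ &amp; &lt;b&gt;
echo $entities, PHP_EOL; // &copy; &reg; &trade; &amp; &lt;b&gt;
```

Both results are safe to print into HTML. The htmlentities version simply no longer depends on the page being served with the right character encoding.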
URL Encoding Meets HTML Encoding
It’s time to level up, friends. Now that we have a good grasp of URL encoding from our last episode, as well as the fundamentals of HTML encoding from this action packed tutorial, we can put the whole picture together to see how this works. This will sum up our learning of URL as well as HTML encoding.
<html>
<head>
<meta charset="utf-8">
<title>HTML Encode Like a PRO</title>
<link href="css/bootstrap.min.css" rel="stylesheet">
<script src="js/respond.js"></script>
<script src="http://code.jquery.com/jquery-latest.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</head>
<body>
<?php
$page = 'bootstrapsandbox/encode.php';
$variable1 = 'Look out now, < > " and & which are bad!';
$variable2 = 'More bad chars like &#?*$[]+ and so on!';
$anchortext = '<Click Here> & Prosper!';
$url = 'http://localhost/';
// rawurlencode anything to the left of the ? !!
$url .= rawurlencode($page);
// urlencode anything to the right of the ? !!
$url .= '?' . 'variable1='. urlencode($variable1);
$url .= '&' . 'variable2='. urlencode($variable2);
// at this point $url is safe to put into the query string
// it might NOT however, be safe to output into our HTML!
// Just because a string is now fully URL encoded does not mean
// it is safe for output into page HTML. This is why the $url
// parameter must also be run through htmlspecialchars.
?>
<a href="<?php echo htmlspecialchars($url); ?>">
<?php echo htmlspecialchars($anchortext); ?>
</a>
</body>
</html>
The resulting URL:
http://localhost/bootstrapsandbox%2Fencode.php?variable1=Look+out+now%2C+%3C+%3E+%22+and+%26+which+are+bad%21&variable2=More+bad+chars+like+%26%23%3F%2A%24%5B%5D%2B+and+so+on%21
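The two layers of encoding are easier to tell apart when the URL is printed before and after the HTML step. The sketch below uses simplified, hypothetical values (a shorter page name, shorter parameter values, and an $href variable) rather than the ones from the page above.

```php
<?php
// Simplified, hypothetical values for this sketch.
$url  = 'http://localhost/' . rawurlencode('encode.php');
$url .= '?variable1=' . urlencode('< > " and &');
$url .= '&variable2=' . urlencode('a+b');

// URL encoded, but the separator between the parameters is still a bare "&":
echo $url, PHP_EOL;
// http://localhost/encode.php?variable1=%3C+%3E+%22+and+%26&variable2=a%2Bb

// HTML encoded for use inside an href attribute: the bare "&" becomes "&amp;".
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
echo $href, PHP_EOL;
// http://localhost/encode.php?variable1=%3C+%3E+%22+and+%26&amp;variable2=a%2Bb
```

The browser turns &amp; back into & when it reads the attribute, so the link still points at the first URL.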
Handy HTML Entity Table
If you ever need a good reference for the HTML entities you can use, here is a list of them. This list is also good for being aware of characters that should be run through the htmlentities function.
Various ASCII Character entities.
Symbol | Encoding |
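À | &Agrave; |
Á | &Aacute; |
Â | &Acirc; |
Ã | &Atilde; |
Ä | &Auml; |
Å | &Aring; |
Æ | &AElig; |
Ç | &Ccedil; |
È | &Egrave; |
É | &Eacute; |
Ê | &Ecirc; |
Ë | &Euml; |
Ì | &Igrave; |
Í | &Iacute; |
Î | &Icirc; |
Ï | &Iuml; |
Ð | &ETH; |
Ñ | &Ntilde; |
Ò | &Ograve; |
Ó | &Oacute; |
Ô | &Ocirc; |
Õ | &Otilde; |
Ö | &Ouml; |
Ø | &Oslash; |
Ù | &Ugrave; |
Ú | &Uacute; |
Û | &Ucirc; |
Ü | &Uuml; |
Ý | &Yacute; |
Þ | &THORN; |
ß | &szlig; |
à | &agrave; |
á | &aacute; |
â | &acirc; |
ã | &atilde; |
ä | &auml; |
å | &aring; |
æ | &aelig; |
ç | &ccedil; |
è | &egrave; |
é | &eacute; |
ê | &ecirc; |
ë | &euml; |
ì | &igrave; |
í | &iacute; |
î | &icirc; |
ï | &iuml; |
ð | &eth; |
ñ | &ntilde; |
ò | &ograve; |
ó | &oacute; |
ô | &ocirc; |
õ | &otilde; |
ö | &ouml; |
ø | &oslash; |
ù | &ugrave; |
ú | &uacute; |
û | &ucirc; |
ü | &uuml; |
ý | &yacute; |
þ | &thorn; |
ÿ | &yuml; |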
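Various ISO-8859-1 HTML entities.
Symbol | Encoding |
(non-breaking space) | &nbsp; |
¡ | &iexcl; |
¢ | &cent; |
£ | &pound; |
¤ | &curren; |
¥ | &yen; |
¦ | &brvbar; |
§ | &sect; |
¨ | &uml; |
© | &copy; |
ª | &ordf; |
« | &laquo; |
¬ | &not; |
(soft hyphen) | &shy; |
® | &reg; |
¯ | &macr; |
° | &deg; |
± | &plusmn; |
² | &sup2; |
³ | &sup3; |
´ | &acute; |
µ | &micro; |
¶ | &para; |
¸ | &cedil; |
¹ | &sup1; |
º | &ordm; |
» | &raquo; |
¼ | &frac14; |
½ | &frac12; |
¾ | &frac34; |
¿ | &iquest; |
× | &times; |
÷ | &divide; |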
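HTML entities for Math Symbols.
Symbol | Encoding |
∀ | &forall; |
∂ | &part; |
∃ | &exist; |
∅ | &empty; |
∇ | &nabla; |
∈ | &isin; |
∉ | &notin; |
∋ | &ni; |
∏ | &prod; |
∑ | &sum; |
− | &minus; |
∗ | &lowast; |
√ | &radic; |
∝ | &prop; |
∞ | &infin; |
∠ | &ang; |
∧ | &and; |
∨ | &or; |
∩ | &cap; |
∪ | &cup; |
∫ | &int; |
∴ | &there4; |
∼ | &sim; |
≅ | &cong; |
≈ | &asymp; |
≠ | &ne; |
≡ | &equiv; |
≤ | &le; |
≥ | &ge; |
⊂ | &sub; |
⊃ | &sup; |
⊄ | &nsub; |
⊆ | &sube; |
⊇ | &supe; |
⊕ | &oplus; |
⊗ | &otimes; |
⊥ | &perp; |
⋅ | &sdot; |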
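Greek Letters and their HTML Entities.
Symbol | Encoding |
Α | &Alpha; |
Β | &Beta; |
Γ | &Gamma; |
Δ | &Delta; |
Ε | &Epsilon; |
Ζ | &Zeta; |
Η | &Eta; |
Θ | &Theta; |
Ι | &Iota; |
Κ | &Kappa; |
Λ | &Lambda; |
Μ | &Mu; |
Ν | &Nu; |
Ξ | &Xi; |
Ο | &Omicron; |
Π | &Pi; |
Ρ | &Rho; |
Σ | &Sigma; |
Τ | &Tau; |
Υ | &Upsilon; |
Φ | &Phi; |
Χ | &Chi; |
Ψ | &Psi; |
Ω | &Omega; |
α | &alpha; |
β | &beta; |
γ | &gamma; |
δ | &delta; |
ε | &epsilon; |
ζ | &zeta; |
η | &eta; |
θ | &theta; |
ι | &iota; |
κ | &kappa; |
λ | &lambda; |
μ | &mu; |
ν | &nu; |
ξ | &xi; |
ο | &omicron; |
π | &pi; |
ρ | &rho; |
ς | &sigmaf; |
σ | &sigma; |
τ | &tau; |
υ | &upsilon; |
φ | &phi; |
χ | &chi; |
ψ | &psi; |
ω | &omega; |
ϑ | &thetasym; |
ϒ | &upsih; |
ϖ | &piv; |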
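Other Various HTML entities.
Symbol | Encoding |
Œ | &OElig; |
œ | &oelig; |
Š | &Scaron; |
š | &scaron; |
Ÿ | &Yuml; |
ƒ | &fnof; |
ˆ | &circ; |
˜ | &tilde; |
– | &ndash; |
— | &mdash; |
‘ | &lsquo; |
’ | &rsquo; |
‚ | &sbquo; |
“ | &ldquo; |
” | &rdquo; |
„ | &bdquo; |
† | &dagger; |
‡ | &Dagger; |
• | &bull; |
… | &hellip; |
‰ | &permil; |
′ | &prime; |
″ | &Prime; |
‹ | &lsaquo; |
› | &rsaquo; |
‾ | &oline; |
€ | &euro; |
™ | &trade; |
← | &larr; |
↑ | &uarr; |
→ | &rarr; |
↓ | &darr; |
↔ | &harr; |
↵ | &crarr; |
⌈ | &lceil; |
⌉ | &rceil; |
⌊ | &lfloor; |
⌋ | &rfloor; |
◊ | &loz; |
♠ | &spades; |
♣ | &clubs; |
♥ | &hearts; |
♦ | &diams; |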
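If you would rather ask PHP than scan a table, the built-in get_html_translation_table and html_entity_decode functions can look entities up in both directions. This is only a sketch: the $flags, $table, and $decoded names and the sample characters are chosen for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;

// Map of character => entity that htmlentities() uses for these flags.
$table = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
echo $table['©'], PHP_EOL; // &copy;
echo $table['♥'], PHP_EOL; // &hearts;

// Going the other way: turn entity names back into characters.
$decoded = html_entity_decode('&hearts; &rarr; &euro;', $flags, 'UTF-8');
echo $decoded, PHP_EOL; // ♥ → €
```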