Solving URL encoding problems in Java


First, look at the difference between the encodeURI and encodeURIComponent functions in JavaScript.

encodeURI: does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . ! ~ * ' ( ). It also leaves unescaped the following ASCII punctuation marks, which have special meaning in a URI: ; / ? : @ & = + $ , #

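encodeURIComponent: does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . ! ~ * ' ( ). Unlike encodeURI, it does escape the URI punctuation marks ; / ? : @ & = + $ , #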

In Java, the URLEncoder.encode(String s, String enc) method:

  does not encode ASCII letters and digits, nor these ASCII punctuation marks: - _ . * (a space is converted to +)

The reference code, from the static initializer of java.net.URLEncoder, is as follows:

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                    * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');
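
To see the effect of this character set, here is a minimal sketch that runs a few punctuation strings through URLEncoder.encode. The class name UrlEncoderDemo and the helper enc are made up for this illustration; only URLEncoder.encode(String, String) is the real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderDemo {
    // Hypothetical helper: wraps URLEncoder.encode with UTF-8 so callers
    // do not have to handle the checked exception.
    public static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Kept as-is, except that the space becomes '+'
        System.out.println(enc("a b-_.*"));      // a+b-_.*
        // Marks that encodeURI keeps but URLEncoder escapes
        System.out.println(enc("!~'()"));        // %21%7E%27%28%29
        // Punctuation with special meaning in a URI is escaped as well
        System.out.println(enc(";/?:@&=+$,#"));  // %3B%2F%3F%3A%40%26%3D%2B%24%2C%23
    }
}
```

The last line is the reason URLEncoder cannot be applied to a whole URL: the separators that give the URL its structure are escaped too.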

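If I want to encode a URL in Java without encoding the ASCII punctuation marks that have special meaning in a URI, I need to add those characters to dontNeedEncoding by creating my own encoder class, MyURIEncoder: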

  

package com.sitech.solr.util;

import java.io.CharArrayWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.BitSet;

public class MyURIEncoder {
    static BitSet dontNeedEncoding;
    static final int caseDiff = ('a' - 'A');
    static String dfltEncName = null;

    static {

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, we will use the same
         * list. It is also noteworthy that this is consistent with
         * O'Reilly's "HTML: The Definitive Guide" (page 164).
         *
         * As a last note, Internet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         */

        dontNeedEncoding = new BitSet(256);
        int i;
        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                    * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');

        // The following ASCII punctuation marks have special meaning in a
        // URI and are left unescaped: ; / ? : @ & = + $ , #
        dontNeedEncoding.set(';');
        dontNeedEncoding.set('/');
        dontNeedEncoding.set('?');
        dontNeedEncoding.set(':');
        dontNeedEncoding.set('@');
        dontNeedEncoding.set('&');
        dontNeedEncoding.set('=');
        dontNeedEncoding.set('+');
        dontNeedEncoding.set('$');
        dontNeedEncoding.set(',');
        dontNeedEncoding.set('#');

        // Platform default encoding, read through the public API rather than
        // the JDK-internal sun.security.action.GetPropertyAction, which is
        // not accessible to application code on newer JDKs.
        dfltEncName = System.getProperty("file.encoding");
    }

    /**
     * You can't call the constructor.
     */
    private MyURIEncoder() { }

    public static String encode(String s, String enc)
        throws UnsupportedEncodingException {

        boolean needToChange = false;
        StringBuffer out = new StringBuffer(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            //System.out.println("Examining character: " + c);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char)c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a bytes reserved in the
                     * surrogate pairs range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
                    if (c >= 0xD800 && c <= 0xDBFF) {
                        /*
                          System.out.println(Integer.toHexString(c)
                          + " is high surrogate");
                        */
                        if ( (i+1) < s.length()) {
                            int d = (int) s.charAt(i+1);
                            /*
                              System.out.println("	Examining "
                              + Integer.toHexString(d));
                            */
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                /*
                                  System.out.println("	"
                                  + Integer.toHexString(d)
                                  + " is low surrogate");
                                */
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange? out.toString() : s);
    }
}
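
The same idea can be sketched in a much smaller form. The class below is not the MyURIEncoder above but a compact variant written for illustration: the name EncodeUriSketch and its encode method are hypothetical, it always uses UTF-8, and it writes a space as %20 instead of +, which is an assumption made here to stay closer to JavaScript's encodeURI.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class EncodeUriSketch {
    // Characters copied through unchanged: letters, digits, the four marks
    // URLEncoder keeps (- _ . *), and the URI punctuation ; / ? : @ & = + $ , #
    private static final BitSet SAFE = new BitSet(128);

    static {
        for (int c = 'a'; c <= 'z'; c++) SAFE.set(c);
        for (int c = 'A'; c <= 'Z'; c++) SAFE.set(c);
        for (int c = '0'; c <= '9'; c++) SAFE.set(c);
        for (char c : "-_.*;/?:@&=+$,#".toCharArray()) SAFE.set(c);
    }

    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Walk the UTF-8 bytes: ASCII characters are single bytes equal to
        // their code point, so one lookup covers both cases.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (SAFE.get(v)) {
                out.append((char) v);
            } else {
                // Percent-escape everything else as two uppercase hex digits
                out.append('%')
                   .append(Character.toUpperCase(Character.forDigit(v >> 4, 16)))
                   .append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u4E2D\u6587 are two CJK characters, written as escapes so the
        // source file stays pure ASCII.
        System.out.println(encode("http://example.com/a b?q=\u4E2D\u6587&x=1#top"));
        // http://example.com/a%20b?q=%E4%B8%AD%E6%96%87&x=1#top
    }
}
```

Two points are worth keeping in mind with either version. Since + is left unescaped, writing a space as + (as MyURIEncoder does) makes an encoded space indistinguishable from a literal plus sign, which is why the sketch uses %20. And because % is not in the safe set, input that already contains percent escapes will be escaped a second time.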
