
Sorting a List with mixed Chinese and English entries

Example:
Suppose we have the following ten records:
Id   record
1    A ''Healthy Schools'' program in Hong Kong: Enhancing positive health behavior for school children and teachers
2    1996 opinion survey on civic education: Final report
3    'A Luxury for the First World': A western perception of Hong Kong Chinese attitudes towards inclusive education
4    A Chinese cultural critique of the global qualifying standards for social work education
5    A baseline survey of students' attitudes towards gender stereotypes and family roles
6    「中國文化」項目的教學與教學上的銜接
7    中國語文及文化科的組織淺探
8    「中學美術與設計科應用電子科技教學」試驗計劃
9    點滴校園:學校社會工作文集
10   齊來說故事:透過小組學習方式改善說話的表達能力及態度

Ordinarily, the result we want from sorting looks like this:
Id   record
1    1996 opinion survey on civic education: Final report
2    A baseline survey of students' attitudes towards gender stereotypes and family roles
3    A Chinese cultural critique of the global qualifying standards for social work education
4    A ''Healthy Schools'' program in Hong Kong: Enhancing positive health behavior for school children and teachers
5    'A Luxury for the First World': A western perception of Hong Kong Chinese attitudes towards inclusive education
6    點滴校園:學校社會工作文集
7    齊來說故事:透過小組學習方式改善說話的表達能力及態度
8    「中國文化」項目的教學與教學上的銜接
9    中國語文及文化科的組織淺探
10   「中學美術與設計科應用電子科技教學」試驗計劃

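As the result shows, entries starting with digits come first, followed by English entries in alphabetical order, and then Chinese entries ordered by pinyin. A sort like this is usually straightforward, but this data exposes a few special cases: letter case, punctuation (special symbols), and mixed Chinese and English text. As long as the data set does not run to tens or hundreds of thousands of records, we can handle it with the following steps:
1. Define an object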
public class RecordInfo {
	private String id;
	private String record;
	private String recordtemp; // cleaned-up copy of record, used as the sort key
	// getters and setters omitted ...
}
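2. Suppose the records are stored in a database. We fetch them and store them as RecordInfo objects in a List. While filling the list, to deal with special symbols, we can set a cleaned-up value into the recordtemp field (replaceAll() can be used to strip the special symbols). Then add the following code:
Collections.sort(listtemp, new AuthorVOCompare());
The AuthorVOCompare class used here does the actual reordering. Its code is as follows: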
import java.util.Comparator;
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public class AuthorVOCompare implements Comparator<RecordInfo> {
	@Override
	public int compare(RecordInfo record1, RecordInfo record2) {
		String o1 = record1.getRecordtemp();
		String o2 = record2.getRecordtemp();

		int i = 0;
		while (i < o1.length() && i < o2.length()) {
			// Read whole code points (not single chars) so surrogate pairs are not split.
			int codePoint1 = o1.codePointAt(i);
			int codePoint2 = o2.codePointAt(i);

			if (codePoint1 != codePoint2) {
				// Supplementary characters get no pinyin lookup here; compare them numerically.
				if (Character.isSupplementaryCodePoint(codePoint1)
						|| Character.isSupplementaryCodePoint(codePoint2)) {
					return codePoint1 - codePoint2;
				}

				String pinyin1 = pinyin((char) codePoint1);
				String pinyin2 = pinyin((char) codePoint2);

				// This line is especially important: compareTo would not ignore case.
				int result = pinyin1.compareToIgnoreCase(pinyin2);
				if (result != 0) {
					return result;
				}
			}
			// Same code point, or same sort key: both sides advance by the same number of chars.
			i += Character.charCount(codePoint1);
		}

		// One string is a prefix of the other (by sort key): the shorter one sorts first.
		return o1.length() - o2.length();
	}

	/** Returns lowercase, toneless pinyin for a Chinese character, or the character itself otherwise. */
	private String pinyin(char c) {
		// Only characters in the range U+4E00..U+9FA5 are looked up in pinyin4j.
		if (c < '\u4E00' || c > '\u9FA5') {
			return String.valueOf(c);
		}

		HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
		format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
		format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
		format.setVCharType(HanyuPinyinVCharType.WITH_V);
		try {
			String[] temp = PinyinHelper.toHanyuPinyinStringArray(c, format);
			if (temp != null && temp.length > 0) {
				return temp[0]; // use the first reading when a character has several
			}
		} catch (BadHanyuPinyinOutputFormatCombination e) {
			e.printStackTrace();
		}
		return "";
	}

}
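
The comparator calls Character.isSupplementaryCodePoint, and that check only has an effect when values are read as whole code points, because charAt returns a single UTF-16 unit that can never be supplementary. The following standalone sketch shows the iteration pattern using only the JDK; the class name CodePointWalk and the sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {
    // Collect the code points of s, advancing by one or two chars per step.
    public static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);      // joins a surrogate pair into one value
            out.add(cp);
            i += Character.charCount(cp);   // 2 for supplementary code points, else 1
        }
        return out;
    }

    public static void main(String[] args) {
        // 'A', then U+20000 written as a surrogate pair, then U+4E2D
        String s = "A\uD840\uDC00\u4E2D";
        System.out.println(s.length());            // 4 UTF-16 units
        System.out.println(codePoints(s).size());  // 3 code points
        System.out.println(Character.isSupplementaryCodePoint(s.charAt(1)));      // false
        System.out.println(Character.isSupplementaryCodePoint(s.codePointAt(1))); // true
    }
}
```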


The AuthorVOCompare class depends on one jar, pinyin4j-2.5.0.jar, which supplies the pinyin used for sorting. It can be downloaded online.
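
Step 2 only says that special symbols can be stripped with replaceAll() before the value goes into recordtemp. Here is a minimal sketch of what that cleanup might look like, together with a "digits, then Latin letters, then everything else" ranking. The names SortKeySketch, sortKey and category, and the regular expression, are assumptions for illustration and are not from the original code. To stay within the JDK, this sketch does not use pinyin4j, so Chinese entries are ordered by code unit among themselves, not by pinyin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortKeySketch {
    // Hypothetical cleanup: keep letters, digits and whitespace; drop quotes,
    // brackets, colons and other symbols (ASCII and full-width alike).
    public static String sortKey(String record) {
        return record.replaceAll("[^\\p{L}\\p{Nd}\\s]", "").trim();
    }

    // 0 = starts with a digit, 1 = starts with an ASCII letter, 2 = anything else.
    public static int category(String key) {
        if (key.isEmpty()) {
            return 2;
        }
        int cp = key.codePointAt(0);
        if (Character.isDigit(cp)) {
            return 0;
        }
        if (cp < 128 && Character.isLetter(cp)) {
            return 1;
        }
        return 2;
    }

    // Rank by category first, then compare the cleaned keys ignoring case.
    public static final Comparator<String> BY_CATEGORY_THEN_TEXT =
            Comparator.comparingInt((String r) -> category(sortKey(r)))
                    .thenComparing(SortKeySketch::sortKey, String.CASE_INSENSITIVE_ORDER);

    public static void main(String[] args) {
        List<String> records = new ArrayList<>(Arrays.asList(
                "點滴校園:學校社會工作文集",
                "'A Luxury for the First World': A western perception",
                "A baseline survey",
                "1996 opinion survey on civic education: Final report"));
        records.sort(BY_CATEGORY_THEN_TEXT);
        records.forEach(System.out::println);
    }
}
```

With pinyin4j available, the case-insensitive comparison in the second stage could be swapped for a pinyin-aware comparator such as AuthorVOCompare, keeping the category ranking in front.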


P.S. This article is not copyrighted; parts of the code were taken from the web.