单词 统计

摘要:
单词由分隔符分隔,不区分大小写。输出时,所有单词都用小写字符表示。英文字母:A-Z,A-Z字母数字符号:A-Z、A-Z、0-9步骤1:输出单个文件中出现频率最高的前N个英文单词。好,好和好是一个词。

用户需求:

英语的26 个字母的频率在一本小说中是如何分布的?

某类型文章中常出现的单词是什么?

某作家最常用的词汇是什么?

《哈利波特》 中最常用的短语是什么,等等。

我们就写一些程序来解决这个问题,满足一下我们的好奇心。

第0步:输出某个英文文本文件中 26 字母出现的频率,由高到低排列,并显示字母出现的百分比,精确到小数点后面两位。

字母频率 = 这个字母出现的次数 / (所有A-Z,a-z字母出现的总数)

如果两个字母出现的频率一样,那么就按照字典序排列。  如果 S 和 T 出现频率都是 10.21%, 那么, S 要排在T 的前面。

第1步:输出单个文件中的前 N 个最常出现的英语单词。

作用:一个用于统计文本文件中的英语单词出现频率。

单词:以英文字母开头,由英文字母和字母数字符号组成的字符串视为一个单词。单词以分隔符分割且不区分大小写。在输出时,所有单词都用小写字符表示。

英文字母:A-Z,a-z

字母数字符号:A-Z,a-z,0-9

第1步:输出单个文件中的前 N 个最常出现的英语单词。

分割符:空格,非字母数字符号 例:good123是一个单词,123good不是一个单词。good,Good和GOOD是同一个单词。

package text428;
import java.text.DecimalFormat;
import java.util.*;
import java.util.regex.*;
public class tongji {
	static DecimalFormat df = new DecimalFormat("0.0000%");
	public static void main(String[] args) {
		String word = "SCARLETT O’HARA was not beautiful, but men seldom realized it when caught by her charmas the Tarleton twins were. In her face were too sharply blended the delicate features of her mother,a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was anarresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel,starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black browsslanted upward, cutting a startling oblique line in her magnolia-white skin—that skin so prized bySouthern women and so carefully guarded with bonnets, veils and mittens against hot Georgiasuns.
" + 
				"  Seated with Stuart and Brent Tarleton in the cool shade of the porch of Tara, her father’splantation, that bright April afternoon of 1861, she made a pretty picture. Her new green flowered-muslin dress spread its twelve yards of billowing material over her hoops and exactly matched theflat-heeled green morocco slippers her father had recently brought her from Atlanta. The dress set off to perfection the seventeen-inch waist, the smallest in three counties, and the tightly fittingbasque showed breasts well matured for her sixteen years. But for all the modesty of her spreadingskirts, the demureness of hair netted smoothly into a chignon and the quietness of small whitehands folded in her lap, her true self was poorly concealed. The green eyes in the carefully sweetface were turbulent, willful, lusty with life, distinctly at variance with her decorous demeanor. Hermanners had been imposed upon her by her mother’s gentle admonitions and the sterner disciplineof her mammy; her eyes were her own.
" + 
				"  On either side of her, the twins lounged easily in their chairs, squinting at the sunlight throughtall mint-garnished glasses as they laughed and talked, their long legs, booted to the knee and thickwith saddle muscles, crossed negligently. Nineteen years old, six feet two inches tall, long of boneand hard of muscle, with sunburned faces and deep auburn hair, their eyes merry and arrogant,their bodies clothed in identical blue coats and mustard-colored breeches, they were as much alikeas two bolls of cotton.
" + 
				"  Outside,Caught the late afternoon sun slanted down in the yard, throwing into gleaming brightness thedogwood trees that were solid masses of white blossoms against the background of new green. Thetwins’ horses were hitched in the driveway, big animals, red as their masters’ hair; and around thehorses’ legs quarreled the pack of lean, nervous possum hounds that accompanied Stuart and Brentwherever they went. A little aloof, as became an aristocrat, lay a black-spotted carriage dog,muzzle on paws, patiently waiting for the boys to go home to supper.
" + 
				"  ";
		String words=word.toLowerCase();
		String reg = "[a-zA-Z]+";
		Pattern p = Pattern.compile(reg);
		Matcher m = p.matcher(words);
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		Integer count = 0;
		while (m.find()) {
			count++;
			String w = m.group();
			if (null == map.get(w)) {
				map.put(w, 1);
			} else {
				int x = map.get(w);
				map.put(w, x + 1);
			}
		}
		System.out.println(count);
		for (Map.Entry<String, Integer> entry:map.entrySet()) {
			System.out.println(entry.getKey()+ ";"+ entry.getValue());
			System.out.println( df.format(entry.getValue()*1.0/count));
			
		}
		
		//System.out.println(map);
		
	}

}

  实验截图:

单词 统计第1张

免责声明:文章转载自《单词 统计》仅用于学习参考。如对内容有疑问,请及时联系本站处理。

上篇ABAP中的AMDP(ABAP-Managed Database Procedures )[linux] /proc/diskstats各列含义介绍以及磁盘使用率计算方式说明下篇

宿迁高防,2C2G15M,22元/月;香港BGP,2C5G5M,25元/月 雨云优惠码:MjYwNzM=

相关文章

谷歌拼音输入法快速切换中英文解决中英文混合打字问题

在用输入法打字的时候,会遇到中英文混合,那么就是要切换中英文状态,比较麻烦。在谷歌拼音输入法中,如何快速切换中英文状态?小编这里有两种方法,都是操作简单的,大家不妨参考下。 一、回车输入英文 第一种方法称为"回车输入英文",即在输入英文后按下回车键(而不是常用的空格键)。 二、V模式 第二种方法称之为"v模式",即在输入英文前先输入字母"v",然后按空格...

Mssql 通配符

通配符 描述 % 包含零个或更多字符的任意字符串。 WHERE title LIKE '%computer%' 将查找处于书名任意位置的包含单词 computer 的所有书名。 _(下划线) 任何单个字符。 WHERE au_fname LIKE '_ean' 将查找以 ean 结尾的所有 4 个字母的名字(Dean、Sean 等)。 [ ] 指定范围 (...

生成6位随机码含字母大小写+数字

/** * Created by Administrator on 2016/11/6. */ public class test2 { public static void main(String[] args)throws IOException{ String str = "ABCDEFGHIJKLMNOPQRSTUVWX...

vim的几个插件mark.vim ctrlp.vim等

开发过程中, 保证语义的前提下, 尽量使用 短的 变量名: 如: 用 $map来代替 $condition , 因为在书写长的变量名的时候, 容易写错, 而排查错误, 还不容易找出来. vim在浏览和排查代码的错误时, 常常需要高亮同一单词或变量, 所以使用 mark.vim. 简单的配置方法是: 下面的反斜杠, 是指的映射键. m 高亮或反高亮一个单词...

CSS字体无法设置成功的问题

在 CSS 中设置字体名称,直接写中文是可以的。但是在文件编码(GB2312、UTF-8 等)不匹配时会产生乱码的错误。xp 系统不支持类似微软雅黑的中文。 方案一:你可以使用英文来替代。比如 font-family:"Microsoft Yahei"。 方案二:在 CSS 直接使用 Unicode 编码来写字体名称可以避免这些错误。使用 Unicode...

开始用巴别小精灵强化英语单词记忆

  以前跟孩子一起玩过乌龙学院,但那款游戏开始收费后,就不再考虑用游戏的方式来学英语,这些看到几个大学的同学在用游戏来背单词,感觉挺有趣,就试了一下。   原来这游戏有偷菜游戏有相同的思路,但偷菜只使人深迷,不给人知识。   而巴别在玩的同时还能记忆单词,与supermemo互补地背单词,确实不错。但要选择好的课程就要收费了。估计沉迷后就要掏银子了。   ...