Oracle: Deleting Duplicate Data

Reposted from: https://www.cnblogs.com/zfox2017/p/7676237.html
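SQL statements for finding and deleting duplicate records (table_name is a placeholder for the actual table)
1. Find duplicate records, where duplicates are identified by a single column (Id)
select Id from table_name group by Id having count(Id) > 1 -- (find which Id values are duplicated)
select * from table_name where Id in (select Id from table_name group by Id having count(Id) > 1) -- (use the duplicated values to fetch all the matching rows)
2. Delete duplicate records identified by a single column (Id), keeping only the row with the smallest rowid
DELETE from table_name WHERE (id) IN ( SELECT id FROM table_name GROUP BY id HAVING COUNT(id) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FROM table_name GROUP BY id HAVING COUNT(*) > 1);
This removes the surplus rows for each duplicated value and keeps only the row with the smallest ROWID. Note that ROWID is the row's address, not a row number.
3. Find duplicate records identified by multiple columns
select * from table_name a where (a.Id,a.seq) in(select Id,seq from table_name group by Id,seq having count(*) > 1)
4. Delete duplicate records identified by multiple columns, keeping only the row with the smallest rowid
delete from table_name a where (a.Id,a.seq) in (select Id,seq from table_name group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from table_name group by Id,seq having count(*)>1)
5. Find duplicate records identified by multiple columns, excluding the row with the smallest rowid
select * from table_name a where (a.Id,a.seq) in (select Id,seq from table_name group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from table_name group by Id,seq having count(*)>1)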
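The same cleanup is often written as a correlated subquery that compares each row's rowid with the smallest rowid in its group. The statement below is a sketch of that alternative form and is not taken from the source; table_name, Id, and seq are placeholders for the table being cleaned and its duplicate-defining columns.

```sql
-- Delete every row whose rowid is larger than the smallest rowid
-- among the rows that share its (Id, seq) values.
-- A row that is not duplicated has rowid = min(rowid) and is kept.
delete from table_name a
where a.rowid > (
    select min(b.rowid)
    from table_name b
    where b.Id = a.Id
      and b.seq = a.seq
);
```

Rows with NULL in Id or seq never satisfy the equality, so they are left untouched, which matches the behavior of the IN-based statements.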

I. Duplicates identified by a single column

1. First, query the duplicate rows in the table, using the key column (name).

select * from OA_ADDRESS_BOOK where name in (select name from OA_ADDRESS_BOOK group by name having count(name)>1)

2. Delete the duplicate rows, where duplicates are identified by a single column (Name), keeping only the row with the smallest rowid

delete from OA_ADDRESS_BOOK where (Name) in
(select Name from OA_ADDRESS_BOOK group by Name having count(Name) >1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name having count(Name)>1)

II. Duplicates identified by multiple columns

1. First, query the duplicate rows in the table, using the key columns (Name, UNIT_ID).

select * from OA_ADDRESS_BOOK book1 where (book1.name,book1.unit_id) in
(select book2.name,book2.unit_id from OA_ADDRESS_BOOK book2 group by book2.name,book2.unit_id having count(*)>1)

2. Delete the duplicate rows, where duplicates are identified by multiple columns (Name, UNIT_ID), keeping only the row with the smallest rowid

delete from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)

3. Query the duplicate rows, where duplicates are identified by multiple columns (Name, UNIT_ID), excluding the row with the smallest rowid
select name,unit_id from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
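Another common way to express "keep one row per group" in Oracle is the ROW_NUMBER() analytic function. The following is a sketch of that alternative, not the source's method; the aliases rid and rn are names introduced here for illustration.

```sql
-- Number the rows inside each (Name, UNIT_ID) group by ascending rowid,
-- then delete everything except the first row of each group.
delete from OA_ADDRESS_BOOK
where rowid in (
    select rid
    from (
        select rowid as rid,
               row_number() over (partition by Name, UNIT_ID
                                  order by rowid) as rn
        from OA_ADDRESS_BOOK
    )
    where rn > 1
);
```

One difference to be aware of: PARTITION BY places rows whose Name or UNIT_ID is NULL in the same group, so such rows are deduplicated here, whereas the IN-based statement above never matches them.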

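1. Problem description

The BBSCOMMENT table is a child table of BBSDETAIL and stores merchant reviews. Because the data has been moved back and forth several times, it contains many duplicates. The table structure is:

COMMENT_ID    NOT NULL  NUMBER         -- primary key
DETAIL_ID     NOT NULL  NUMBER         -- foreign key referencing the BBSDETAIL table
COMMENT_BODY  NOT NULL  VARCHAR2(500)  -- review text

-- other columns omitted

The primary key has no duplicates. What is duplicated is the combination DETAIL_ID + COMMENT_BODY + ......, that is, some merchants have the same review stored more than once.

2. Solution steps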

2.1 Find the duplicate records in the table

-- List every combination that occurs more than once
select DETAIL_ID,COMMENT_BODY,count(*)
from BBSCOMMENT
group by DETAIL_ID,COMMENT_BODY
having count(*)>1
order by DETAIL_ID, COMMENT_BODY; -- 1955 rows

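2.2 Show all the non-redundant rows

-- This statement returns one row per distinct (DETAIL_ID, COMMENT_BODY) combination
select min(COMMENT_ID) as COMMENT_ID,DETAIL_ID,COMMENT_BODY
from BBSCOMMENT
group by DETAIL_ID,COMMENT_BODY;   -- 21453 rows. This does not equal the table's total row count minus 1955, because some of the 1955 duplicated combinations occur more than twice.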
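The arithmetic behind that remark can be checked directly. The query in 2.1 returns one row per duplicated combination, so 1955 counts groups, not surplus rows. The sketch below (the aliases cnt, duplicated_groups, and redundant_rows are introduced here) counts both, so that the total row count equals the number of distinct combinations plus redundant_rows.

```sql
select count(*)     as duplicated_groups,  -- combinations occurring more than once
       sum(cnt - 1) as redundant_rows      -- rows beyond the first in those groups
from (
    select count(*) as cnt
    from BBSCOMMENT
    group by DETAIL_ID, COMMENT_BODY
    having count(*) > 1
);
```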

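2.3 If the row count is small (on the order of thousands), turn the statement above into a subquery and delete directly

-- If the table is not very large (under 1,000 rows), the statement above can be used as a subquery for a direct delete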
delete from BBSCOMMENT where COMMENT_ID not in(
    select min(COMMENT_ID)
    from BBSCOMMENT
    group by DETAIL_ID,COMMENT_BODY
);          -- 782 seconds in my case, with 20,000 rows of which over 2,000 were duplicates (far too slow!)
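The NOT IN form above lists the rows to keep and removes everything else. A sketch of an alternative, not from the source, is to target the surplus rows directly with EXISTS: a row is deleted when another row with the same DETAIL_ID and COMMENT_BODY has a smaller COMMENT_ID. Whether this runs faster depends on the optimizer and on available indexes (for example one on DETAIL_ID, COMMENT_BODY, which is an assumption here); it has not been measured against the data described in this article.

```sql
-- Keep the row with the smallest COMMENT_ID in each group;
-- delete any row for which a "smaller" twin exists.
delete from BBSCOMMENT a
where exists (
    select 1
    from BBSCOMMENT b
    where b.DETAIL_ID    = a.DETAIL_ID
      and b.COMMENT_BODY = a.COMMENT_BODY
      and b.COMMENT_ID   < a.COMMENT_ID
);
```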

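2.4 Another way to delete

-- This statement achieves the same result, but I could not test it because I had already deleted the data
-- Deletion condition 1: the row belongs to a duplicated combination; condition 2: the row with the smallest rowid is kept.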
delete from BBSCOMMENT a
where
    (a.DETAIL_ID,a.COMMENT_BODY) in(select DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*) > 1)
    and rowid not in (select min(rowid) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1);
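Since the statement above was not tested against the original data, one cautious way to try it is on a copy of the table. The sketch below assumes a scratch table named BBSCOMMENT_TEST, a hypothetical name that does not appear in the source.

```sql
-- BBSCOMMENT_TEST is a hypothetical scratch copy used only for the trial run
create table BBSCOMMENT_TEST as select * from BBSCOMMENT;

delete from BBSCOMMENT_TEST a
where (a.DETAIL_ID, a.COMMENT_BODY) in (
          select DETAIL_ID, COMMENT_BODY
          from BBSCOMMENT_TEST
          group by DETAIL_ID, COMMENT_BODY
          having count(*) > 1)
  and rowid not in (
          select min(rowid)
          from BBSCOMMENT_TEST
          group by DETAIL_ID, COMMENT_BODY
          having count(*) > 1);

-- If the delete worked, no combination should occur more than once
select DETAIL_ID, COMMENT_BODY, count(*)
from BBSCOMMENT_TEST
group by DETAIL_ID, COMMENT_BODY
having count(*) > 1;

-- Remove the scratch copy when finished
drop table BBSCOMMENT_TEST;
```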

2.5 For large data volumes, PL/SQL is more convenient and faster

declare
-- Record structure for one row
type bbscomment_type is record
(
    comment_id BBSCOMMENT.COMMENT_ID%type,
    detail_id BBSCOMMENT.DETAIL_ID%type,
    comment_body BBSCOMMENT.COMMENT_BODY%type
);
bbscomment_record bbscomment_type;

-- Variables holding the previous row's values for comparison
v_comment_id BBSCOMMENT.COMMENT_ID%type;
v_detail_id BBSCOMMENT.DETAIL_ID%type;
v_comment_body BBSCOMMENT.COMMENT_BODY%type;

-- Other variables
v_batch_size integer := 5000;
v_counter integer := 0;

cursor cur_dupl is
    -- Fetch every row that belongs to a duplicated combination
    select COMMENT_ID, DETAIL_ID, COMMENT_BODY
    from BBSCOMMENT
    where(DETAIL_ID, COMMENT_BODY) in (
        -- these combinations occur more than once
        select DETAIL_ID, COMMENT_BODY
        from BBSCOMMENT
        group by DETAIL_ID, COMMENT_BODY
        having count(*) > 1)
    -- COMMENT_ID is added to the sort so that the row kept in each group
    -- is the one with the smallest COMMENT_ID, as in 2.2 and 2.3
    order by DETAIL_ID, COMMENT_BODY, COMMENT_ID;
begin
    for bbscomment_record in cur_dupl loop
        if v_detail_id is null or (bbscomment_record.detail_id != v_detail_id or nvl(bbscomment_record.comment_body, ' ') != nvl(v_comment_body, ' ')) then
            -- First row overall, or first row of a new group: remember its values and keep it
            v_detail_id := bbscomment_record.detail_id;
            v_comment_body := bbscomment_record.comment_body;
        else
            -- Any further row of the same group is deleted
            delete from BBSCOMMENT where COMMENT_ID = bbscomment_record.comment_id;
            v_counter := v_counter + 1;

            if mod(v_counter, v_batch_size) = 0 then
                -- Commit once per batch. Note: committing while the cursor is still
                -- open can lead to ORA-01555 (snapshot too old) on long runs.
                commit;
            end if;
        end if;
    end loop;

    if v_counter > 0 then
        -- Final commit
        commit;
    end if;

    dbms_output.put_line(to_char(v_counter)||' rows deleted!');
exception
    when others then
        dbms_output.put_line('sqlerrm-->' ||sqlerrm);
        rollback;
end;
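The block reports its result through dbms_output, which client tools display only when server output is enabled. Assuming the block is run from SQL*Plus, a minimal setup looks like this:

```sql
-- SQL*Plus setting: display text written with dbms_output.put_line
set serveroutput on
-- Then paste the anonymous block above and finish it with a line
-- containing only a slash (/), which tells SQL*Plus to execute it.
```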
