Remove Duplicate Posts in WordPress

Submitted on December 5, 2009 – 10:28 pm

Below is a shell script that runs a SQL query to identify and remove duplicate posts in your WordPress database. This script can be useful for autoblogging. If you use plugins like WP-o-Matic to pull full-text RSS feeds into your database, you will inevitably end up with a bunch of duplicate articles. This is a problem because many search engines, including Google, frown upon duplicate content and may decide not to index much of your site.

Edit the script and add your database information. If your installation uses a different table prefix, change “wp_posts” to match it. Before using the script, back up your database: there is no “undo” button. If you are happy with the result, set up a cron job to run the script twice a day. Keep in mind that the script only compares post titles; it does not look at the actual post content. Do not use it if you purposefully have different posts with the same title.
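
A minimal sketch of the two precautions above, assuming mysqldump and mysql are on your PATH. It builds a timestamped dump file name and a read-only query that lists every title appearing more than once, along with the ID of the copy that would be kept. The helper names (preview_sql, backup_file) and the TABLE variable are illustrative and not part of the original script; the credentials are placeholders.

```shell
# Placeholder credentials; replace with your own.
DBUSER=your_db_username
DBPASS=your_db_password
DBNAME=db_name
TABLE=wp_posts   # assumption: adjust if your table prefix differs

# Print a read-only query listing each duplicated title, how many posts
# share it, and the lowest ID (the post that would survive the cleanup).
preview_sql() {
    cat <<EOF
SELECT post_title, COUNT(*) AS copies, MIN(id) AS kept_id
FROM $1
GROUP BY post_title
HAVING COUNT(*) > 1
ORDER BY copies DESC;
EOF
}

# Build a timestamped dump file name for the given database.
backup_file() {
    printf '%s-%s.sql\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Uncomment to run against a real database:
# mysqldump -u"$DBUSER" -p"$DBPASS" "$DBNAME" > "$(backup_file "$DBNAME")"
# preview_sql "$TABLE" | mysql -u"$DBUSER" -p"$DBPASS" "$DBNAME"
```

If the preview lists titles you want to keep more than once, the cleanup script is not a good fit for that database.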

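#!/bin/ksh
# Remove duplicate WordPress posts, keeping the lowest-ID post for each title.

MYSQL=/usr/bin/mysql
DBUSER=your_db_username
DBPASS=your_db_password
DBNAME=db_name

# For every title that occurs more than once, delete all rows except the one
# with the smallest id.
$MYSQL -u$DBUSER -p$DBPASS $DBNAME << EOF
delete bad_rows.*
from wp_posts as bad_rows
inner join (
        select post_title, MIN(id) as min_id
        from wp_posts
        group by post_title
        having count(*) > 1
        ) as good_rows on good_rows.post_title = bad_rows.post_title
        and good_rows.min_id <> bad_rows.id ;
EOF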
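
For the twice-daily schedule mentioned above, a crontab entry along these lines should work. This is only a sketch: the script path, the log path, and the cron_line helper are hypothetical, and 03:00 and 15:00 are arbitrary times.

```shell
# Hypothetical paths; point these at wherever you saved the script.
SCRIPT=/usr/local/bin/remove_duplicate_posts.ksh
LOG=/var/log/remove_duplicate_posts.log

# Print a crontab line that runs the script at 03:00 and 15:00 every day
# and appends its output (including errors) to a log file.
cron_line() {
    printf '0 3,15 * * * %s >> %s 2>&1\n' "$1" "$2"
}

cron_line "$SCRIPT" "$LOG"

# To install it, append the line to the current user's crontab:
# ( crontab -l 2>/dev/null; cron_line "$SCRIPT" "$LOG" ) | crontab -
```

Make the script executable (chmod +x) before scheduling it, and check the log after the first run.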

12 Comments »

  • Sleeper Sofa says:

    well, i have troubles installing windows7 on my PC. maybe i need a bios update or something ~”*

  • Jamie says:

    My RSS feeds create the duplicated titles for posts, but the date they are published differs. I need something that will look at the post title AND the date that title was published. If the title and date match another title and date, then I will know that it is a duplicate post.

    Can anyone help? I need something free because I am making this site for work and am not allowed to purchase things for it.

  • abhi says:

    Nice tweak, dude. It helped a lot to clean up my local WordPress database. Keep up the good work.

  • Smith says:

    Hey, thanks for this method. I tried it on my WordPress site. My database of posts is huge, I mean very huge. It took me a few minutes to get rid of all the duplicate content using the method you suggested above. I found a pretty neat plugin for doing the same work: http://www.l337fx.com/delete-duplicate-posts-wordpress.html

  • JackReynolds says:

    Short description of problem:
    I have a blog hosted on Blogger but published with a custom domain for over a year now. It used to show up on Google search results on the first page (web + images) but has completely disappeared from Google search listings for the last couple of months. Other blogs/sites that have referred to my blog though, do show up in the search results (web + images) against the very search strings that were earlier pulling up my blog. However, in Yahoo!/Bing my blog still shows up. I haven’t really changed anything, so not sure what the problem is.

    Blog Address: http://www.apycooking.com (hosted on Blogger as apycooking.blogspot.com)
    Browser(s) Name/Version (ex: Firefox 4, Internet Explorer 8): All/any
    Geographical Location (ex: San Francisco / USA): USA

    Long description of problem:

    My blog – apycooking.blogspot.com – is published as http://www.apycooking.com with the necessary redirections, including redirecting apycooking.com to http://www.apycooking.com. Everything was working fine and my blog used to appear on the 1st page for search results for the relevant search terms. I started my blog more than a year ago and it’s been about a year since I moved to the custom domain (www.apycooking.com).

    About a couple of months ago, my blog completely stopped appearing on Google search results. Using an advanced search with “site:www.apycooking.com” gets results, but using “site:apycooking.blogspot.com” yields nothing. At times, some images from my blog do appear in the Image search. In both cases, other sites/blogs that refer to or mention my blog do show up in the search results.

    Searching for the same keywords on Yahoo!/Bing though, does pull up my blog in the search results and often with a high ranking.

    As a test, what I did was to copy my blog from Blogger to WordPress – apycooking.wordpress.com. I find that the WP site is showing up now in Google search results – though not with the same ranking that I was earlier getting for my blog; which is understandable because the WP one is not really an active site and does not have other sites linking to it – but the same content is showing up nevertheless.

    In conclusion, it is clear that something is wrong with my Blogger blog that is published with a custom domain AS FAR AS GOOGLE IS CONCERNED (since the same blog shows up on Yahoo!), but I am not sure what. Since it is on Blogger – which is a Google product – I can safely rule out (I assume) structural aspects of the HTML that are preventing the content from being indexed. The only other thing I can think of is something to do with my Domain Registrar’s Control Panel, but am not sure what I should look for there.

    As an illustration, here are some search terms tested (without the quotes) as of today – Sep. 14, 2011:
    – “Garam masala pizza” => Yahoo! – 1st search result; Google – end of 2nd page but show another blog that refers to this recipe
    – “Kali tori ghashi” (name of an ethnic dish) => Yahoo! – 1st search result; Google – not appearing in first 2 pages
    – “Honey Chilli Sesame Potato” => Yahoo! – 3rd result; Google – not appearing in first 2 pages
    – “Eggless Microwave Coffee Cake” => Yahoo! – 1st result; Google – not appearing in first 2 pages
    – “Eggless Microwave Walnut Cake” => Yahoo! – 1st result; Google – not appearing in first 2 pages.

    Look forward to some expert inputs.

    Thanks in advance!
    Thank you, The Wolf for your inputs and Smelly Cat, for your excellent suggestions. I have now fixed the apycooking.com problem – I thought I had fixed that long back and didn’t even check that this time. I have now blocked the WP link but http://www.apycooking.com is still not showing up in the search results. It’s been about 72 hours now since I fixed that; maybe another 24-48 hours before all the indexes refresh.

  • morbiusdog says:

    The company I work for has a blog. One is attached to the website (wordpress), and the other is on Google Blogger. We have been posting the same exact material on both since people preferred both formats.

    Is this hurting our SEO by having duplicate content?

    If so and I should remove one, which should it be?

    Thank you for your time!

  • jag43216 says:

    Hi,

    I’m using WordPress with All-in-One SEO plugin installed. I have several category url’s already indexed by Google. I would like to de-index them because I believe they are causing me duplicate content. I’ve tried checking the ‘use noindex for categories’ and also for the tags.

    So my question is, will google automatically deindex those url’s? or do I have to do something else, like request them removed in webmaster tools?

    Thanks in advance!

  • Melanie says:

    Ok so I made a website and I also signed up with wordpress. The website takes 24-48 Hours to finish setting up completely, and I will have to build it myself. I know that will take some time so I decided to explore wordpress. I am also very new to all this and still in the learning process of course.
    I want to know how to attract traffic to my wordpress site? And is it only wordpress members that can follow and comment on the blogs?

    Also, about my website that I purchased….
    Xquisitz.com-will be a remedy site that shares information about life and everything that comes with it.
    Like for instance I suffer from severe migraine auras and I am still in the process of finding triggers and medications. I need to lose 60 Pounds for the New Year. That is my 2012 New Year’s resolution. And I have a lot of things to share and I want others to share with me too. So maybe I could have a discussion bar on my site as well. But will I be able to do that with wordpress too? Please any ideas will be appreciated. Thanxx.

  • Thomas A says:

    http://freerepublic.com/focus/f-news/2101148/posts
    http://citizenwells.wordpress.com/2008/10/09/acorn-patrick-fitzgerald-obama-fbi-investigation-rico-violations-228-million-undocumented-campaign-contributions-mccain-campaign-fec-complaint-clinton-campaign-complaints-obama-fraud/
    If Hillary sent you, then why is she suing along with McCain?
    Look at the date, this is BRAND NEW
    Pig skin, I know all about the Texas caucus frauds, I live in Texas, and am a Hillary supporter, and member of PUMA.

  • lcollier93sbcglobalnet says:

    As we know, search engines like Google penalize websites for duplicate content. So if I post my blog in other blog directories, will this have any adverse effect on my campaign?

  • simply complicated says:

    I don’t have $1,000/month to pay an SEO firm so I decided to learn it myself. I have a niche website (engineering CAD video training, I’m an engineer) and I’m serious about getting it to take off. I paid an expert to do research; there is a market and competition isn’t too bad. From my research, the best strategy seems to be to write articles on my WordPress blog using the SEO plugin that gives me a green circle if the article is optimized and submit it to ezine.com with a backlink. I have written a few articles for specific keywords using the Google keyword tool with low competition and decent traffic but have not seen many results. What should I do, keep writing and hope I’ll see results soon or do I need to do something differently? How many articles per week should I write?

    Thanks a lot for the help!
    Solidworkszen.com

  • Rob Turner says:

    Hello,

    I have an RSS feed hub site and the plugin I was using to pull RSS content barfed and I went from 7000 posts to 59000 and a very large database.

    Your script worked like a charm and got my database streamlined. I only had to change the path to mysql, add a dbhost (my provider does not use localhost), and change the wp_ table prefix, since my database does not use it.

    I then used a plugin to see if there were still duplicates; it found a few and I trimmed them. I then ran an optimise-database-after-deleting-revisions plugin, which said the database went from 300MB to 90MB. I have now set your script up as a cron job that runs at 3am, and the optimise plugin will then run with cron, keeping things all snug.

    A big thank you as all the plugins I was trying to use really were not working. This script might take my site down with an unable to connect to database the few minutes it takes to run but I am okay with that.

    If something can be done to stop the table locking which I presume is what is happening then I would be happy to know about that but you really helped me and now my RSS feed shopping search engine should be pretty well self maintaining!

    W00t! So Happy.

    Rob Turner
    http://www.monkeyshoppa.com

    I will be using it for http://www.crazyasabagofhammers.com and http://www.eternalsummertime.ca, the latter being another RSS-fed site that is suffering from duplicated posts.
