Date aired: Jul 28, 2016
Sucuri Marketing Analytics & SEO Specialist, Alycia Mitchell, teaches you how to keep your Google Analytics data clean and use it to uncover vulnerabilities and compromises while optimizing your website’s visibility.
Alycia Mitchell
Digital Marketing Manager
Alycia is the Digital Marketing Manager at Sucuri. She’s passionate about teaching cyber security best practices and fond of open-source, analytics, and malware. A nature and wilderness lover, she has deduced that they are strangely enough a lot like the internet.
Question #1: What is the benefit to spammers for using someone else’s UA code and is there a way to hide it?
Answer: I haven’t found a way to hide it. I actually did some research in advance of this webinar to see if we could do that. You can put it in like an analytics.php file that you include, but people will still be able to find that stuff, too. The thing is, you don’t want to block Google Analytics from being able to send that data.
As for why they do it, there’s some speculation, but I think probably, for me, the most common reason is that they’re trying to spam marketers who are using Google Analytics and get them to check out these sites. Other times, it’s just people who want to watch the world burn. There’s just evil people out there who want to invalidate your data. Very rarely would I say it’s a targeted thing where they’re trying to pollute your analytics because it’s, say, a competitor or something like that. More often than not, it’s just spammers who, like I said, take ten minutes to write a script and they can send their website to an audience of millions of marketers who use Google Analytics.
Question #2: Should we be concerned with the host name not provided? Is that something that should be a major concern?
Answer: Not if you set up that filter. That might be measurement protocol stuff. I wouldn’t worry about it too much. I mean, if it’s not provided, then it’s not coming from your website. Don’t worry about it. Set the include filter to include the host names that you want and just leave the other ones alone because, unless you have some serious weird security stuff on your site, your host name will show up in your reports from the valid data that you’re trying to send.
Question #3: Do filters work retroactively?
Answer: No. Filters are applied once you add them to your view, and all the data going forward is changed. Which is why I said you should probably set an annotation so you know when you set it up. Annotations are a nice little way in Google Analytics to mark certain things like spikes and that sort of thing. You can put a little bubble there on the date that you made the change.
Once you apply a filter, basically your data, as it’s sampled from Google Analytics, gets processed and goes through your filters. Once you apply the filter, everything forward will change from that moment. Segments are how you are able to look at the past data.
Question #4: What all can be customized in a view?
Answer: Oh, tons of things. All you have to do is just go to your views in Google Analytics and look under the View column. Custom Alerts is one of them, there’s Goals and that kind of thing, too, and events, I think, that you can set as well.
Yeah, all of it’s available under the view column and anything that you change under the view column will apply to the specific view. The main one, though, that you should be concerned with that I recommend everybody looks into is Filter. I just Google like “Top 10 filters using Google Analytics” “Top 5”. There’s tons of other people who are analytics experts who’ve set up some really great guides on how to use those.
Question #5: Do you find that it’s easier to include a filter rather than excluding the host names you don’t want?
Answer: Definitely. I mean, like I said, one of the problems there is if you add more properties, more websites that you want to track, then you want to have a back-up view that doesn’t have a host name filter on it, but maybe has the other filters that you do want. If you’re excluding those bad host names, that just means that if a new website shows up as a bad host name, you’re gonna have to go back to that filter and exclude that one as well and make another segment because now that data is in there. You can’t remove it, but going forward, it won’t be processed anymore.
I definitely find that it’s easier just to include the websites that you know. I mean, it depends, too. Like if you have, I think there’s a max of like 50 properties or something, so if you have a lot of properties, it might be a lot of work to do that Regex. Fortunately, those filters allow for enough room for that and you can create multiple filters as well.
Like I said, the referral website that I was mentioning that stops the bad referrers, they set up multiple filters because there’s only so much room. Yeah, generally, if you have a handful of websites that you’re processing, it’s easier just to include those ones in your data and then that way no other ghost referrers will ever show up for you.
Question #6: This person had a question that they’re using Yoast SEO Premium and that has the search console info that Google has, but they want to know if they should depend just on Yoast or look at Google also.
Answer: As far as I’m aware, Yoast is hooking up with Search Console because Search Console, actually, provides you with some queries that people are using to find your website. Yoast will pull that information into its plugin. I’m not sure if Yoast has a security feature in order to let you know if you’re blocklisted or something. We do have a free WordPress security plugin as well, which will scan your site and let you know if you’re blocklisted, which I highly recommend that anybody with WordPress installs.
Yeah, I would definitely recommend. It’s worth getting to know Google Search Console and just clicking through. There’s a lot of interesting stuff in there. I mean, I’m just a data geek, but I think it’s really helpful and especially the search query section is really awesome. They improved it over the last year, too, so that you can better filter date ranges and see what queries are being clicked the most to send people to your site. That data is very valuable, I find.
Question #7: What’s your best practice for removing post hack malicious 404 not found links from the webmasters?
Answer: I would use that Google Search Console URL removal tool if it’s just a few of them. If you’ve got a ton of them, you can use a robots.txt file. What robots.txt is, if you’re not familiar, is just a text file on your server that bots are supposed to respect. Especially the good bots like Googlebot. When Googlebot’s trying to hit your site and it’s like, “Oh, let’s just crawl this whole site,” first it reads the robots.txt file and finds out if there’s any places you don’t want it to go.
If you have a bunch of 404 spam in a directory, like, maybe, the hackers just smashed the keyboard and made a directory and then put like 10,000 pages in there. Now they’re all gone. Instead of submitting those one by one, 10,000 times, you can just tell Googlebot, “Don’t go into this directory. Just forget it, it’s not there. Don’t index it.”
That’s probably, I would say, the easiest way if there’s a lot of it. If it’s just a few URLs, the URL removal tool, for me, is probably easier because I don’t have access to our server.
Question #8: This person said that they heard that Google said they don’t mind the 404s and they don’t affect the ratings at all. Is that what you’ve heard as well?
Answer: I think it depends. I mean, SEO is such a toss-up sometimes. There’s a lot of mystery to it. I’ve heard some people say that if you have a lot of 404 errors that Google doesn’t like that. It really depends. I mean, they show up as crawl errors. I know that places like Moz definitely recommend that you resolve your crawl errors. Again, people have tested both ways and some people say that 404 errors do affect search results. Especially if they’re in large numbers. Some people say they don’t.
One thing that I did find and we’ve written a blog post about it as well, if you search for 404 errors in Google Search Console you’ll see Caesar’s post about a site that had multiple 404 errors and because there were so many 404 errors on the site … No, it was like so many pages that were created. It was like 250,000 pages or something. Google starts to think your site is much bigger so it crawls it much faster and then when those pages just disappear, the crawl rate is totally out of sync. It can, actually, DDoS your website.
That was kind of an interesting one that we looked into, but definitely, I would say, it’s always beneficial to get rid of 404 errors just because they’re not good to have. Especially if people are actually trying to visit those pages, then you want to resolve them. It’s a bad user experience if they’re legitimate 404 errors, not spam.
Alycia Mitchell – Digital Marketing Manager
Thanks very much, Kristen. Welcome everybody. I’m really excited to talk to you guys today about how you can use Google Analytics and Google Search Console to identify website compromises and, also, tackle some security issues and make sure your reports are free of any spam. A little bit about me, I live in Victoria, which is the capital of British Columbia on the west coast of Canada. It’s on a little island called Vancouver Island. It’s a really beautiful tourist spot and if you ever get a chance to visit, it’s a lovely place to be.
I have been working in Cybersecurity, specifically, doing Marketing Communications for about seven years. I’ve studied in a few fields and continue ongoing learning in that area. Enough about what I do, I’ll show you my dog. She’s the light of my life and she gets me away from the keyboard. I spend way too much time on the computer. I just thought some pictures of my dog would be a nice way to start this before we get into some more heavy content about security.
What we’ll be going over. There’ll be three sections to this webinar. The first one we’re going to talk about is how to remove the two types of Google Analytics Spam. There’s Ghost Referrers and there’s also Crawler Spam or Bot Spam. We’ll go over how to identify those, how people are spamming your Google Analytics and, finally, how you can remove them.
The second section of the webinar will be talking about Search Console and how you can find some security issues and repair them. Including things like blocklisting, SEO spam and 404 errors generated by spam. We’re gonna talk, in the last section of the webinar, about identifying indicators of compromise in your Google Analytics so that you can make sure that you’re protected. If there is anybody trying to attack your site, that you’re aware of that. Let’s jump right in.
First, we’re gonna talk about that Google Analytics Spam and, as I mentioned, there are two types of Google Analytics Spam that people refer to. There is the Bot or Crawler referrers and then there’s the Ghost referrers. Although, they don’t really use referrers in the Ghost ones, but we’ll go over what it means anyway.
Referral spam has been a big issue lately in the past year. You can see from this Google Trends screenshot that in about early 2015, it spiked. A lot of people started searching for these terms. It has gone down a little bit recently because Google’s taken some action against it, but it’s an ongoing problem because spammers can continue to just buy new domains and spam your Analytics with those. It is something good to be on top of. It definitely is a big issue for invalidating your data.
One of the other problems is you look back on your old data and Google can’t really do anything to remove those spam referrers from your old data. I’ll show you how to do that as well in this webinar.
The first thing we’ll talk about is: How is Google Analytics getting spammed in the first place? One way is your tracking code, as you can see on the right. You’re probably all familiar with this. Everybody has their own unique UA code, which sends the pageview hits to Google Analytics. One of the problems with having your own UA code for your account is there’s only so many possible UA codes. It’s nine digits there after the UA. People can randomize them, people can generate them. It’s also easy for somebody to target you because all they need to do is open your source code and find your tracking code to find your specific UA code. From there, they can target you and send tons of hits to your account.
How do spammers send invalid data? There’s a couple of different scenarios that I’ll go through. A couple of them are using your UA code and then a couple of them are just using the Crawlers for the Bot spam that I was mentioning before.
In the first scenario, attackers can basically set up a website and install your tracking code. All they need to do is send hits to their website and your tracking code is what fires and it ends up showing up in your data. There’s not a lot you can do about this because your website is not ever touched in the process. There’s nothing you can do at the website level to protect yourself. A website firewall won’t stop it because it’s all happening on somebody else’s website. Those hits will end up sampled in your report. However, attackers don’t even need to go through the trouble of setting up a website. There’s other ways that they can do this too.
If you haven’t heard of the Measurement Protocol, it’s basically how Google Analytics responds to the internet of things. This is things like your internet enabled fridge or your microwave and it allows developers to, basically, make raw HTTP requests from any environment and send data directly to Google Analytics. They can send anything like eCommerce and events and all kinds of data from any device.
What attackers will do is they’ll use a script written with the Measurement Protocol and, as you can see on the right here, this is actually a screenshot of the Google Hit Builder. Some of you marketers out there have probably used Google Campaign URL Builder. It’s very similar to that. It’s well documented, there’s all kinds of hits and payloads that you can send with it in order to build a hit and send it to a specific UA code. That ends up getting collected by Google Analytics and sampled in your reports. It’s really fast, really easy to automate. Like I said, attackers can basically send any payload they want. Then, basically, it just ends up in your data. It includes events and eCommerce and they can spoof a lot of different things in that way.
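To make that concrete, here is a minimal Python sketch of what a single Measurement Protocol pageview payload looks like. The parameter names (v, tid, cid, t, dh, dp, dr) are the ones in Google's public Measurement Protocol documentation for Universal Analytics; the tracking ID, client ID, host name and referrer below are made-up placeholders, the helper name is hypothetical, and nothing is sent over the network. The thing to notice is that the host name and referrer are just fields the sender fills in.

```python
from urllib.parse import urlencode, parse_qs

# Public collection endpoint for Universal Analytics hits (shown for context only).
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, client_id, hostname, path, referrer=None):
    """Return the URL-encoded payload for one pageview hit (nothing is sent)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # UA code the hit is credited to
        "cid": client_id,    # anonymous client identifier
        "t": "pageview",     # hit type
        "dh": hostname,      # document host name, chosen freely by the sender
        "dp": path,          # document path
    }
    if referrer:
        params["dr"] = referrer  # document referrer, also supplied by the sender
    return urlencode(params)


# Placeholder values, not real accounts or sites.
payload = build_pageview_hit(
    "UA-XXXXXXXX-1", "555", "not-your-site.example", "/", "http://spam.example/"
)
fields = parse_qs(payload)
```

Because the host name in a hit like this is whatever the sender typed, it will rarely match the real host names where your tracking code is installed.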
One of my colleagues on the marketing team is a little bit of a script kiddie himself so he tried this out. He actually told me he took about ten minutes to write a script and in under an hour with his small web server, he was able to send over five million fake hits to various Google Analytics accounts. He did this with one line of code and he was able to send up to 500,000 hits per minute. If he had a bigger server and more resources, he could have sent a lot more. He was able to hit every single Google Analytics account a couple of times.
This just goes to show how big the issue is and how easy it is for spammers to send this fake data to corrupt your Analytics. However, these two scenarios, like I said, are using your UA code and sometimes they don’t even need to do that.
In the third scenario we’re gonna talk about the Bot Referral Spam or Crawler Referral Spam. This is the one most people are familiar with. To do this, basically, attackers are spoofing HTTP request headers. If you’re not sure what those are, basically anytime somebody visits your website, your server receives HTTP request headers from the visitor. This includes information like where they’re located in the world, what browser, what operating system they’re using. The referrer header is actually what website they were on before visiting you. All that information is actually collected by your tracking code in Google Analytics and sent to Google Analytics and they use that for things like their acquisition reports and tons of other things throughout Google Analytics. It all comes from those request headers.
Basically, attackers are programming a Crawler or a Bot to scan through the internet and probe for websites and, basically, send these hits to pages on your website. Lots of them so that they show up in your referral reports. Another way that they might do it is they might actually command a Bot Net. This would be a series of infected computers or web servers that they’re using like a zombie bot net in order to send hits to your Google Analytics. These are people who were infected with a virus and don’t realize that their computer resources are being used in this way.
One of the reasons why these attackers might use a Bot Net over scripting is because they’re actually real people, so it’s a lot harder to be detected. They can evade detection that way. If you’re programming a Crawler or Bot, you might have a limited range of IP addresses or user agents that you use and so it makes it a little bit easier to block. Plus, you don’t really want to be blocking real people from visiting your website.
What they can do with the request headers, as well, is that they can spoof them so they can change the referrer to whatever website they want. What spammers are trying to do is, basically, get marketers to look at their referral reports and see, “Oh, this website’s sending me a lot of links.” In actuality, it’s just a spam website that’s been spoofed and sent a bunch of hits to your site.
How do you find out if you have Ghost Referral Spam or Bot Referral Spam in your reports? For Ghost Referral Spam, as we mentioned, these people are using your UA code. All the websites that you have your tracking code on, those will show up as unique host names in your report. So, blog.example.com, www.example.com, shop.example.com. Those will all be unique host names in your reports. Anybody using your UA code on a website or device that’s not belonging to you will show up as a different host name. You can basically just set up some filters to make sure that you’re only including the data from the websites or host names that you want.
This resolves the first two scenarios we talked about. This is, again, Ghost Referral Spam, so it’s different from Bot Referral Spam because, in this case, we’re gonna be just including the websites that you want. These host names do show up as a dimension in Google Analytics as well, so you can also apply them to different reports and that kind of thing.
To find the Ghost Host names, what you can do is look in your Reporting tab. Go to the Audience section, under Technology you’ll choose Network and then from here, you’ll see at the top right above the red box, your primary dimension defaults to Service Provider. You just click Host Name and that’ll show you a list of all the host names. You can look for any domains in here that you don’t have or that don’t belong to you. Those will be your Ghost referrers.
This is what it looks like. As you can see, the top eight sites are all ours, but this ninth one is actually a Ghost referrer. This is somebody who used our UA code and tried to spam us and sent us about a thousand hits over time with that.
The last one, number ten, is actually just Google Translate. If you want to, you can exclude it, but it’s not really a big deal. It’s not malicious or anything like that. That’s where you’ll find them, like I said, because it’s so easy to identify, this one is a really nice and easy one to deal with. Spam or Crawler referral, on the other hand, is a little bit more tricky.
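The host name check described above can also be run on an exported copy of the Host Name report. This is only a sketch under a few assumptions: the rows are (host name, sessions) pairs you exported yourself, the function name is hypothetical, and the domains are example placeholders rather than the ones on the slide.

```python
def find_ghost_hostnames(report_rows, valid_hostnames):
    """Return the (hostname, sessions) rows whose host name is not one of ours."""
    valid = {h.lower() for h in valid_hostnames}
    return [
        (host, sessions)
        for host, sessions in report_rows
        if host.lower() not in valid
    ]


# Example rows as they might appear in an exported Host Name report.
rows = [
    ("www.example.com", 12000),
    ("blog.example.com", 3400),
    ("free-seo-buttons.example", 1000),
    ("(not set)", 40),
]
ghosts = find_ghost_hostnames(
    rows, ["www.example.com", "blog.example.com", "shop.example.com"]
)
```

Anything left in `ghosts` is worth a look by hand. Some of it, like a translation service, may be harmless.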
Referrers, again just to recap, are sites where visitors clicked a link to get to your site. That HTTP request header sent along that information. All these request headers, with the referral data, they make up your channel reports in Google Analytics. Google Analytics is pretty smart. It knows that if it’s Twitter or Facebook, it’s a social channel. It knows that if it’s Google or Bing or Yahoo, it’s an organic channel. It can identify email channels, that kind of thing. Any website that is not falling into those buckets is going to show up as a referrer. Any site that you see in your referral reports under acquisition that looks fishy or spammy, could possibly be a spam Crawler or referral Bot that is hitting your data and polluting your reports.
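As a rough illustration of that bucketing, here is a toy Python classifier. It is not how Google Analytics actually does it: the two lookup sets are tiny made-up samples, and the function name is hypothetical. It only shows why an unknown domain, spam or not, lands in the referral bucket.

```python
from urllib.parse import urlparse

# Tiny illustrative lookup tables; the real lists are far longer.
SOCIAL = {"twitter.com", "t.co", "facebook.com"}
SEARCH = {"google.com", "bing.com", "yahoo.com"}


def classify_referrer(referrer_url):
    """Bucket a referrer URL into a rough channel name."""
    if not referrer_url:
        return "direct"
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in SOCIAL:
        return "social"
    if host in SEARCH:
        return "organic"
    # Everything else, including a spoofed spam domain, is a plain referral.
    return "referral"
```

A spoofed referrer such as a freshly registered spam domain matches none of the known buckets, so it shows up in the referral report alongside legitimate sites.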
That’s that third scenario. Again, this one can be really difficult because there are tons and tons of bad referral sites out there that are using this technique in order to spam your reports. There’s lots of lists of referral spam sites that I’ll go over in a second here. It can be really tedious because it’s just as simple as buying a new domain name and using it in this way to spoof referral headers. Then the next thing you know, there’s a ton more of them out there.
In order to find the referral spam, you go to your Reporting tab this time and under Acquisition, go to Referrals. Again, here you just might have to show more rows and look for any sites that are sending you weird traffic. The metrics and stuff associated with it are all gonna pollute the rest of your report. That’s why it’s important to get rid of these. It’s not valid data and you really don’t want that in your reports messing up things like your time on page, your bounce rate, whatever. Sometimes it looks like really bad traffic, sometimes it looks like really good traffic. It really can be anything. One of the best ways to find this stuff is to use those lists that I’m gonna talk about.
This is the juicy part. How do you remove all this invalid data? If you haven’t used segments and filters in Google Analytics before, I’ll go over them a little bit, but definitely recommend that you read up on them. Filters are something you use to change future data. When you apply a filter, it’s gonna modify all the data going forward in that view.
To remove the Ghost referral spam, we’re basically gonna set up a filter that only includes the host names that you want. Your websites, nothing else is gonna get into your data from the moment that you apply that filter.
To get rid of the Crawler spam, we’re gonna do something different. We’re gonna actually exclude those Bot Crawler spam referrers. That’s why you need the lists. That’s why you’re gonna need to keep on top of it and update them as well. Again, once you apply a filter, it modifies all data going forward so it’s very important to test. Once you set the filter on your main view, I recommend that you add an annotation in Google Analytics. That’s just to mark the date that you actually made the change so if you’re going back to look at past data, you know when to apply a segment to look at the past data and when your future data’s gonna be clean from that stuff.
In order to look at past data without that referral spam, like I mentioned, Google can’t do much about all the past data that’s been polluted. You want to create a segment and this will be of your valid host names and you’ll also create a segment that excludes all those spam referrers. This will allow you to apply it to any report on Google Analytics just to see how your data changes when you remove all of that spam.
Just to recap. With Google Analytics you’ve got your account, your property and your view. Your account’s usually owned by your whole company and your property, maybe, different sub-domains. Your main site, your blog. You might have a property that includes all of them with cross-domain tracking. For every property that you have, you have up to 25 views that you can apply to that property in order to see it in a different way. One of the most popular things to do with your views is to add those filters.
There’s lots of different filters out there. Some really important ones, too, like adding the host name at the beginning of the request URI. Instead of just having a path name, you also have the sub-domain and the domain in front when you’re looking at your reports. Lowercase filters to make sure if somebody uses all caps that it actually just filters the data before it enters your reports and makes it all lowercase, or removing a trailing slash. I’m not gonna go over those, but definitely something to look into if you’re new to filters.
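To show what those three filters do to a page value, here is a small Python sketch. It only mimics the effect on a single row. It is not Google Analytics filter configuration, the function name is hypothetical, and it ignores query strings for simplicity.

```python
def normalize_page(hostname, request_uri):
    """Mimic three common view filters on one page value."""
    path = request_uri.lower()        # lowercase filter
    path = path.rstrip("/") or "/"    # remove trailing slash, but keep a bare "/"
    return hostname.lower() + path    # put the host name in front of the path
```

With this, "/Security/Post/" on Blog.Example.com and "/security/post" on blog.example.com end up as one row instead of two, which is the whole point of those filters.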
Again, views allow you to change the data with those filters and I highly, highly recommend, actually, I insist that you use a test view anytime that you apply a new filter because it is going to really change your data. You can’t go back and see your view without that filter once you’ve applied it. That’s why it’s also important to keep a couple raw views that are completely unfiltered.
I have, actually, a couple back-up ones as well that have basic filters like those ones I’ve mentioned. Lowercase and adding the request URI. I have them set up with goals as well, but I don’t have the host name filter on those in case we ever added a new website and I missed out on it. For example, we have HubSpot landing pages and that comes from hs-sites.com. Those weren’t being included in my main views because I had only included our valid host names. That one got added, but I was able to go back into my back-up views. I mean, you have 25 so might as well use them. Again, have to stress, use a test view anytime that you’re gonna add some filters. Make sure it’s working and then go back and add it to your main views once you know it’s good.
In order to set up a filter to get rid of those Ghost host names, you want to go to the Admin section. Under your view that you want to change, you can click Filters and from there go to New Filter and here we’re gonna create a custom one. You can see on the far right there that’s what the filter creation tool looks like. You could name it. We want to make a custom one. It’s gonna be an include filter and we’re gonna use the filter field host name and then we’re gonna enter our website.
If you just have one website for the view, you can just enter it as is. If you got multiple ones like in this case, from my example there in green, you’re gonna want to use Regex and Regex allows you to basically tell Google Analytics, “Use this one or this one or this one.” There’s a lot of different opinions on what Regex you should use here, but this is the one that I found is most effective for me.
Just to explain how it works there. That little caret symbol, the little hat thing at the beginning, that says, “Start here.” And the dollar sign says, “End here.” And then the pipe symbol is an OR operator so this basically means use www.site.com or blog.site.com or www.etcetera.com and you can continue adding those as long as you still have the pipes in between. This is the pattern that you’d want to use for that filter and once you save it, again, it’s going to start modifying all the data that’s sampled for that view going forward.
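If you want to try a pattern like this before pasting it into a test view, a few lines of Python will do. This is a sketch, not the exact pattern from the slide: the helper name is hypothetical, the host names are placeholders, and escaping the dots is an extra step added here so that a dot only matches a literal dot. It also assumes the filter treats the pattern as a search-style match, which is what `re.search` imitates.

```python
import re


def build_hostname_include_pattern(hostnames):
    """Join host names into ^host$|^host$ form, escaping dots so they match literally."""
    return "|".join("^" + re.escape(h) + "$" for h in hostnames)


pattern = build_hostname_include_pattern(["www.site.com", "blog.site.com"])

# Host names that should and should not pass an include filter with this pattern.
keeps_own_site = re.search(pattern, "blog.site.com") is not None
drops_lookalike = re.search(pattern, "www.site.com.spam.example") is None
```

The caret and dollar sign are what stop a lookalike host name that merely contains your domain from slipping through.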
In order to make sure that you can view your past data without the spam, we’re gonna create a segment. This is actually done on the Reporting tab and you’ve probably seen that All Sessions little circular icon up at the top near your toolbar. Right next to it you can click Add Segment, click the New Segment button and we’re gonna go to the Advanced section and go to Conditions. For that, we’re gonna actually create a filter here for the segment that is a session filter to include only the host names that you want. Sessions include and then you choose in that little tool Host Name Contains and then use your site names. You don’t have to use Regex here, thankfully, so you can just click the OR button and add more host names for any site that you want to be able to view and that you want to see in your reports because they’re actually the ones that you installed the tracking code on. When you apply this segment to any report, it’ll remove any host names that aren’t in this list.
Like I said, it’s a little bit different when we’re doing the Crawler spam. It’s the same basic process, you’re gonna create a filter and a segment, but this time instead of it including only the good referrers, we’re gonna exclude the bad ones. You wouldn’t want to just include good referrers because there’s probably lots of websites sending you traffic and you don’t want to miss out on any new ones that are coming up. Those lists of bad ones are gonna keep growing so it’s really useful to do some research and find them. There’s a ton of lists on GitHub and stuff like that.
Recently, I’ve tested out a tool called ReferrerSpamBlocker.com. I think it’s fairly new. I definitely recommend it. It’s pretty awesome. It’s gonna allow you to just click and import segments and filters. You just have to allow it access to your Google Analytics. That’s really nice because it’ll set up I think it’s like 16 filters for you and one segment with a pile of Regex in it. That will get you started so you can see kind of what there is out there. They have, I think they said about 315, that they’re already tracking. To me, that’s a pretty low estimate. There’s probably a lot more out there and in the list that I’ve seen, there definitely is.
Your specific account may be targeted and being hit with one anyways. It’s always a good idea to know how this is done and how it works. Once again, if you’re using referrerspamblocker.com or any lists, use a test view first. You absolutely want to make sure that it’s working for you before you change your main views that you use for day to day.
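If you are building the exclude filters yourself from one of those lists, the fiddly part is that each filter pattern only has so much room. Here is a Python sketch that packs a list of spam domains into as few pipe-separated patterns as will fit. The 255-character default is an assumption about the filter field limit, so check it against your own account; the function name and the domains are placeholders.

```python
import re


def build_exclude_patterns(spam_domains, max_len=255):
    """Pack escaped domains into as few |-joined patterns as fit within max_len."""
    patterns, current = [], ""
    for domain in spam_domains:
        piece = re.escape(domain)
        candidate = piece if not current else current + "|" + piece
        if len(candidate) > max_len and current:
            # The current pattern is full: close it and start a new one.
            patterns.append(current)
            current = piece
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Small limit here just to show the splitting behaviour.
example = build_exclude_patterns(["a.example", "b.example", "c.example"], max_len=22)
```

Each returned pattern would go into its own exclude filter on the referral field, tried on a test view first.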
That’s it for taking care of the Ghost referrers, which you want to make sure you just include the host names that you want and the Bot referrers that you want to exclude from your reports. Now, we’re just gonna talk a little bit about Search Console. If you don’t have Search Console, highly recommend you get it. It’s awesome. It’s the best way to use Google if you’re doing any stuff with search engines.
In Search Console, we’re gonna talk about how to deal with Blocklists, some crawl errors like 404s from spam campaigns and, also, how to deal with SEO spam if your titles and descriptions are changed in the search engine results pages.
First, we’ll deal with the biggest baddest one. You’ve probably all seen this big red warning. This is the Google Blocklist and it’s a great way to lose 95% of your traffic. If your site is hacked or infected with malware or spam, the Google web spam team takes it extremely seriously. The last thing that they want is to be serving up search results to people that end up getting them infected because that’s a good way for people to start going and using other search engines.
The Google web spam team will Blocklist you if they find spam or malware on your site that may endanger its users. It will also label your search results as “hacked” so when people are searching for your site, you’ll see that little, “This site may be hacked.” And people don’t like that. They’re not gonna go there. Very few people click to proceed to the website after they see this big red warning. Something that you definitely want to be on top of and know about.
Once your site is clean and you’ve removed the malware. You’ve removed any back doors that the attackers left there to get back in and you’ve also plugged the vulnerability that allowed them to hack your site in the first place. Once you do that, then you can request a review. When you do that, usually takes an absolute minimum of like a day. At most, they say it can take up to several weeks. Again, it depends on how big your site is, how bad the hack was, how much work the web spam team has to do to verify that your site is clean and also making sure that you are requesting your review because it’s actually clean, not just because you want it removed. If you keep doing that, it’s probably gonna take them a little bit longer the second time that you request it.
This is what it looks like in Search Console. You can see there on the left side I’m in the Security Issues section. This is our tool, SiteCheck, which if you’re not familiar with it, it’s a free tool that we offer to website owners so you can scan your website to see if it’s infected. We actually show the payloads and stuff when you scan it and those get detected by the web spam team’s automatic crawler there and they see it here.
We, fortunately, have a good reputation and a good relationship with Google, so we don’t get Blocklisted for it, but they do show up here as a warning. It is good to be familiar with this section because they will give you warnings, potentially, if it’s not a terrible infection or something that you might just need to look into.
Once again, you have removed all the spam, the back doors, plugged all the holes, then you can click the check box at the bottom. It says, “I fixed these issues and request a review.” When you do that, it’s gonna pop up with this little text box where you can say how exactly you fixed it. You can see here, they say the process may take several weeks. Although, we usually find that they get back to you in, at least, 48 hours.
That’s how you deal with blocklisting. It’s good to know. One thing that we’ve seen before is 404 errors in Search Console after removing spam. Attackers will attack your site and they will put tons and tons of doorway pages and spam directories on your site, basically using your server to serve pages of spam, phishing pages, all that kind of stuff. If you remove that stuff after Google has already crawled and indexed it on your site, Google’s gonna think that you now have 404 errors. It’ll think those pages are actually really missing.
You find those 404 errors under Crawl Errors, under Not Found. It’s different than a soft 404 and Google doesn’t like it when you have a lot of 404 errors so it’s definitely something you want to clean up and fix. You can use the Google URL removal tool, which I’ll show you in a minute. Which is right under Google Index and Remove URLs. You’re gonna click the Temporarily Hide button and enter the URLs that are 404ing.
In one of the posts that our researcher, Caesar, had written on the blog, he had found that there were like 25,000 pages or something and a bunch of directories that had some Japanese spam in them. Very difficult to do these one by one. If you have something like that, then I would recommend, instead, using a robots.txt solution to tell Googlebot to just stop crawling those spam directories.
This is what that removal tool looks like. In Search Console, we can see it here under the Google Index section, Remove URLs. Just click that Temporarily Hide button and you can enter the URL here. Once you do that, it’ll just show up in the list and it’ll basically let Google know that the page is no longer there.
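As a rough illustration of that robots.txt approach, here is a minimal Python sketch that builds the rules for a list of spam directories and checks them with the standard-library parser. The directory names and the example.com URLs are made-up placeholders, not taken from the blog post, and the helper name `build_robots_txt` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical spam directories left behind by an attacker (placeholders).
spam_dirs = ["/cheap-bags/", "/jp-store/"]

def build_robots_txt(directories):
    """Return robots.txt text asking all crawlers to skip the given directories."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {d}" for d in directories]
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(spam_dirs)

# Sanity-check the rules before uploading the file to the site root.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = parser.can_fetch("Googlebot", "https://example.com/cheap-bags/page-1.html")
allowed = parser.can_fetch("Googlebot", "https://example.com/about/")

print(robots_txt)
print("spam page crawlable:", blocked)
print("normal page crawlable:", allowed)
```

The check at the end is only there to confirm that the spam directories are disallowed while the rest of the site stays crawlable.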
Before we talk about SEO spam, this is what it looks like. If you scan your website with SiteCheck, our tool at sitecheck.sucuri.net, we check not only for SEO spam, we check for blocklisting, outdated software, code anomalies and known malicious payloads. It is a remote scanner. We don’t have access to your server so there are some limitations, but it is definitely recommended by a lot of people in the website community. It’s highly used and it’s a great quick way to check if you think your website’s hacked, to see what’s going on there, and to just scan for any anomalies or issues. We always try to show the payload and stuff to help make it easier for you to clean up on your own as well. That’s what it looks like if you see that you’re hacked with spam using SiteCheck.
Basically, SEO spam, what it does is it infects your titles and descriptions. These are the things that Google mainly uses to help rank your site and know what your site is about. If they change because an attacker got access to your website, then that’s gonna really impact your search position. It could impact what visitors are seeing in search results. Usually the spam is really unsightly. We often see pharma spam, so advertising things like Viagra and Cialis. You really don’t want that on your website.
If your site is infected or you think your site is infected, definitely Google yourself and see if your search results are modified at all. You can use the site: operator in Google to search for just your site, so site: and then your website. That will bring up all the pages on your site, and you can make sure that none of them are infected with spam. If you’re not sure what I’m talking about with titles and descriptions, definitely check out our friends at Yoast and WPBeginner because they have some awesome free guides and tutorials for people who are learning more about these topics.
This is what SEO spam looks like. We see fashion spam really often as well. That’s what you see on the right there. Basically, once your website is infected, spammers will try to alter your SEO metadata. They might stuff your pages with links and stuff, and doing this, changing your title and description, helps them to rank, helps them to get more power. Basically, you don’t want people to be Googling for you and finding Viagra, you know? That’s not good.
One of the problems with this, too, is even after you remove this spam, this is not automatically fixed. It takes time for Google to crawl your site. They’ve already crawled, found all the spam titles and descriptions and they’re showing them. Now you’re gonna want to make sure that Google comes back to your site really quickly and finds those proper titles and descriptions. Fortunately, this is really fast and easy to do in Search Console.
Under the Crawl section of Search Console you want to go to Fetch as Google. I recommend entering your home page or any page on your site that has lots of links to other pages on your site. Click Fetch and then at the bottom there in the list you can click Submit to Index. This is gonna ask you if you want to crawl just this URL that you submitted or the URL and all its direct links on that page. If you’re hacked with a lot of spam, crawl all the direct links that you can and get as much of it done as quickly as possible.
When you click the Submit to Index button, it’ll ask you here, first, to confirm that you’re not a robot. Then you can crawl just that URL, and I think you get something like 500 of those a month, or you can crawl that URL and its direct links, and I think that one also has a limitation. It’s a lot less though. Choose to crawl the URL and its direct links, click Go and pretty much right away, Googlebot starts scanning your site. It’s a beautiful instantaneous change. For people who do SEO and change a lot of their titles and descriptions, this is a nice way to make sure that they get picked up really quick.
That’s it for Search Console. Now, we’re gonna talk a little bit about how you can actually use Google Analytics to see if people are trying to attack your site, see if you’re getting hit with bots, and see if there are any injections showing up in your page URLs.
The first thing that I recommend you do is set up some email and mobile alerts for some things that are really important to you. For example, say you have a huge drop in revenue or your revenue drops to zero, and you have eCommerce tracking set up in Google Analytics. That could indicate that your shopping cart was compromised. We actually wrote a post I think about a week ago. Denise wrote a post on our blog about a new type of phishing that is attacking eCommerce sites. Not only are the attackers basically taking over the cart page, but they’re also redirecting the customers to a phishing page.
This means that, not only are you getting your sales stolen, but they’re actually stealing your customers’ credit card data. That’s really bad. Really, really bad. If you see a drop in revenue, that could indicate that there’s something wrong. Especially if it drops right to zero. That’s unexpected right?
Another thing you might want to watch out for is 404 errors. That could also indicate that you, like I said, had a spam campaign or somebody’s adding and removing lots of spam doorways on your site. We can also set an alert for that. I also recommend setting a page view spike alert. The reason for this is that, if you’re not familiar, a user is one person coming into your site that has the cookie, the Google Analytics cookie, in their browser. That user can have multiple sessions and for every session, they may be viewing multiple pages on your site.
Users will always be smaller than sessions, which will always be smaller than page views, but if your page views suddenly spike a ton, way above how fast users and sessions are growing, that could indicate that you’re getting hit with bots. A bot is one user with one session, and it’s scanning your whole site. Definitely something to keep track of. I keep track of these numbers on a weekly basis so that I can really see if page views are suddenly out of bounds. Then I can look in and find where they’re coming from. That’s definitely something you want to do.
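One way to make that weekly comparison concrete is to track pages per session and flag a week where it jumps well above the baseline. This is only a sketch: the numbers are invented, the function names are hypothetical, and the 2x factor is an arbitrary assumption rather than a recommended threshold.

```python
def pages_per_session(pageviews, sessions):
    """Average pages viewed per session; 0.0 when there are no sessions."""
    return pageviews / sessions if sessions else 0.0

def looks_like_bot_spike(baseline, current, factor=2.0):
    """Flag a period whose pages-per-session ratio exceeds the baseline ratio
    by more than `factor`. Both arguments are dicts holding 'sessions' and
    'pageviews' totals typed in from the Google Analytics reports."""
    base = pages_per_session(baseline["pageviews"], baseline["sessions"])
    now = pages_per_session(current["pageviews"], current["sessions"])
    return base > 0 and now > base * factor

# Invented example numbers for illustration only.
last_week = {"sessions": 1000, "pageviews": 2500}
normal_week = {"sessions": 1100, "pageviews": 2800}
suspicious_week = {"sessions": 1050, "pageviews": 9000}

print(looks_like_bot_spike(last_week, normal_week))
print(looks_like_bot_spike(last_week, suspicious_week))
```

Comparing the ratio rather than raw page views keeps ordinary traffic growth, where users and sessions rise together with page views, from tripping the check.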
How do you create these custom alerts? We’re in the Admin section here and we’re actually under the View column. It’s just way down at the bottom. You’ll see the Custom Alerts icon there. Once you’ve clicked that, you’ll be taken to a little editor where you can, basically, set whatever alert conditions you want. You can apply it to as many views as you like. You can set the period. I have it set to a day. You can have them send you an email, add other email addresses for other people on your team and you can even set up mobile alerts if it’s something that’s really important to you.
For this one, because I’m looking at a page view spike alert, I’m gonna apply it to all traffic and I’m gonna have it alert me when page views increase by more than 30% compared to the previous day. This usually happens to me every Monday because weekends are usually slow, but it reminds me to log in and take a look at things and make sure everything is good to go.
Let’s look at one more alert here. This one’s for 404 errors so, in this case, everything’s the same except I’m going to apply it to the page title, if it contains the words “not found.” Most of our 404 pages have that, but you might want to double check yours. If the page title contains “not found” and it has page views greater than 500, then I’ll get an alert and I’ll know about it so I can take a look. It’s also just really good to solve your 404 errors in general. Alerts are a really powerful part of Google Analytics that I don’t think enough people leverage.
One more thing that you can look for, and I got some help from Antony on our vulnerability research team with this one, is Malicious Request Parameters that show up in Google Analytics. Your site probably uses legitimate queries already for things like search bars on your site. Like I mentioned before with the Google campaign builder, there are those UTM parameters that we add to our marketing campaign links: utm_source=, utm_content=, all that stuff. Those are all parameters that show up in the query.
As you can see in the second bullet here, this shows up after the main page path, so example.com/page. Then there’s the question mark and then there’s the query and then the parameter after that. Injections happen on your site when attackers try to escape this query parameter. As you can see in the example there, they’ll add a UNION or try to insert a malicious admin into your users. They’ll try to put all kinds of things in there and sometimes you might see unfamiliar or strange parameters with a couple page views at the bottom of your content reports. These could indicate attack attempts.
What you might want to do is pull up your Google Analytics, go to Behavior, go to Site Content, to All Pages, and just search for some potentially malicious commands. For SQL injection, this would be things like SELECT, DELETE, EXEC, UNION, that sort of thing. For cross-site scripting, you might see things like onload, onmouseover, alert. For local file inclusion, you’ll see file:// with the two forward slashes there.
I’ll show you a couple examples of what this might look like. At the top there you can see on our labs.sucuri.net site, somebody was trying to access the file /etc/passwd. That’s a local file inclusion attempt there. Then we have a big long one here, which is obviously not cool. You know, nobody wants to see that. There’s some weird Unicode in there. If you look really closely, you’ll actually see www.fakereferrerdominator.com. There you go. That’s one of those fake spam bot referrers that I was talking about. You’ll also see that file operator for local file inclusion. At the bottom we see a cross-site scripting attempt as well with the [inaudible 00:33:00] and alert and that kind of thing.
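For anyone who wants to reproduce that alert condition outside of Google Analytics, for example on exported daily totals, the logic is just a day-over-day comparison. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def pageview_spike(previous_day, current_day, threshold_pct=30):
    """Return True when page views grew by more than `threshold_pct` percent
    compared to the previous day. Inputs are plain daily page view counts."""
    if previous_day <= 0:
        # No baseline to compare against; treat any traffic as worth a look.
        return current_day > 0
    # Cross-multiplied form of (current - previous) / previous * 100 > threshold,
    # which avoids floating-point rounding right at the threshold.
    return (current_day - previous_day) * 100 > threshold_pct * previous_day

# Invented numbers: a slow Sunday followed by a normal Monday trips the alert.
print(pageview_spike(800, 1200))   # 50% increase
print(pageview_spike(1000, 1300))  # exactly 30%, not "more than" 30%
```

This also shows why the alert tends to fire every Monday: the comparison only looks at the previous day, so a quiet weekend makes an ordinary weekday look like a spike.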
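To make the path, question mark, and query parameter structure concrete, here is a small Python sketch using the standard library’s URL parser. The URLs are invented examples on example.com, and the injected value is only an illustration of the kind of thing an attacker might append.

```python
from urllib.parse import urlsplit, parse_qs

def split_page_url(page_url):
    """Return (path, parameters) for a URL, where parameters maps each
    query parameter name to the list of values it was given."""
    parts = urlsplit(page_url)
    return parts.path, parse_qs(parts.query, keep_blank_values=True)

# A legitimate campaign link with UTM parameters.
legit_path, legit_params = split_page_url(
    "https://example.com/page?utm_source=newsletter&utm_content=banner"
)

# A hypothetical injection attempt that tries to escape the parameter value.
bad_path, bad_params = split_page_url(
    "https://example.com/page?id=1%20UNION%20SELECT%20user_pass%20FROM%20wp_users"
)

print(legit_path, legit_params)
print(bad_path, bad_params)
```

Both requests hit the same page path; the difference is entirely in what follows the question mark, which is why these attempts show up as odd-looking rows in the content reports.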
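If you export the All Pages report, the same keyword search can be run in bulk. The sketch below is an assumption-heavy illustration: the pattern lists only cover the keywords mentioned above, the category names are made up, and simple keyword matching like this will produce false positives (a blog post at /blog/select-a-plan would match), so treat hits as something to review rather than proof of an attack.

```python
import re
from urllib.parse import unquote_plus

# Keyword patterns based on the terms mentioned in the talk (not exhaustive).
SUSPICIOUS_PATTERNS = {
    "sql_injection": re.compile(r"\b(select|delete|exec|union)\b", re.IGNORECASE),
    "cross_site_scripting": re.compile(r"onload|onmouseover|alert", re.IGNORECASE),
    "local_file": re.compile(r"file://", re.IGNORECASE),
}

def classify_page_path(page_path):
    """Return the sorted list of pattern categories found in a page path,
    after decoding percent-encoding and plus signs."""
    decoded = unquote_plus(page_path)
    return sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(decoded)
    )

# Invented page paths of the kind that might appear in the report.
pages = [
    "/page?id=1+UNION+SELECT+password+FROM+users",
    "/search?q=%3Cimg%20src%3Dx%20onload%3D1%3E",
    "/view?doc=file:///etc/passwd",
    "/blog/website-security",
]
for page in pages:
    print(page, classify_page_path(page))
```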
Some of these might be penetration testers that are actually white hat hackers trying to see if your site can be hacked, so they can let you know and get a bounty for it. It also is good to just take a look for these and be aware that they happen and just to know what they are. A lot of this, like I said, is generated by bots. It kind of dispels that myth that your site’s not being attacked. Every site is a target because it’s just automated. If your site is online, it will get hit by bots eventually. Trying to brute force it and that kind of stuff.
One more thing you can look for in Google Analytics is Common Vulnerable Spots. It depends on your website, but if you’re running a CMS, you probably have a login page. You can look for those secret areas of your site and see if people are trying to go there. Again, I recommend you have a filter to filter out your internal IP addresses so that when you and your team hit the website, that doesn’t get counted as a page view hit in Google Analytics. This way you can go ahead, go to your Behavior reports under Site Content, All Pages. In the search bar you can look for any page that should be hidden to visitors or that they shouldn’t be going to. If you have WordPress, for example, search for wp-admin or wp-login. See if people who shouldn’t be are hitting those pages.
If you’re getting a lot of visitors to those login pages, it could indicate that you’re under a brute force attack. Also, malware can be specific. Malware campaigns will target specific locations on your website. Often on our blog, we’ll mention we have a new campaign out that you should look in your logs to find out if people are hitting certain areas of your site.
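As a rough sketch of that check on exported report rows, the snippet below totals page views for sensitive WordPress paths. The row data is invented, the function name is hypothetical, and the path list is an assumption that you would adjust to match the areas of your own site that visitors should not be reaching.

```python
# Paths that ordinary visitors should rarely hit (assumed WordPress defaults).
SENSITIVE_PATHS = ("/wp-login.php", "/wp-admin")

def sensitive_page_hits(rows):
    """Sum page views per sensitive path prefix.
    `rows` is an iterable of (page_path, pageviews) pairs, for example
    copied from an export of the All Pages report."""
    totals = {}
    for page_path, pageviews in rows:
        path_only = page_path.split("?", 1)[0]  # ignore query parameters
        for prefix in SENSITIVE_PATHS:
            if path_only.startswith(prefix):
                totals[prefix] = totals.get(prefix, 0) + pageviews
    return totals

# Invented report rows for illustration.
report_rows = [
    ("/wp-login.php", 420),
    ("/wp-login.php?redirect_to=%2Fwp-admin%2F", 80),
    ("/wp-admin/", 15),
    ("/blog/", 900),
]
print(sensitive_page_hits(report_rows))
```

With internal IP addresses filtered out of the view, a large total on the login path is the kind of number that would be worth comparing against your server logs.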
It is helpful if you want to get in with your IT team. Stay on top of website security news. Follow our blog and check for those vulnerable locations to make sure that your reports are clean and that you’re not getting compromised, or that there are no attempts.
We covered a lot of stuff here. Thank you so much for watching my webinar. We’re gonna take some questions now, but always feel free to tweet us at Sucuri Security using the #asksucuri if you have any questions about security. You can also find me on twitter at artdecotech. Now, I will pass it back to Kristen and see if she has any questions.