Lemmy newb here, not sure if this is right for this /c.

I found an article by someone who hosts their own website and micro-social network. It covers their experience with web-scraping robots that refuse to respect robots.txt, and how they deal with them.

    • tripflag@lemmy.world
      2 months ago

      and filtering malicious traffic is more important to me than you visiting my services, so I guess that makes us even :-)

        • El Barto@lemmy.world
          2 months ago

          You had me until the “ethically sound position” part.

          You’re saying that Joe Blogger is acting unethically because he doesn’t allow VPN users to visit his site. C’mon, brother.

            • El Barto@lemmy.world
              2 months ago

              You’re putting words in my mouth. I didn’t say that. Targeting sounds like specifically doing it with an agenda.

              What you’re saying is the equivalent of being offended that you can’t bring guns onto someone’s private property because they don’t want you to, period. “It is not ethical that you forbid me from exercising my constitutional right to bear arms in your house. How dare you not allow me to put my AK-47 on your kitchen counter!”

              Nope. I said that if someone doesn’t want to deal with VPN users because it’s more hassle than it’s worth (e.g. bots), then so be it. Joe Blogger may get 20 visitors a month instead of 24. Oh the horror!

              I am a huge advocate of privacy laws. But if Joe Blogger doesn’t allow me on his personal website, eh. I might try archive.org.

              • Hold on a tick.

                Specifically blacklisting a group of users because of the technology they use is, by definition, “targeting”, right? I mean, if not, what qualifies as “targeting” for you?

                And, yeah. Posting a sign saying “No Nazi symbolism is allowed in this establishment” is - I would claim - targeting Nazis. Same as posting a sign, “no blacks allowed” - you’re saying that’s not targeting?

                I know we’re arguing definitions and have strayed from the original topic, but I think this is an important point to clarify, since you took specific objection to my use of it in that context; and because I’m being pedantic about it.

                • El Barto@lemmy.world
                  2 months ago

                  > Specifically blacklisting a group of users because of the technology they use is, by definition, “targeting”, right? I mean, if not, what qualifies as “targeting” for you?

                  You may be right. I guess it’s a matter of semantics. But the way you described it sounded more nefarious. “I’ll target this group of VPN users because fuck them, I hope they all die in a tsunami!!!” when it’s more like “ugh, another VPN bot. The 9th this hour and I’m hungry. You know what - I’ll just block VPN altogether and go fix me a sandwich.” Maybe that’s just my perception.

                  But anyway - it’s Joe Blogger’s machine, at his home, for him to do whatever he likes. Some rando from the street knocks on the door and says “excuse me, do you mind if I send an e-mail from your computer?” Joe Blogger can perfectly well say no; he doesn’t even owe an excuse.

                  You’d have a point if it was a business or a corporation. Some home machine? Out of billions? Why bother?

                  I guess we’re two pedantic folks. I enjoy these discussions. I sometimes gain some new knowledge out of them.

  • F04118F@feddit.nl
    2 months ago

    Interesting approach, but it looks like this ultimately ends up:

    • being a lot of babysitting / manual work
    • blocking a lot of humans
    • not being robust against scrapers

    Anubis seems like a much better option, for those wanting to block bots without relying on Cloudflare:

    https://anubis.techaro.lol/

  • drkt@lemmy.dbzer0.com
    2 months ago

    I have plenty of spare bandwidth and babysitting resources, so my approach is largely to waste their time. If they poke my honeypot they get poked back and have to escape a tarpit specifically designed to waste their bandwidth above all. It costs me nothing because of my circumstances, but I know it costs them because their connections are metered. I also know it works because they largely stop crawling the domains I employ this on. I am essentially making my domains appear hostile.

    It does mean that my residential IP ends up on various blocklists but I’m just at a point in my life where I don’t give an unwiped asshole about it. I can’t access your site? I’m not going to your site, then. Fuck you. I’m not even gonna email you about the false-positive.

    It is also fun to keep a log of which of the IPs that have poked the honeypot have open ports, and to automate a process of siphoning information out of those ports. Recently I’ve been finding a lot of hacked NVRs that I think are part of some IoT botnet used to scrape the internet.
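
    For anyone curious what a honeypot plus bandwidth-wasting tarpit could look like, here is a minimal sketch using only the Python standard library. It is not the setup described above, whose details aren't given. The trap path `/trap/`, the chunk size, the per-connection cap and the `serve` helper are all assumptions made up for illustration. The idea is that the trap path is listed under Disallow in robots.txt, so only clients that ignore robots.txt should ever request it.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical trap path (assumption): disallowed in robots.txt, never linked
# visibly, so polite crawlers and humans should not end up here.
HONEYPOT_PREFIX = "/trap/"
CHUNK_SIZE = 64 * 1024   # bytes of junk per write (assumed value)
MAX_CHUNKS = 10_000      # cap per connection so the sketch terminates

flagged_ips = set()      # addresses that have poked the honeypot


def is_honeypot(path):
    """Return True if the requested path falls under the trap prefix."""
    return path.startswith(HONEYPOT_PREFIX)


def junk_chunks(count, size=CHUNK_SIZE):
    """Yield `count` chunks of random bytes, each `size` bytes long."""
    for _ in range(count):
        yield os.urandom(size)


class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_honeypot(self.path):
            # Normal visitors get a normal (tiny) response.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"hello, human\n")
            return
        # Remember who poked the trap, then stream junk at them.
        flagged_ips.add(self.client_address[0])
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        try:
            for chunk in junk_chunks(MAX_CHUNKS):
                self.wfile.write(chunk)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client gave up, which is the point


def serve(host="127.0.0.1", port=8080):
    """Run the sketch locally; blocks until interrupted."""
    ThreadingHTTPServer((host, port), TarpitHandler).serve_forever()
```

    Calling `serve()` starts it on localhost. Note that streaming junk costs the server as much upstream bandwidth as it costs the bot downstream, so this only makes sense if, as in the comment above, your own bandwidth is effectively free.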

    • melroy@kbin.melroy.org
      2 months ago

      I found a very large botnet, mainly in Brazil but also in several other countries. And abuseipdb.com is not marking those IPs as a threat. We need a better solution.

      I think a honeypot is a good way. Another way is to use proof of work, basically on the client side. Or we need a better place to share the IPs of all these stupid web-scraping bots.
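
      To make the proof-of-work idea concrete, here is a rough hashcash-style sketch in Python. It is only an illustration of the general technique and not necessarily how Anubis or any other tool implements it. The function names, the `challenge:nonce` format and the difficulty values are assumptions. The server hands out a random challenge, the client has to find a nonce whose SHA-256 hash starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import os


def make_challenge():
    """Server side: a fresh random challenge string per visitor."""
    return os.urandom(16).hex()


def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty):
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

      Each extra bit of difficulty doubles the expected work for the client while verification stays at one hash. A single human visitor barely notices that, but a scraper making millions of requests has to pay it every time.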