OtmaneBenazzou's Hacker News comments

SEEKING WORK | Fullstack Developer, automation, bots, mass scraping | REMOTE, France

I started teaching myself to code when I was 10 years old, and I've been specializing in automation and fullstack JS development since 2015.

I can basically automate or scrape anything you can think of on any device. I've worked as a fullstack & growth engineer for startups and scale-ups.

I've automated all kinds of processes at most of my jobs, in my personal projects, and in my personal life.

Worked on a few entrepreneurial projects (mostly MVPs that didn't take off :))

I work mostly with TypeScript/Node/React (Native), AWS Lambda, Firebase, Puppeteer/Playwright, and my own private automation libraries. (I've also worked with a lot of other languages/technologies that aren't necessarily worth mentioning.)

Please get in touch at: otmanebenazzou.pro [at] gmail (dot) com

Github: https://github.com/OB42 Linkedin: https://www.linkedin.com/in/otbedev/


Hi, the offer seems really interesting, but Breezy.hr is awful to apply on (it badly mangles my resume when trying to parse it).


Location: France

Remote: Full remote, open to travel a few days per year

Willing to relocate: Maybe? Depends where, and I'd expect a significant pay bump.

Technologies: TypeScript, Node.js, React.js, React Native, Firebase/GCP, AWS Lambda, expert at scraping/automation/Puppeteer/Playwright/circumventing bot protections, MongoDB, (My)SQL, Golang, C, a tiny bit of Python/AI

Résumé/CV: Most of my background is on LinkedIn: https://www.linkedin.com/in/otbedev/ GitHub: https://github.com/OB42/

Email: otmanebenazzou.pro [at] gmail [dot] com


The back button doesn't work on the site, which is pretty annoying.


I'm taking note of this one, but I'm guessing it's just not pushing a new history entry as you navigate.


Whisper is at or near human-level accuracy at speech recognition. Learn about the SOTA instead of judging by bad mainstream products made years ago.


Are you going to cry because a company wants your data to be a bit safer?


Then you won't object to 3FA or 20FA. More steps is safer, right?

If your account is unimportant to you, GitHub shouldn't force you to add layers of security, especially when their ToS literally throws you under the bus by telling you security is your responsibility. Good, then let me decide my own level of risk.


If a lot of people trust code that comes from your account, then it can and will be weaponized for a supply chain attack.

If you do not have the good sense to lock up such a weapon, then please delete your account.


Keyword: if. What little I do distribute to a few end users comes from local builds through a completely separate system. The security level applied reflects this well.

To my (well-founded) knowledge, nobody distributes my code; and if they did, they'd bear full responsibility. That's what "THE SOFTWARE IS PROVIDED 'AS IS'" means. You don't have to like it, and you don't have to use it.

There really is no middle ground unless you develop a relationship. Who says I can be trusted? Not me!


Not the case here, and not the case for 99.99% of repos on GitHub.


When you say "in combination with AI", are you talking about teaching them ML fundamentals or high-level ways of using AI?

I think both fields are going to stay separate in most curricula for a while.


I mean teach them how to program with an AI assistant.


You don't need an "AI expert" to learn to code with an AI assistant. Copilot and GPT are pretty straightforward, and the next version of Copilot seems even better.


Shouldn't they learn about the limits of the AI assistant and the most effective ways of using it?


I tried the same thing. The performance is a LOT better using Parsec, and it's free for personal use.


Isn't only the client supported on Linux for Parsec?


I'm not sure, sorry. I was accessing my MacBook from my Linux machine, so maybe.


Hi! It looks really REALLY cool!

Is there any kind of detection/stealthiness benchmark compared to libraries such as puppeteer-stealth or fakebrowser?

Honestly, no matter how feature-complete and powerful a scraping tool is, the main "selling point" for me will always be stealthiness/human-like behavior, no matter how crappy the dev experience is. (And IMHO that's the same for most serious scrapers/bot makers.)

Will it always be free, or could it turn into a paid product/SaaS (kind of like Browserless)? I'm wondering whether it's worth learning if the next cool features are going to be for paying users only.

Is this something you use internally, or is it just a way to promote your paid products?

Thanks :)


> for me will always be stealthiness/human like behavior no matter how crappy the dev experience is

Can't say I agree. The biggest value for me is being able to respond to site changes quickly. Having a key bot offline for an extended period of time can be costly, so being able to update, test and deploy it quickly is a big selling point. The vast majority of sites, including major companies, have very rudimentary bot detection, and a high-quality proxy provider is often all you need to bypass it.

As for advanced methods like reCAPTCHA v3 and Cloudflare, I don't know of any framework that passes those out of the box anyway, so you might as well use something that's easy to hack on and implement your own bypasses as necessary.
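To make the "a good proxy pool is often all you need" point above concrete, here is a minimal sketch of proxy rotation. The rotator itself is a trivial round-robin; the proxy URLs are made up for illustration, and real scrapers would feed the result into something like Puppeteer's `--proxy-server` launch flag or an HTTP client's agent option.

```javascript
// Minimal round-robin proxy rotator (illustrative; proxy URLs are invented).
function makeProxyRotator(proxies) {
  let i = 0;
  return function nextProxy() {
    const proxy = proxies[i % proxies.length];
    i += 1;
    return proxy;
  };
}

const nextProxy = makeProxyRotator([
  'http://proxy-a.example:8000',
  'http://proxy-b.example:8000',
]);

// Each call hands out the next proxy in the pool, wrapping around.
console.log(nextProxy());
console.log(nextProxy());
```

In practice you'd also want per-proxy failure tracking and cooldowns, but the core idea really is this simple.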


We do a lot of web scraping (hundreds of millions of requests, multiple terabytes of data per month) and have been using Crawlee - previously known as Apify SDK - since its v0.20 days. We adopted it for exactly this reason. It's extremely versatile and very pleasant to build on. The combination of Node, JS and Crawlee's modular SDK offers a sweet spot for scraping that imho is light years ahead of anything else.

It helps too that the Apify devs themselves are nice and super responsive (we've had quite a few PRs merged over the last couple of years). The SDK code (and supporting libs like browser-pool and got-scraping) is clean and very easy to read/follow/extend. (Happy to hear too that the license is going to remain unchanged.)


I'm not aware of a benchmark, but puppeteer-extra-plugin-stealth can be detected: https://datadome.co/bot-management-protection/detecting-head...

Crawlee does appear to do the basic checks though, like checking navigator.webdriver: https://github.com/apify/crawlee/blob/master/test/browser-po...
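To illustrate what "basic checks" like the `navigator.webdriver` one look like from the detection side, here is a sketch of the kind of logic a site-side script runs. The property names (`webdriver`, `languages`, `userAgent`) are real `navigator` fields that stock headless Chrome exposes; the function shape and signal list are invented for illustration, not taken from any particular detection product.

```javascript
// Sketch of site-side bot heuristics over a navigator-like object.
// Returns the list of suspicious signals found (empty = looks human).
function looksLikeBot(nav) {
  const signals = [];
  if (nav.webdriver === true) signals.push('navigator.webdriver is true');
  if (!nav.languages || nav.languages.length === 0) {
    signals.push('empty navigator.languages');
  }
  if (nav.userAgent && nav.userAgent.includes('HeadlessChrome')) {
    signals.push('HeadlessChrome in user agent');
  }
  return signals;
}

// A stock headless Chrome session trips all three signals:
console.log(looksLikeBot({
  webdriver: true,
  languages: [],
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/115.0',
}));
```

Stealth plugins work by patching exactly these kinds of properties before page scripts run, which is why detecting the patches themselves (as in the DataDome article above) is the next move in the arms race.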

Last time I checked (over a year ago) I couldn't find any public code to make Chrome/Firefox properly undetectable.

That said, going to extreme lengths to be undetectable is rarely necessary, because some sites will serve up CAPTCHAs to real people on clean, uncompromised residential connections anyway.


Hey! Crawlee uses the libraries from our fingerprint suite internally. https://github.com/apify/fingerprint-suite#performance

It has an A rating in the BotD (fingerprint.js) detection. Now we're working on improving the CreepJS detection. That one is really tough, though. I'm not even sure anybody would use it in production environments, as it must throw a lot of false positives.

It will always be free and maintained, because we use it internally in all of our projects. We thought about adding a commercial license like Docker's: open source, but paid if you have more than $10M in revenue or more than 250 employees. But in the end we decided against even that, so it's free and will always be free.


Hi! Very cool project. Just out of curiosity, what trips up Crawlee on CreepJS? I haven't heard of anyone actually using it in production (actually don't think it's meant for production use). It's certainly overzealous in its aggregate "trust score", but (a) it seems like a good benchmark to aim for; (b) some of its sub-scores, like "stealth" and "like headless", might be helpful for Crawlee to evaluate, given the signals included in those analyses are fairly simple for people to throw together in their own custom (production) bot detection scripts and are somewhat ubiquitous.


With fingerprints it's a tradeoff between having enough of them for large scale scraping and staying consistent with your environment. E.g. you can get exponentially more combinations if you also use Firefox, Webkit, MacOS and Windows user-agents (and prints) when you're running Chrome on Linux, but you also expose yourself to the better detection algorithms. If you stick to Linux Chrome only prints (which is what you usually run in VMs), you'll be less detectable, but might get rate limited.


Hi there!

We don't have any benchmarks for Crawlee just yet, but we are working on those as we speak. We care deeply about bot detection; one of Crawlee's features is generated fingerprints based on real browser data we gather. You can read more about it in the https://github.com/apify/fingerprint-suite repository, which is used under the hood in Crawlee. For scraping via HTTP requests (e.g. cheerio/jsdom), we develop a library called got-scraping (https://github.com/apify/got-scraping) that tries to mimic real browsers while doing fast HTTP requests.

Crawlee is and always will be open source. It originated from the Apify SDK (http://sdk.apify.com), a library that supports development of so-called Actors on the Apify Platform (http://apify.com), so you can see it as a way for us to improve the experience of our customers. But you can use it anywhere you want; we provide ready-to-use Dockerfiles for each template.

