Parsing a string into an array of arguments with non-strict trailing argument

javascript arrays string

79 views

2 answers

I'm trying to parse an argument string into an array of arguments. I have it mostly working, but it definitely seems like there'd be an easier way to go about doing this.

Rules:

  • Quoted strings ("some string") should be treated as a single argument, but the quotes should be removed from the resulting string
  • Any whitespace should separate arguments, except when we're already at the argCount (allowing the final argument to be unquoted, with all non-leading/trailing whitespace included)
  • Quotes should be ignored in the final argument, being left in the string as-is, unless the quotes in question are surrounding the entire final argument.

Examples:

  • this is an arg string with argCount 2 should result in ['this', 'is an arg string']
  • "this is" an arg string with argCount 2 should result in ['this is', 'an arg string']
  • "this is" "an arg" string too with argCount 3 should result in ['this is', 'an arg', 'string too']
  • this\nis an arg\n string! with argCount 3 should result in ['this', 'is', 'an arg\n string!']
  • this\nis an arg string! with argCount 2 should result in ['this', 'is an arg string!']
  • this\nis an arg string\nwith multiple lines in the final arg.\n inner whitespace still here with argCount 2 should result in ['this', 'is an arg string\nwith multiple lines in the final arg.\n inner whitespace still here']
  • this is an arg " string with "quotes in the final" argument. with argCount 2 should result in ['this', 'is an arg " string with "quotes in the final" argument.']
  • "this is" "an arg string with nested "quotes" in the final arg. neat." with argCount 2 should result in ['this is', 'an arg string with nested "quotes" in the final arg. neat.']

My current code:

function parseArgs(argString, argCount) {
    if(argCount) {
        if(argCount < 2) throw new RangeError('argCount must be at least 2.');
        const args = [];
        const newlinesReplaced = argString.trim().replace(/\n/g, '{!~NL~!}');
        const argv = stringArgv(newlinesReplaced);
        if(argv.length > 0) {
            for(let i = 0; i < argCount - 1; i++) args.push(argv.shift());
            if(argv.length > 0) args.push(argv.join(' ').replace(/{!~NL~!}/g, '\n').replace(/\n{3,}/g, '\n\n'));
        }
        return args;
    } else {
        return stringArgv(argString);
    }
}

I'm using the string-argv library, which provides the stringArgv function called above. The last four examples do not work properly with my code: the dummy newline replacement tokens cause neighbouring arguments to be merged during the stringArgv call, and quotes take complete priority.
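To illustrate why the placeholder token merges arguments, here is a minimal sketch. It uses a plain whitespace split as a stand-in for stringArgv (an assumption made for illustration; the real library also handles quotes), but the effect on the token should be the same: once the newline is replaced, nothing separates the two words any more.

```javascript
// Hypothetical stand-in for stringArgv: splits on whitespace only,
// with no quote handling. Used here just to show the tokenisation effect.
function splitOnWhitespace(s) {
    return s.trim().split(/\s+/);
}

const input = 'this\nis an arg string!';
const replaced = input.replace(/\n/g, '{!~NL~!}');

// The newline was the only separator between "this" and "is",
// so after the replacement they form one whitespace-free token.
const tokens = splitOnWhitespace(replaced);
console.log(tokens[0]); // this{!~NL~!}is
console.log(tokens.length); // 4
```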

Update:

I clarified the quotes rule, and added a rule about quotes also being left untouched in the final argument. Added two additional examples to go along with the new rule.

Author: Gawdl3y Posted: 08.11.2019 10:54

Answers (2)


2 upvotes

Accepted answer

You could use a regular expression for this:

function mySplit(s, argCount) {
    var re = /\s*(?:("|')([^]*?)\1|(\S+))\s*/g,
        result = [],
        match = []; // should be non-null
    argCount = argCount || s.length; // default: large enough to get all items
    // get match and push the capture group that is not null to the result
    while (--argCount && (match = re.exec(s))) result.push(match[2] || match[3]);
    // if text remains, push it to the array as it is, except for 
    // wrapping quotes, which are removed from it
    if (match && re.lastIndex < s.length)
        result.push(s.substr(re.lastIndex).replace(/^("|')([^]*)\1$/g, '$2'));
    return result;
}
// Sample input
var s = '"this is" "an arg" string too';
// Split it
var parts = mySplit(s, 3);
// Show result
console.log(parts);

This gives the desired result for all example cases you provided.
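One way to check that claim is to run the question's examples through the function. The sketch below repeats mySplit so that it runs on its own; the cases array and the JSON.stringify comparison are illustrative choices, not part of the original answer.

```javascript
// mySplit as given in the answer, repeated so this snippet is self-contained.
function mySplit(s, argCount) {
    var re = /\s*(?:("|')([^]*?)\1|(\S+))\s*/g,
        result = [],
        match = []; // should be non-null
    argCount = argCount || s.length;
    while (--argCount && (match = re.exec(s))) result.push(match[2] || match[3]);
    if (match && re.lastIndex < s.length)
        result.push(s.substr(re.lastIndex).replace(/^("|')([^]*)\1$/g, '$2'));
    return result;
}

// [input, argCount, expected] triples taken from the question's examples.
const cases = [
    ['this is an arg string', 2, ['this', 'is an arg string']],
    ['"this is" an arg string', 2, ['this is', 'an arg string']],
    ['"this is" "an arg" string too', 3, ['this is', 'an arg', 'string too']],
    ['this\nis an arg\n string!', 3, ['this', 'is', 'an arg\n string!']],
    ['this\nis an arg string!', 2, ['this', 'is an arg string!']],
    ['this\nis an arg string\nwith multiple lines in the final arg.\n inner whitespace still here', 2,
        ['this', 'is an arg string\nwith multiple lines in the final arg.\n inner whitespace still here']],
    ['this is an arg " string with "quotes in the final" argument.', 2,
        ['this', 'is an arg " string with "quotes in the final" argument.']],
    ['"this is" "an arg string with nested "quotes" in the final arg. neat."', 2,
        ['this is', 'an arg string with nested "quotes" in the final arg. neat.']]
];

// Collect every case whose actual output differs from the expected array.
const failures = cases.filter(([input, argCount, expected]) =>
    JSON.stringify(mySplit(input, argCount)) !== JSON.stringify(expected));

console.log(failures.length === 0 ? 'all examples pass' : failures);
```

Inputs outside these examples may still behave unexpectedly. For instance, an empty quoted argument ("") comes back as undefined because of the match[2] || match[3] fallback, and trailing whitespace in the final argument is not trimmed.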

Backslash escaping

If you want to support backslash escaping, so you can embed literal quotes in your first arguments without interrupting those arguments, then you can use this regular expression in the above code:

var re = /\s*(?:("|')((?:\\[^]|[^\\])*?)\1|(\S+))\s*/g,

The magic is in (?:\\[^]|[^\\]): either a backslash followed by something, or not-a-backslash. This way, the quote that follows a backslash will never get matched as an argument-closing one.
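A small comparison of the two patterns on an input with escaped quotes may make the difference concrete. The firstArg helper is hypothetical and only extracts the first argument each pattern finds.

```javascript
// The two patterns from this answer; only the quoted-argument part differs.
const plain = /\s*(?:("|')([^]*?)\1|(\S+))\s*/g;
const escaped = /\s*(?:("|')((?:\\[^]|[^\\])*?)\1|(\S+))\s*/g;

// Hypothetical helper: return the first argument a pattern extracts from s.
function firstArg(re, s) {
    re.lastIndex = 0; // start from the beginning of the string
    const m = re.exec(s);
    return m[2] || m[3];
}

// The raw text is: "say \"hi\" now" rest
const s = '"say \\"hi\\" now" rest';

console.log(firstArg(plain, s));   // say \
console.log(firstArg(escaped, s)); // say \"hi\" now
```

Note that the backslashes stay in the captured text. Removing them would be a separate step, which is not part of the pattern shown here.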

The (?: makes the group non capturing (i.e. it is not numbered for $1 style back-references).

The [^] may look weird, but in JavaScript regexes it is a way to say "any character", which is broader than the dot, since the dot does not match newlines. The s (dotAll) modifier gives the dot this broader meaning; it was not supported in JavaScript when this answer was written and was added to the language in ES2018.
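The difference is easy to see on a two-line string. This is a minimal sketch; the [\s\S] form is included because it expresses the same idea and is also accepted by regex engines outside JavaScript.

```javascript
const text = 'a\nb';

// The dot does not match line terminators, so this fails.
const dotMatches = /a.b/.test(text);

// An empty negated class excludes nothing, so it matches any character.
const emptyNegatedMatches = /a[^]b/.test(text);

// "Whitespace or non-whitespace" also covers every character.
const eitherClassMatches = /a[\s\S]b/.test(text);

console.log(dotMatches, emptyNegatedMatches, eitherClassMatches); // false true true
```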

Author: trincot Posted: 20.08.2016 10:38

2 upvotes

I haven't had the chance to test it thoroughly, but the following code probably solves your problem.

function handleQuotedString(m,sm){
  return sm.trim().indexOf(" ") === -1 ? sm : '"' + sm.trim() + '"';
}
function getArguments(s,n){
  return s.trim()                       // get rid of any preceding and trailing whitespaces
          .replace(/\n/g, " \n ")       // make word\nword => word \n word
          .replace(/"([\S\s]+?)"/g,handleQuotedString)
          .split(" ")                   // get words into array
          .reduce((p,w) => w[0] === '"' ||
                           w[0] === "'" ? (p[0] = true, p.concat(w.slice(1)))
                                        : w[w.length-1] === '"' ||
                                          w[w.length-1] === "'" ? (p[0] = false, p[p.length-1]+= " " + w.slice(0,w.length-1), p)
                                                                : p[0] ? (w !== "\n" && (p[p.length-1]+= " " + w.slice(1,w.length-1)), p)
                                                                       : p.concat(w), [false])
          .slice(1)
          .reduce((args,arg) => args[0] ? arg !== "\n" &&
                                          arg !== ""   ? (args[0]--,args.concat(arg))
                                                       : args
                                        : (args[args.length-1]+= " " + arg || " ",args),[n])
          .slice(1);
}
var s = 'hi there\nas "you" see "   this\nis  " "an arg" string\n     too';
console.log(getArguments(s,7));

The first reduce merges words into one argument, starting at a word that begins with a quote and continuing up to and including the next word that ends with a quote.

The second reduce sets up arguments according to the given count and other conditions.

Of course, the input string may contain many special characters that need to be removed. This can be done in an initial filtering stage.

Author: Redu Posted: 20.08.2016 09:39